Introduction

The importance of the gut microbiota in human health and disease is unequivocal. Humans have evolved over millennia with a complex ecosystem of microorganisms, which provides critical health benefits including regulation of the immune system, metabolic processes and homeostatic control1,2,3. Disruption of the microbial balance has been associated with a wide range of diseases including inflammatory bowel disease4,5,6, colorectal cancer7,8,9,10 and extra-intestinal conditions such as obesity11,12,13, diabetes14,15, liver disease16 and autoimmune conditions17,18. Other studies have also linked gut microbiota composition to factors including mode of birth19, ageing20, environmental exposures21 and medication usage22,23.

As our understanding of the gut microbiota in complex diseases grows, there is increasing reliance on longitudinal sampling of large cohorts of patients and healthy controls. Study participants are often required to collect stool samples at home and return them to processing centres, sometimes by post, which can take several days. Depending on their physiological characteristics, bacteria in stool differ markedly in their growth and survival when stored at ambient temperature. Storing stool samples at room temperature has been associated with significant changes in the microbial community after 2 days24, a deterioration attributed primarily to ongoing microbial fermentation at room temperature25. Changes in the microbial community after several days at room temperature may also be driven by environmental factors, such as exposure to aerobic conditions and temperature fluctuations, which can lead to substantial degradation of metabolites in faecal samples post-collection25. It is therefore paramount to develop optimal biostabilization approaches for stool samples so that participant self-collected samples retain the same microbial composition and metabolomic profiles as freshly collected samples. However, until recently, it was assumed that metabolomic profiling required untreated stool, leaving unbuffered samples as the only option26.

Various storage solutions are available to preserve the integrity of faecal bacterial composition. Reported options include RNAlater, 95% ethanol, Stratech PSP buffer, OMNIgene-GUT and Metabolokeeper27,28. Far fewer studies have assessed these solutions for the measurement of microbial metabolites, and in those that have, RNAlater and OMNIgene-GUT both yielded poor results after sample storage at room temperature26,29.

Any method for collecting stool samples must be participant-friendly and scientifically robust, so that high-quality, reliable data are obtained despite suboptimal field conditions such as changes in ambient temperature and variable time between sample preparation and return to the processing centre. The collection approach must also be flexible enough to work for all stool consistencies and disease-related changes.

There have been significant efforts to systematically assess and standardise DNA extraction protocols for human stool samples30. The existing literature predominantly addresses the standardisation of optimal storage conditions for samples within the laboratory and effective approaches for metagenomic analysis. In contrast, there is a significant lack of data on the collection methods used to preserve stool samples sent by mail in longitudinal studies, particularly before the samples reach the laboratory. For example, Costea et al.30 compared 21 DNA extraction protocols with the aim of standardising human faecal sample processing in metagenomic studies. To identify the most effective DNA extraction method while minimising bias and other confounding factors, multiple aliquots from two samples were prepared and shipped to 21 collaborating laboratories in 11 countries. Within 4 h of collection, the samples were homogenised and frozen at −80 °C30. Standardising DNA extraction methods is a large part of improving human faecal sample processing for metagenomic studies; however, that study did not consider variation in sample collection.

Few studies have focused on the impact of preservation medium, time to sample return and temperature on microbial community structure31. Moreover, information on stool microbial metabolomic profiles when different preservation media are used appears to be almost entirely lacking.

Preserving sample integrity from evacuation to laboratory processing is essential when developing patient stool collection strategies for microbial analysis. Including a preservation buffer helps retain complete metabolite profiles that resemble those of fresh samples, improving the authenticity and reliability of microbiome studies. In addition, because buffered samples can be shipped at ambient temperature, subjects can collect and mail samples from home, which supports increased volunteerism, expands study populations, facilitates serial sample collection and extends the geographical reach of studies.

In this study, we systematically evaluated the effects of various preservation media and storage conditions on the composition of the stool microbiota, analysed by 16S rRNA gene profiling (V1–V2 region) using Illumina MiSeq sequencing, and on microbial metabolic output, measured as short-chain fatty acid (SCFA) profiles by gas chromatography–mass spectrometry (GC–MS). The objective of the study was to define a suitable preservation strategy that retains stable microbiota profiles in stool samples stored at ambient temperature prior to sample return.

Results

To evaluate the effectiveness of the various faecal stabilisation buffers, faecal samples from 6 healthy subjects were processed and homogenised within 1 h of collection. One-gram aliquots of each faecal sample were added to tubes containing 8 ml of RNAlater (RNAlater), 95% ethanol (ethanol) or Invitek PSPStool stabilising buffer (PSP), or kept without buffer (dry stool), and stored at room temperature (20 °C), 4 °C or −80 °C (gold standard/original) (Table 1; Fig. 1). The quantity of extracted DNA varied considerably across preservation methods, in line with previous reports30. PSP-buffered stool samples yielded DNA quantities similar to unbuffered (dry stool) samples (p = 0.065), whereas ethanol- and RNAlater-buffered samples gave significantly lower amounts of DNA despite repeated extraction attempts (p < 0.0001 RNAlater; p = 0.022 ethanol; Fig. 2). Because RNAlater stabilisation gave extremely low DNA yields, a PBS washing step was introduced before DNA extraction, which dramatically improved DNA yield to levels comparable to dry stool and PSP-buffered samples (p = 0.6, washed RNAlater vs dry stool). RNAlater samples without PBS washing were excluded from further analysis. This yielded 20 unique samples per participant (120 samples in total).

Table 1 Sample categorisation (buffer used, storage temperature and duration) and sample inclusion/exclusion in 16S bioinformatic analysis.
Figure 1

Study protocol depicting the sampling strategy for participants. Created with BioRender.com.

Figure 2

Comparison of DNA yields between sample preservation methods. Equal amounts of stool were used for each preservation method. DNA extraction was performed in parallel as described previously49. DNA yields were compared using Mann–Whitney U tests; statistically significant outcomes are shown in bold. The red horizontal line represents the group median.
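The DNA yield comparisons rely on the Mann–Whitney U test. As an illustration only, the Python sketch below computes U with average ranks for ties and a two-sided p-value from the normal approximation (no tie or continuity correction). It is an assumption about the computation, not the authors' analysis code; with six participants per group an exact test, as offered by most statistics packages, would be preferable. The function names and the yield values in the usage lines are hypothetical.

```python
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, both samples non-empty).

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    r1 = sum(ranks[:n1])                # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2         # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


if __name__ == "__main__":
    # Hypothetical DNA yields (ng/ul), e.g. dry stool vs a buffered condition.
    u, p = mann_whitney_u([12.1, 9.8, 15.3, 11.0], [2.0, 1.1, 3.4, 0.9])
    print(u, p)
```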

Quality control for variability in microbial composition

All 120 samples were submitted for 16S rRNA amplicon sequencing, including those with DNA concentrations below the recommended cut-off. Of these, 94 yielded sufficient 16S rRNA sequence reads to be included in the bioinformatic analysis (Table 1). Of the 26 samples that failed to generate sufficient reads, six were unbuffered samples, four of which were the −80 °C stored samples. Six PSP-buffered samples failed to sequence; five of these had been stored at 4 °C, and all six had very low DNA yields (< 1 ng/μl). Only one RNAlater-stored sample failed to sequence (participant 4, 1 day at room temperature). The remaining thirteen failures were ethanol-buffered samples: two stored at −80 °C, four at room temperature and seven at 4 °C. Ethanol-buffered sample failures occurred in five of the six participants.

After quality filtering and denoising, a total of 5,204,740 16S rRNA sequence reads were obtained, ranging from 1 to 310,224 reads per sample. After read merging and chimera removal, the number of mapped sequencing reads per sample ranged from 0 to 89,249 (mean ± SEM: 38,806 ± 3276; Table S1). Samples with a final depth of < 4628 reads were removed, giving a mean ± SEM of 48,609 ± 3445 reads per sample (Table S1). Rare amplicon sequence variants (ASVs), defined as ASVs with a mean relative abundance of ≤ 1 × 10⁻⁵, were then removed; this reduced the initial 3965 ASVs to 1406 while retaining a relative abundance of > 0.98 in every sample. Rarefaction analysis using only the retained ASVs and samples demonstrated sufficient sequencing depth (Fig. S1). After rare-ASV removal, a total of 4,538,690 sequences remained, equating to 48,284 ± 3427 (mean ± SEM) reads per sample; the 30,539 rare-ASV sequences removed equated to 325 ± 25 (mean ± SEM) reads per sample (Table S1).
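The two filtering steps described above (a minimum read depth per sample, then removal of ASVs whose mean relative abundance is ≤ 1 × 10⁻⁵) can be sketched in Python as follows. This is a minimal illustration, assuming counts are held as a hypothetical dict of sample ID to {ASV ID: read count}; it is not the pipeline used in the study, and the thresholds are simply those quoted in the text.

```python
def filter_samples_and_asvs(counts, min_depth=4628, min_mean_rel_abund=1e-5):
    """Depth filter followed by rare-ASV filter.

    counts: dict sample_id -> dict asv_id -> read count (hypothetical layout).
    Returns (filtered counts, set of retained ASV ids).
    """
    # 1. Drop samples whose total read depth is below the threshold.
    kept = {s: c for s, c in counts.items() if sum(c.values()) >= min_depth}
    if not kept:
        return {}, set()
    # 2. Relative abundance of each ASV within each retained sample.
    depth = {s: sum(c.values()) for s, c in kept.items()}
    rel = {s: {a: n / depth[s] for a, n in c.items()} for s, c in kept.items()}
    # 3. Mean relative abundance of each ASV across retained samples (absent = 0).
    all_asvs = set().union(*(c.keys() for c in kept.values()))
    n_samples = len(kept)
    mean_rel = {
        a: sum(r.get(a, 0.0) for r in rel.values()) / n_samples for a in all_asvs
    }
    # 4. ASVs with mean relative abundance <= threshold are treated as rare.
    retained = {a for a, m in mean_rel.items() if m > min_mean_rel_abund}
    filtered = {
        s: {a: n for a, n in c.items() if a in retained} for s, c in kept.items()
    }
    return filtered, retained


def retained_fraction(original, filtered):
    """Fraction of each retained sample's reads that survive ASV filtering."""
    return {s: sum(filtered[s].values()) / sum(original[s].values()) for s in filtered}
```

Checking `retained_fraction` for every sample is one way to confirm the "> 0.98 relative abundance retained" criterion after filtering.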

Impact of preservation medium and storage conditions on faecal microbial diversity

Global changes in the gut microbiota were visualised by principal coordinate analysis (PCoA) of unweighted UniFrac distances (Fig. 3A), and differences between groups were tested with PERMANOVA. Samples clustered clearly by participant, with significant PERMANOVA adjusted P values for every pairwise participant comparison (Table S2). Within participants, samples clustered by the type of buffer used, whereas storage temperature and storage condition did not contribute significantly to variability in microbiota composition (Fig. S2A–C; Table S2). The combined effect of buffer and storage condition was also not significant (Fig. S2D; Table S2). Axis 1 of the plot mainly separates Participant 1's PSP and RNAlater samples from the other samples (Fig. S2A): most samples lie between −0.05 and 0.01 on this axis, whereas Participant 1's PSP and RNAlater samples lie between −0.015 and −0.075.
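PERMANOVA tests whether group centroids differ in a distance space. The Python sketch below computes the pseudo-F statistic directly from a precomputed square distance matrix and estimates a p-value by shuffling group labels. It is illustrative only: it assumes the distance matrix (for example unweighted UniFrac, which additionally requires a phylogenetic tree) has already been computed elsewhere, and the function names are hypothetical rather than those of any particular package.

```python
import random


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and group labels.

    Assumes at least two groups and a non-zero within-group sum of squares.
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: per group, squared pairwise distances / group size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=1):
    """Return (pseudo-F, permutation p-value) obtained by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    # Add one to numerator and denominator so the observed labelling is counted.
    return f_obs, (hits + 1) / (permutations + 1)
```

Pairwise comparisons, such as those between participants, would repeat the test on each pair of groups and then adjust the resulting p-values for multiple testing.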

Figure 3

Gut microbiota composition in study subjects. (A) Principal coordinate analysis (PCoA) of stool microbiota composition based on DADA2-produced ASVs. (B) Relative abundance at the order level derived from 16S rRNA gene sequences within each sample; bar charts are faceted by participant (1–6) and buffer used. (C) Diversity (observed ASVs, Chao1 and Shannon) of the stool microbiota stratified by storage buffer (ethanol, no buffer, PSP buffer, RNAlater).

The effects of the four main metadata categories (buffer, participant, storage temperature and number of storage days) were investigated with divergence violin plots (Fig. S3A,B). Differences between participants produced the highest dissimilarity scores. The buffer used produced higher dissimilarity than storage temperature or storage days, indicating that it had a larger impact on microbiome composition. Pairwise differences between buffers were then examined (Fig. S3B). PSP and RNAlater produced the most similar results, and the 16S composition of PSP and RNAlater samples was closer to that of the no buffer samples than was the composition of the ethanol samples. Ethanol showed a wide range of dissimilarity values in its comparisons with no buffer, PSP and RNAlater.

In line with other studies, bacteria of the phyla Bacteroidota and Firmicutes were the most abundant, comprising > 50% of sequences in all samples except the unbuffered and ethanol-buffered samples 1C, 1Q, 4B and 4Q from participants 1 and 4 (Fig. 3B). These samples had been stored at room temperature, and samples 1C, 1Q and 4B showed a bloom of Enterobacteriaceae. The bloom was not seen in the reciprocal samples stored in the fridge or in buffer, indicating that it most likely resulted from room-temperature storage of raw stool. Sample 4Q was composed almost entirely of the order Campylobacterales. Alpha diversity of the stool microbiota was quantified by observed ASVs, Chao1 and Shannon indices, visualised with box plots, and compared between groupings with paired Wilcoxon tests (Fig. 3C; Fig. S4B; Table S3). No significant differences by buffer type were found (Fig. 3C). The largest variation arose from differences between participants, which produced the only significant paired Wilcoxon results (Fig. S4B; Table S3). Further interrogation of storage conditions (temperature and days stored) confirmed that these factors did not influence diversity profiles (Fig. S4A–C; Table S3). Stratifying by both buffer and storage condition showed the lowest diversity scores in samples stored at room temperature and the greatest variability in ethanol-buffered samples (Fig. S4D); however, these patterns were not consistent across all participants (Fig. S4E–F). Collectively, the analysis confirmed that PSP- and RNAlater-buffered samples most closely recapitulated the microbial diversity profile of the original sample. Irrespective of storage time and temperature, both buffers robustly retained that profile and did not show the Enterobacteriaceae bloom seen in room-temperature samples.
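The three alpha diversity indices named above can be computed from a single sample's ASV count vector. The Python sketch below is a minimal illustration, not the study's code: it uses one widely used bias-corrected form of Chao1, S_obs + ((n − 1)/n) · F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton ASVs, and the natural logarithm for Shannon; the exact variants used by the authors' software are an assumption.

```python
import math


def observed(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from integer ASV counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    s_obs = observed(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over ASVs present."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


if __name__ == "__main__":
    sample = [5, 1, 1, 2, 0]  # hypothetical ASV counts for one sample
    print(observed(sample), chao1(sample), shannon(sample))
```

Per-sample values computed this way would then be compared between matched conditions with a paired test such as the Wilcoxon signed-rank test.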

We next used MaAsLin2 to identify genera that were significantly enriched or depleted under the various storage buffers and conditions relative to the gold standard baseline sample (unbuffered, stored immediately at −80 °C). Overall, 24 bacterial genera differed in abundance between the gold standard baseline samples and the various test conditions (Fig. 4). Prevotella was depleted in every test condition. As expected, ethanol-buffered samples differed most from the gold standard, with depletion of genera including Parabacteroides, Alistipes, Bacteroides and Oscillospira and enrichment of genera including Sutterella, Haemophilus, Cloacibacillus, Anaerostipes and Adlercreutzia (Fig. 4; Table S4). RNAlater storage was associated with depletion of three genera: Ruminococcus, Subdoligranulum and Sporobacter WAL 1855D (Fig. 4; Table S4). Of the buffer options, PSP-stored samples were most similar to the gold standard baseline samples. The top 24 genera were further stratified at the participant level (Fig. S5). Collectively, these results confirmed that PSP- and RNAlater-buffered samples most closely recapitulated the microbial profile of the unbuffered gold standard samples, irrespective of storage time and temperature, without the Enterobacteriaceae bloom seen in room-temperature samples.

Figure 4

Biomarker heatmap. Heatmap showing the genera (y-axis) with significant associations identified by MaAsLin2. Comparisons were made by buffer type and storage condition (x-axis). Storage conditions comprised temperature (Fridge: 4 °C or RT: 20 °C) and days stored before freezing at −80 °C (1D: one day; 3D: three days); baseline samples were frozen immediately at −80 °C. The no buffer baseline (−80 °C) group was used as the reference, and participant was included as a random effect in the model. Heatmap values are −log(q-value) × sign(coefficient).
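The heatmap values combine an FDR-adjusted q-value with the sign of the model coefficient. The Python sketch below shows that arithmetic under two stated assumptions: q-values come from Benjamini–Hochberg adjustment (which we understand to be MaAsLin2's default correction), and log(q) is the natural logarithm, since the caption does not give the base. It does not reproduce MaAsLin2's mixed-effects model itself, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted q-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def heatmap_value(qval, coef):
    """Signed significance score: -log(q) * sign(coefficient), natural log assumed."""
    sign = (coef > 0) - (coef < 0)
    return -math.log(qval) * sign


if __name__ == "__main__":
    # Hypothetical p-values and coefficients for four genus/condition pairs.
    p = [0.01, 0.04, 0.03, 0.20]
    coefs = [-1.2, 0.8, -0.3, 0.5]
    q = benjamini_hochberg(p)
    print([heatmap_value(qi, c) for qi, c in zip(q, coefs)])
```

Under this scheme, strongly depleted genera receive large negative values and strongly enriched genera large positive values.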

Impact of preservation medium and storage conditions on faecal SCFA profiles

We next assessed faecal SCFA profiles using GC–MS. The main SCFAs (acetate, propionate, butyrate, iso-butyrate, valerate and iso-valerate) were detected in almost all samples apart from the ethanol-preserved samples (Fig. 5A; Table S6). SCFA profiles of ethanol-buffered samples differed significantly from those of all other samples in both composition and concentration. Because DNA yields from ethanol-buffered samples were low, 16S rRNA gene profiles were obtained for only 57% of them, and their SCFA profiles were also compromised. Bray–Curtis PCoA with PERMANOVA showed that SCFA profiles stratified by participant, buffer, participant and buffer combined, and buffer and storage temperature combined (Fig. 5B,C; Table S6). Buffer was the main source of difference, with all pairwise PERMANOVA comparisons significant. Each buffer group formed a distinct cluster with some outlier samples; the ethanol samples had more outliers, lying at much greater distances from their main cluster (Fig. 5B). PCoA was therefore repeated excluding the ethanol samples (Fig. 5C). This ordination, together with PERMANOVA (Table S6), showed overlapping participant clusters, and pairwise comparisons found significant differences only between participants 1 and 2, 1 and 4, and 1 and 6. No significant difference was found for temperature or storage condition (temperature and storage time combined).
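The SCFA ordination rests on two steps: Bray–Curtis dissimilarities between sample profiles, then principal coordinate analysis (classical multidimensional scaling) of the resulting matrix. The Python sketch below illustrates both with the standard library only, extracting just the first principal coordinate by power iteration. It is a simplified sketch, not the authors' code; a full analysis would use a linear-algebra library and keep several axes, and the function names are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pcoa_first_axis(dist, iterations=1000, tol=1e-12):
    """First principal coordinate of a square distance matrix.

    Double-centres -0.5 * D^2 and finds the leading eigenpair by power
    iteration. Power iteration returns the eigenvalue of largest magnitude,
    so for strongly non-Euclidean distances it could be a negative one.
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # a is symmetric, so column means equal row means.
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        # Rayleigh quotient v^T B v estimates the eigenvalue.
        new_lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
    # Sample coordinates on axis 1 are the eigenvector scaled by sqrt(eigenvalue).
    return lam, [x * max(lam, 0.0) ** 0.5 for x in v]


if __name__ == "__main__":
    # Hypothetical relative SCFA profiles (acetate, propionate, butyrate).
    profiles = [[0.6, 0.2, 0.2], [0.55, 0.25, 0.2], [0.3, 0.3, 0.4]]
    d = [[bray_curtis(p, q) for q in profiles] for p in profiles]
    print(pcoa_first_axis(d))
```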

Figure 5

GC–MS-derived SCFA profiles in study samples. (A) Relative abundance of SCFAs in study samples, faceted by buffer type (ethanol, no buffer, PSP buffer or RNAlater). (B) Bray–Curtis-based principal coordinate analysis (PCoA) of stool SCFA composition (set to an even sampling depth of 1000), stratified by participant and storage temperature (Baseline: −80 °C, Fridge: 4 °C or RT: 20 °C). Bottom plots are zoomed-in sections showing clustering without outliers. (C) Bray–Curtis-based PCoA of stool SCFA composition excluding ethanol samples (set to an even sampling depth of 1000), stratified by storage time and temperature (Baseline: −80 °C, Fridge: 4 °C or RT: 20 °C).

Pairwise comparisons of the buffer and temperature groupings revealed several differences. Groupings of the same buffer did not differ significantly, except PSP baseline versus PSP fridge (Table S6). The ethanol groupings differed significantly from all other buffer groupings (including no buffer). All comparisons between PSP and RNAlater temperature groupings were significant. The no buffer fridge and room temperature groups differed significantly from all PSP groupings, whereas the no buffer baseline group did not. The only significant differences between no buffer and RNAlater groups were between RNAlater at room temperature and no buffer at baseline and fridge temperature.

Within the buffered samples, variability was also driven by room-temperature samples, by the baseline sample from participant 3 (RNAlater) and by the baseline sample from participant 6 (PSP; Fig. 5B). When we assessed whether storage length (1 or 3 days) altered SCFA profiles, only participant 3's RNAlater data appeared to show an effect of longer storage (Fig. 5C).

We then examined whether any participant's sample set was driving variability more than the others. Excluding the ethanol samples because of their high variability, we calculated all pairwise Bray–Curtis distances between samples from the same participant to define intra-participant variability. These did not differ between participants, indicating that all sample sets behaved similarly (Fig. S6).
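The intra-participant variability analysis collects every pairwise Bray–Curtis distance among one participant's own samples, after excluding ethanol-buffered samples. The Python sketch below shows that grouping step; the dict-based data layout and function names are hypothetical, and the resulting per-participant distance lists would then be compared with a suitable statistical test, which is not shown here.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def intra_participant_distances(profiles, participant, buffer, exclude=("ethanol",)):
    """Pairwise Bray-Curtis distances among each participant's own samples.

    profiles: dict sample_id -> list of SCFA abundances (hypothetical layout).
    participant, buffer: dict sample_id -> label.
    Samples whose buffer is in `exclude` are skipped.
    """
    by_participant = {}
    for s in profiles:
        if buffer[s] in exclude:
            continue
        by_participant.setdefault(participant[s], []).append(s)
    # Every unordered pair of samples within a participant contributes one distance.
    return {
        p: [bray_curtis(profiles[s], profiles[t]) for s, t in combinations(ids, 2)]
        for p, ids in by_participant.items()
    }
```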

We further investigated the effects of the four main metadata categories (buffer, participant, storage temperature and number of storage days), excluding ethanol (Fig. S7A,B). Participant produced the largest differences, followed by the buffer used, while storage temperature and storage days produced the smallest. Among the specific buffers, PSP and RNAlater had the most similar profiles (Fig. S7B). Comparisons of no buffer against PSP and against RNAlater spanned low, medium, high and extreme dissimilarity values.

In summary, SCFA analysis indicated that PSP buffered samples were the most robust across storage time and temperature whilst preserving the integrity of the original (immediately − 80 °C frozen) sample.

Assessing the relationship between stool gut microbiota and SCFA profiles

The relationship between stool microbes and SCFA profiles was assessed in samples that yielded both data sets. Spearman's correlation analysis of relative abundances was performed across the whole sample set, irrespective of storage strategy. This identified 102 significant correlations (FDR-adjusted p < 0.05), linking 56 bacterial genera with SCFAs (Fig. 6).
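Spearman's correlation is the Pearson correlation of tie-aware ranks. The Python sketch below computes it for one genus–SCFA pair and attaches a permutation p-value. This is an illustrative choice: the text does not say how the authors obtained p-values, and the permutation approach and function names are assumptions. The per-pair p-values across all genus–SCFA combinations would then be FDR-adjusted, for example with Benjamini–Hochberg.

```python
import random


def average_ranks(values):
    """1-based ranks with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_perm_test(x, y, permutations=9999, seed=1):
    """Return (rho, two-sided permutation p-value) by shuffling y's ranks."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    y_perm = list(ry)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(y_perm)
        # Small tolerance so permutations tying the observed |rho| are counted.
        if abs(pearson(rx, y_perm)) >= abs(rho) - 1e-12:
            hits += 1
    return rho, (hits + 1) / (permutations + 1)
```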

Figure 6

Correlation of bacterial genera with SCFA metabolites. Spearman's correlation was performed on all samples with matching ASV (≥ 4528 reads) and SCFA data, using relative abundance values of SCFAs and genera. Only genera with at least one significant correlation (FDR-adjusted p < 0.05) are displayed; for these genera, all correlation values with each SCFA are shown. The y-axis shows the SCFA metabolites and the x-axis the bacterial genera (based on 16S rRNA gene sequences). Red and blue indicate positive and negative correlations, respectively.

Pelomonas had the most significant correlations with SCFAs: significant negative correlations with acetate, propionate, isobutyrate, butyrate and valerate, and a significant positive correlation with isovalerate (Fig. 6; Table S7). cc_115 and Bacteroides had four and three significant positive correlations, respectively, and one significant negative correlation each (Fig. 6; Table S7). Overall, 20 genera were significantly correlated with only two SCFAs and 26 with only one (Fig. 6; Table S7).

Isobutyrate had 22 correlations; 10 positive and 12 negative (Fig. 6; Table S7). Acetate, propionate, butyrate, isovalerate, and valerate had more than 10 significant correlations each (Fig. 6; Table S7). Formate, lactate, and succinate had significant positive correlations with two, one and five genera, respectively (Fig. 6; Table S7).

The same correlation analysis was repeated on the data subset by buffer (Table S8). The RNAlater data yielded the most correlations (87), whereas only two correlations were found in the ethanol data. The no buffer and PSP data yielded 26 and 45 correlations, respectively. For each subset, around half of the correlations were also found in the full dataset (no buffer 14/26, RNAlater 45/87, ethanol 1/2, PSP 25/45). Correlations involving eight genera were identified in the subsets but not in the full dataset; these included 02d06, Lachnospiraceae_Clostridium, Odoribacter, Parabacteroides, Phascolarctobacterium, Rothia and Turicibacter.

Discussion

Our study centred on evaluating the acceptability, safety and effectiveness of commercially available stool stabilization solutions, with a primary emphasis on maintaining the integrity of sample microbial community structure. Most longitudinal studies require participants to provide self-collected stool samples and rely on postal return of samples. Many studies recruit participants across a wide geographic area, so there can be significant delays, sometimes of several days, between sample collection and arrival at the processing laboratory. Until now, studies of stool sample collection, transport, storage and analysis have predominantly focused on samples supplied within a laboratory setting or those that can be promptly frozen. Our study aimed to identify a preservation method that retains sample integrity, in terms of both bacterial composition and metabolite profiles, in patient self-collected stool samples returned by surface mail.

During transport from participant to laboratory, the environmental conditions surrounding the samples may vary, and sample temperature can differ among participants in different geographical locations and seasons. Low-temperature (−80 °C) cryopreservation is the ‘gold standard’ for sample preservation24,32; however, this is not immediately possible for samples collected at home and shipped by mail. Our study assessed the influence of storage temperature (−80 °C, 4 °C and 20 °C) on each preservation buffer and found no temperature-related impact on microbial communities in any of the buffered samples. This underscores the stability of the preservation buffers across the temperatures tested, and thus their overall dependability.

Ethanol preservation caused the most problems, with many samples failing to generate sequencing data or SCFA profiles compared with both PSP and RNAlater. Byrd et al. (2020) reported similar results when 95% ethanol was compared with RNAlater as a storage buffer for faecal samples33. Ethanol is commonly used as a preservation buffer because it penetrates samples and inactivates nucleases32. Previously, Erben et al. (2014) assessed the effectiveness of various solvents for human stool metabolomics, including phosphate-buffered saline, isopropanol, methanol, ethanol and acetonitrile34. Because the solvents were added to stool samples in the laboratory, participant acceptability in terms of chemical safety was not a consideration in their study. Despite differences in the panel of metabolites assessed and the chromatographic techniques used, both the Erben study and our findings indicate that ethanol alone is suboptimal for stool metabolomic profiling. In the Erben study, metabolomic profiles were improved by adding methyl tert-butyl ether (MTBE) alongside ethanol. However, MTBE is a potential carcinogen, most commonly used as an additive in unleaded petrol, and is therefore unsuitable for participant sample collection strategies34. On the other hand, long-term storage of stool samples at −80 °C has been shown to benefit from the addition of 70% ethanol, whereas storing samples at 4 °C with 70% ethanol gave low similarity to fresh samples35. An earlier study investigated sample self-collection and shipping methods in functional human microbiome studies36. Samples were self-collected and stored on ice for delivery to the laboratory within 24 h. Aliquots of each stool sample were then fixed in ethanol or RNAlater and stored at ambient temperature for 48 h to simulate shipping conditions (mock shipping). From the 24 samples analysed from eight participants, the authors concluded that mock shipping had minimal effect on metabolic profiling, particularly in contrast to the typically large intrasubject differences36. Ethanol is also flammable, so its transport is restricted and it may not be suitable for posting samples32,37.

RNAlater gave results comparable to PSP buffer. However, the additional washing step required before DNA extraction extends processing time, which is a factor to consider for laboratories handling large numbers of samples. RNAlater has previously been shown to be a viable alternative to immediate freezing, particularly for short-term storage over a few days, and an option for preserving the faecal microbiome in metagenomic studies38. Additionally, RNAlater does not compromise the subject specificity or time-point specificity of the gut microbiome. In a laboratory setting, a meta-analysis showed that storing samples in RNAlater for 1 month did not significantly alter alpha diversity compared with storage at −80 °C, suggesting that RNAlater effectively preserves alpha diversity over this timeframe32. As mentioned above, Franzosa et al. showed that RNAlater was an effective preservation buffer for mailed samples36. Our findings add further support to the existing literature endorsing RNAlater as an effective preservation buffer.

Based on DNA concentration, microbial 16S profiling and SCFA results, the Invitek PSPStool stabilising buffer appears to be the best buffer for preserving microbial integrity compared with RNAlater and ethanol. Research into preservation buffers for stool sample analysis is limited. Interestingly, Wu et al. (2021) showed that a self-made preservation buffer kept faecal microbiota stable at room temperature for up to 4 weeks and at 50 °C for up to 5 days37. The buffer consisted of ethylenediaminetetraacetic acid (EDTA) disodium salt dihydrate, sodium citrate trisodium salt dihydrate and ammonium sulphate, and the high temperature was chosen to mimic shipping conditions during summer. Whilst such an approach is feasible for small studies, commercially available preservation buffers provide the quality control, batch reproducibility and global availability required for longitudinal cohort studies.

Many variables may influence the outcome and performance of microbial community profiling. Our intention was to compare preservation buffers in use in the field, although we acknowledge that other buffers are available. We used a standardised extraction approach and performed all protocols in-house rather than engaging other established microbiome laboratories. However, we have previously undertaken such quality assurance studies and demonstrated high correlation with other laboratories39. Individual participants' handling of their samples will also influence data integrity. In our experience, readable sample collection instructions and a layperson's explanation of why the instructions should be followed are key to successful participant engagement. Pictorial representations are also an effective way to convey sample collection details.

The study results demonstrate that reproducible SCFA profiles can be obtained from samples buffered with PSP and RNAlater. This finding could broaden the utility of previously collected buffered stool biobanks. The metabolite profiles obtained from RNAlater-buffered stool contrast with the findings of Wang et al.29, who failed to generate metabolite profiles from RNAlater-buffered samples. However, technical differences in metabolite analysis methods, including the extraction solvent (methanol in the Wang study vs ether in the current study), highlight the need to assess all factors. While we cannot generalise these findings to imply that all metabolites can be measured in buffered stool, our results open up a previously unexplored source material. To further validate our findings and move towards a standardised approach to sample transport in outpatient studies, further studies should explore PSP and RNAlater as preservation buffers. These could include assessing higher storage temperatures to mimic summer conditions, comparing PSP and RNAlater against a wider range of preservation buffers, including the recently described Metabolokeeper®28, extending sample storage time, and conducting untargeted metabolomic screening to capture the complete metabolite profile.

The gut microbiota has a major influence on host physiology, largely through its metabolic activity. The major end products of bacterial fermentation in the colon are organic acids such as acetate, propionate, butyrate, lactate and succinate. The abundance of SCFAs in stool samples depends not only on the level of available substrate but also on the extent of cross-feeding within the bacterial community. In this study, relationships between SCFAs and individual taxa were largely concordant between PSP buffer and RNAlater, although these often deviated from those in the ‘no buffer’ samples. The ethanol-buffered samples failed to demonstrate any relationships. The study confirmed several previously reported positive associations between propionate and Bacteroides, Phascolarctobacterium, Lachnospira and Faecalibacterium. The succinate pathway is the major route for propionate formation from dietary carbohydrates and is present in Bacteroidetes species, such as Bacteroides, and in Firmicutes of the Negativicutes class, such as Phascolarctobacterium40. Additionally, a few negative associations were identified: between propionate and Anaerostipes, Caldicoprobacter and Dialister; between valerate and Barnesiella, Coprococcus and Lachnobacterium; and between butyrate and Cloacibacillus and Oxalobacter. As these associations have not been reported previously, their functional implications require further investigation.

In conclusion, we have demonstrated the benefit of incorporating stool preservation buffers into participant stool collection strategies for microbiota profiling and targeted metabolomic analysis. Including PSP or RNAlater in stool collection protocols may help improve cross-study comparisons and extend the utility of patient-stored sample banks. Our study demonstrates that the choice of sample preservation buffer strongly affects the observed microbial composition. Additionally, we provide the first confirmation that targeted metabolomic assessment is possible with specific sample preservation buffers. Standardizing stool sample shipping methods can enhance research reliability, reduce costs and facilitate global studies. Consistent protocols not only minimize sample degradation but also enable meaningful cross-study data comparisons, fostering a more comprehensive understanding of the human gut microbiota.

Materials and methods

Subject recruitment and sample collection

To assess the impact of storage buffer, temperature and time on microbial community composition and SCFA profiles, we collected a single stool sample from each of six healthy individuals (participants 1–6) who had not taken antibiotics in the 3 months prior to sample donation. Within 1 h of collection, samples were homogenised and divided into 1 g aliquots, which were either kept without buffer (‘no buffer’) or added to tubes containing 8 ml of RNAlater (RNAlater), 95% ethanol (ethanol) or Invitek PSP Stool stabilising buffer (PSP). Tubes with buffers were vortexed to mix the contents thoroughly. After preparation, samples were stored at room temperature (20 °C), 4 °C or −80 °C (Fig. 1). Samples to be stored immediately at −80 °C were divided into 8 replicate aliquots prior to storage to standardise freeze/thawing across storage conditions. Samples stored at room temperature or 4 °C were kept for either 1 or 3 days before being divided into 8 replicate aliquots and stored at −80 °C until extracted as a complete study batch. Equal weights of stool were aliquoted to ensure comparability in downstream analyses. The study was conducted in accordance with the Declaration of Helsinki and approved by the North of Scotland Research Ethics Committee (Study Code 09/S0802/106). Informed consent was obtained from all subjects.
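The storage design above can be enumerated programmatically, for example when preparing sample sheets or checking that every aliquot was sequenced. The Python sketch below assumes a fully crossed design (every buffer under every storage condition, with immediate freezing at −80 °C treated as day 0); the factor labels are illustrative, and Fig. 1 remains the authoritative description of the layout.

```python
from itertools import product

# Factor levels taken from the text; the fully crossed layout is an assumption.
PARTICIPANTS = [f"P{i}" for i in range(1, 7)]          # participants 1-6
BUFFERS = ["no buffer", "RNAlater", "ethanol", "PSP"]
# (storage temperature, days held before transfer to -80 °C); 0 = frozen immediately
STORAGE = [("-80C", 0), ("RT", 1), ("RT", 3), ("4C", 1), ("4C", 3)]
REPLICATES = 8                                          # replicate aliquots per condition


def design():
    """Yield one record per replicate aliquot of the assumed crossed design."""
    for pid, buf, (temp, days) in product(PARTICIPANTS, BUFFERS, STORAGE):
        for rep in range(1, REPLICATES + 1):
            yield {"participant": pid, "buffer": buf,
                   "temp": temp, "days": days, "replicate": rep}


aliquots = list(design())
conditions = {(a["participant"], a["buffer"], a["temp"], a["days"]) for a in aliquots}
print(len(conditions), len(aliquots))  # 120 960
```

Under that assumption the design comprises 6 × 4 × 5 = 120 participant/buffer/storage combinations and 960 replicate aliquots.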

DNA extraction and 16S sequencing

Genomic DNA was extracted from stool samples using the commercially available Stratech PSP kit (Stratech, Germany) with minor modifications to the manufacturer's instructions, which included mechanical lysis (beadbeater, MP Bio, UK; 30 s at maximum speed using Lysing Matrix E bead tubes) during the initial lysis period to ensure complete lysis of the stool samples prior to DNA extraction. For the no buffer samples, 8 ml of Stool DNA stabiliser was added, and the samples were then divided into eight replicate aliquots before DNA extraction. PSP-buffered samples proceeded directly to extraction, whereas RNAlater- and ethanol-buffered samples were centrifuged at 13,000 rpm and the supernatant was discarded. Each pellet received a total volume of 1 ml of Stool DNA stabiliser and was vortexed before DNA extraction. Because DNA concentrations from RNAlater-buffered samples were low, an additional wash step was introduced: 2 ml of PBS was added to the RNAlater pellets, the samples were vortexed and centrifuged again at 13,000 rpm, and the PBS/supernatant was discarded. Faecal DNA was quantified by NanoDrop spectrophotometry. The V1–V2 region of the 16S rRNA gene was amplified using the 27F and 338R primers as described previously41; briefly, the primers were designed with the Illumina adapter overhang already included. Amplification was performed using the Q5 polymerase kit following the manufacturer's instructions (New England Biolabs, Ipswich, MA, USA). Post-amplification, samples were purified using AMPure XP (Beckman Coulter, Brea, CA, USA) according to the manufacturer's protocol. The samples were then indexed using the Nextera XT Index Kit v2 (Illumina, San Diego, CA, USA) and KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Cape Town, South Africa) in a short-cycle PCR step, followed by clean-up with AMPure XP.
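The extraction steps quote rotor speed (13,000 rpm), whereas the SCFA protocol quotes relative centrifugal force (3000 g); the two are only comparable once a rotor radius is known. The Python sketch below applies the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm). The 8.4 cm radius is purely hypothetical, because the radius of the rotor used is not reported.

```python
import math

# Standard conversion between rotor speed and relative centrifugal force:
#   RCF (x g) = 1.118e-5 * r_cm * rpm**2, where r_cm is the rotor radius in cm.
K = 1.118e-5


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and radius."""
    return K * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at the given radius."""
    return math.sqrt(rcf / (K * radius_cm))


# Hypothetical rotor radius; the radius actually used is not reported.
RADIUS_CM = 8.4
print(f"13,000 rpm at {RADIUS_CM} cm ~ {rpm_to_rcf(13_000, RADIUS_CM):,.0f} x g")
```

Because RCF scales linearly with radius, the same 13,000 rpm setting can correspond to quite different forces on different centrifuges.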
The libraries were quantified using the Quant-iT™ dsDNA Assay Kit HS (Thermo Fisher Scientific, Waltham, MA, USA) and analysed on a FLUOstar Omega plate reader (BMG LABTECH, Ortenberg, Germany). Library size was determined using the Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). The libraries were pooled at equimolar concentrations in preparation for sequencing. Sequencing was performed on an Illumina MiSeq sequencer (Illumina, San Diego, CA, USA) using Illumina V3 chemistry and paired-end 2 × 300 bp reads. All sequencing was performed in a single MiSeq run. Initial sequence data processing was performed by the Illumina MiSeq Reporter to demultiplex samples and remove adapter and primer sequences, and the sequence data were exported in FASTQ format41.
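Equimolar pooling combines the two measurements described above: the mass concentration from the Quant-iT assay and the fragment size from the TapeStation. The Python sketch below uses the common approximation of 660 g/mol per base pair of dsDNA to convert ng/µl to nmol/l and then computes the volume of each library needed to contribute the same number of femtomoles. The library names, concentrations, sizes and the 20 fmol target are hypothetical values, not data from this study.

```python
# Average mass of one base pair of double-stranded DNA (standard approximation).
G_PER_MOL_PER_BP = 660.0


def ng_per_ul_to_nm(conc_ng_ul: float, length_bp: float) -> float:
    """Convert a dsDNA concentration in ng/ul to nmol/l for a fragment of given length."""
    return conc_ng_ul * 1e6 / (G_PER_MOL_PER_BP * length_bp)


def pooling_volumes(libraries: dict, fmol_each: float) -> dict:
    """Volume (ul) of each library so that each contributes fmol_each femtomoles.

    libraries maps a sample name to (concentration in ng/ul, mean fragment length in bp).
    Uses 1 nmol/l == 1 fmol/ul.
    """
    return {name: fmol_each / ng_per_ul_to_nm(conc, length)
            for name, (conc, length) in libraries.items()}


# Hypothetical Quant-iT concentrations and TapeStation sizes, for illustration only.
libs = {"S1": (5.0, 450), "S2": (2.5, 450), "S3": (5.0, 900)}
for name, vol in pooling_volumes(libs, fmol_each=20.0).items():
    print(f"{name}: {vol:.2f} ul")
```

Halving either the concentration or the molar amount per ng (by doubling fragment length) doubles the volume required, which is why both measurements are needed before pooling.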

Bioinformatic analysis

The 16S rRNA gene sequence data were further processed using QIIME2 (version 2020.6)42. Cutadapt was used to remove the PCR primers (Forward: AGMGTTYGATYMTGGCTCAG; Reverse: GCTGCCTCCCGTAGGAGT)43. DADA2 denoising44 was carried out with a forward truncation length of 270, a reverse truncation length of 220 and a quality truncation threshold of 2. This was carried out on each separate Illumina run. The resulting ASV table artifacts and representative sequence artifacts were merged into a single ASV table artifact and a single representative sequence artifact. Samples with a total ASV count below 4,628 were removed. The ASV representative sequences were taxonomically classified using the Greengenes 13_8 database, creating a taxonomy artifact45. The classifier was trained with a Naïve Bayes method using reads extracted from the 16S region of interest46. To reconstruct a phylogenetic tree, the ASV representative sequences were aligned with MAFFT47, and the alignment was masked48 with a maximum gap frequency of 1 and a minimum conservation of 0.4. An unrooted tree was built with FastTree49 and rooted at the midpoint.
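The sample-depth filter can be expressed in a few lines. The Python sketch below assumes that the 4,628 threshold refers to the total ASV count (reads) per sample, so samples at or above the threshold are retained; the toy table and sample names are hypothetical and stand in for the merged QIIME2 ASV table.

```python
MIN_DEPTH = 4_628  # threshold from the text, read here as total ASV counts per sample


def filter_by_depth(asv_table: dict, min_depth: int = MIN_DEPTH):
    """Split an ASV table (sample -> {ASV id: count}) into retained samples
    and the sorted names of samples whose total count is below min_depth."""
    kept = {sample: counts for sample, counts in asv_table.items()
            if sum(counts.values()) >= min_depth}
    dropped = sorted(set(asv_table) - set(kept))
    return kept, dropped


# Hypothetical toy table; real tables come from the merged QIIME2 ASV artifact.
table = {
    "P1_PSP_RT_d1": {"asv1": 3000, "asv2": 2000},
    "P1_EtOH_RT_d3": {"asv1": 1200, "asv2": 300},
    "P2_RNAlater_4C_d1": {"asv1": 4628},
}
kept, dropped = filter_by_depth(table)
print(sorted(kept), dropped)
```

Reporting the dropped samples explicitly makes it easy to check whether low depth clusters in particular buffers or storage conditions.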

SCFA analysis

SCFA concentrations were measured in stool samples using gas chromatography as described previously50. Dry stool samples were reconstituted in sterile water (8 ml per 1 g) and, alongside buffered stool samples, were transferred to 15 ml Falcon tubes and centrifuged at 3000 g for 15 min; 1 ml aliquots of the supernatant were then removed for subsequent analysis. Internal standards, as described previously50, were added to the samples to a final concentration of 5 mmol/l. Acids were then extracted by adding 0.5 ml of concentrated hydrochloric acid and 2 ml of diethyl ether to each sample and vortexing for 1 min. After centrifugation at 3000 g for 10 min, the ether layer was removed and transferred to a new vial; a further 1 ml of diethyl ether was added to the aqueous layer and a second extraction was performed. The ether extracts were combined, and N-methyl-N-t-butyldimethylsilyltrifluoroacetamide (Sigma) was added before heating at 80 °C for 20 min. The reaction mixture was left at room temperature for a further 24 h to ensure complete derivatisation of lactic acid. After derivatisation, 1 μl of the sample was analysed using an Agilent 6890N gas chromatograph (Agilent Technologies, UK) fitted with a fused silica capillary column (J&W HP-1 GC Column, 30 m, 0.25 mm, 0.10 µm, 7-inch cage; Agilent Technologies), with helium as the carrier gas. Peaks were identified by retention time based on SCFA standards (Sigma). SCFA concentrations were calculated from the relative response factor with respect to the internal standard, 2-ethylbutyrate, and were based on a single sample.
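Internal-standard quantification can be summarised as two ratios: a response factor obtained from a calibration standard, and the analyte-to-internal-standard peak-area ratio in each sample. The Python sketch below uses one common definition of the relative response factor, which may differ in detail from the cited method50; the peak areas are hypothetical. The optional conversion to µmol per g stool assumes the 8 ml per 1 g dilution and ignores the stool's own water content.

```python
IS_CONC_MM = 5.0  # final internal standard (2-ethylbutyrate) concentration, mmol/l


def response_factor(area_std: float, conc_std: float,
                    area_is: float, conc_is: float = IS_CONC_MM) -> float:
    """Relative response factor of an SCFA vs the internal standard, from a
    calibration standard of known concentration (mmol/l)."""
    return (area_std / conc_std) / (area_is / conc_is)


def scfa_concentration(area_analyte: float, area_is: float, rrf: float,
                       conc_is: float = IS_CONC_MM) -> float:
    """SCFA concentration (mmol/l) in the supernatant, by internal-standard ratio."""
    return (area_analyte / area_is) * conc_is / rrf


def per_gram_stool(conc_mm: float, ml_per_g: float = 8.0) -> float:
    """mmol/l (= umol/ml) times ml of diluent per g stool -> umol/g.
    Simplification: ignores the stool's own water content."""
    return conc_mm * ml_per_g


# Hypothetical peak areas, for illustration only.
rrf = response_factor(area_std=3000, conc_std=10.0, area_is=1000)
conc = scfa_concentration(area_analyte=1800, area_is=1200, rrf=rrf)
print(rrf, conc, per_gram_stool(conc))  # 1.5 5.0 40.0
```

Because each result is a ratio to the co-extracted internal standard, losses during the two ether extractions largely cancel out.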

Statistical analysis and image creation

Comparisons of DNA yield were performed using Mann-Whitney U tests. Figures were created using R (version 4.2.1) and various R packages (R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/). A full record of the statistical analysis and figure production carried out in R for both the 16S rRNA and SCFA data is available at https://m-gemmell.github.io/Gemmell_et_al_2023/01-Intro.html; this record was created with the R package bookdown51. The R packages phyloseq and microbiome were used to carry out analyses and create plots from phyloseq objects52,53,54,55. The R package MaAsLin2 was used to carry out differential abundance analysis of the 16S genus-level data47. The vegan package was used to carry out PERMANOVA tests of beta diversity (as visualised in the ordination plots) with the adonis2 function48. QIIME2 artifacts were converted into a phyloseq object with the R package qiime2R (qiime2R: Importing QIIME2 artifacts and associated data into R sessions)56. The R packages ggplot2, ggforce and ggh4x were used for plot generation and customisation57,58,59. Functions from the tidyverse, dplyr, reshape2 and microViz packages were also used60,61,62,63. The R package DT was used to produce HTML-friendly tables64.
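With only six participants per group, the Mann-Whitney U test can be evaluated exactly by enumerating every possible split of the pooled ranks. The pure-Python sketch below illustrates the statistic and an exact two-sided permutation p-value; it is not the authors' code, and the yield values are hypothetical. In R the test is commonly run with wilcox.test, which reports this U for the first group as W and, when ties are present, falls back to a normal approximation, so p-values from this sketch may differ slightly from R output for tied data.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, exact two-sided permutation p-value).

    The p-value enumerates every assignment of the pooled ranks to the two
    groups, which is feasible for small groups (e.g. 6 vs 6 -> 924 splits).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))

    def u_for(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    u_x = u_for(range(n1))
    observed = min(u_x, n1 * n2 - u_x)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        u = u_for(idx)
        total += 1
        if min(u, n1 * n2 - u) <= observed + 1e-9:
            extreme += 1
    return u_x, extreme / total


# Hypothetical DNA yields (ng/ul), for illustration only.
psp = [1.0, 2.0, 3.0]
rnalater = [4.0, 5.0, 6.0]
print(mann_whitney_u(psp, rnalater))  # (0.0, 0.1)
```

With complete separation of two groups of three, only 2 of the 20 possible rank splits are as extreme, giving p = 0.1; this is why very small groups cannot reach conventional significance thresholds.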