Introduction

Intestinal colonization with extended-spectrum cephalosporin-resistant (ESC-R) Enterobacterales (Ent), such as extended-spectrum β-lactamase (ESBL)- and/or plasmid-mediated AmpC (pAmpC)-producing Escherichia coli, may lead to difficult-to-treat infections1,2. Importantly, the global prevalence of intestinal colonization with ESBL-producing E. coli is high, estimated at 17.6% (range: 6–35.1%) and 21.1% (range: 4.9–45.6%) in community and healthcare settings, respectively3. In addition, colonization rates of 0.3%–3.2% for pAmpC-E. coli have been reported in healthy people4,5,6. Overall, such E. coli pathogens contribute to the growing burden of antimicrobial resistance in human health, which is associated with an estimated 1.3 million deaths worldwide7. Therefore, there is an urgent need for the development of rapid and comprehensive diagnostic tests to accurately identify pathogenic bacteria and their associated antimicrobial resistance genes (ARGs) responsible for gut colonization8,9.

Standard culture-based methods for ESC-R-Ent intestinal colonization screening (e.g., of hospitalized patients or travelers) are typically performed on stool or swab specimens in 24 h, but further characterization of strains requires more time (e.g., susceptibility tests in an additional 24 h). In contrast, molecular methods (e.g., PCR-based assays) typically offer faster turnaround times (TATs)1, although they accurately identify only targeted ARGs or underlying disease-causing bacterial species (e.g., diarrheagenic E. coli) and do not simultaneously characterize the resistome and microbiome.

Shotgun metagenomic sequencing (SMS) aims for a comprehensive characterization of the gut contents. Indeed, many recent studies using the short-read Illumina-based SMS (Illumina-SMS) approach have demonstrated its utility, as it can provide a comprehensive view of the resistome and species composition [e.g.,10,11,12,13,14]. However, we have previously shown that Illumina-SMS may lack the sensitivity to detect epidemiologically relevant ESBL and AmpC genes such as blaCTX-Ms or blaDHAs, respectively15. In addition, Illumina-SMS may not have the resolution to identify ARGs (e.g., blaCTX-Ms/blaDHAs) and their associated genetic environments [AGEs; i.e., chromosome or mobile genetic elements (MGEs)]16.

In contrast to Illumina-SMS, the long-read Nanopore-based SMS (Nanopore-SMS) may simultaneously identify important ARGs and their AGEs, as well as the potentially pathogenic bacteria that carry them in the stool (e.g.,16,17,18,19,20,21,22) in short sequencing times (< 1 h) (e.g.,18,19). However, most studies have so far used small sample sizes of representative clinical samples17,18,19,20,21, which may bias and overestimate the sensitivity of Nanopore-SMS to detect pathogens/ARGs present in low concentrations, as we have previously shown for stool15. Regardless of this, the fast sequencing times of the Nanopore-SMS approach for detecting pathogens and ARGs promise to be advantageous in the diagnostic setting of the near future23. In this context, we note that the feasibility of Nanopore-SMS as a screening tool for intestinal colonization in samples from healthy individuals has not yet been assessed.

In this study, we therefore evaluated the current Nanopore-SMS technology as a potential screening tool for intestinal colonization. We applied a PCR-free approach with the latest Nanopore chemistry (i.e., R10.4.1) in combination with a culture-based selective pre-enrichment to target blaCTX-M and blaDHA genes in stool samples of healthy people. Finally, we compared the performance of native and pre-enriched SMS methods at multiple sequencing time points.

Results

In Fig. 1, we summarize the overall workflow of the present study. In particular, we show the analyses described in the following sections, such as stool sample processing, Nanopore sequencing, and bioinformatic pipelines used for screening ARGs (blaCTX-Ms/blaDHAs) and their associated AGEs.

Fig. 1: Graphical representation of the workflow of the stool sample processing, sequencing and bioinformatics analysis pipeline used in this study.

In workflow (a), we show stool sample processing, genomic DNA (gDNA) isolation, further gDNA purification and quality control (QC), as well as library preparation and Nanopore sequencing. Bioinformatic analyses are shown in the workflow (b), which is divided into read preprocessing and analysis [e.g., screening of antimicrobial resistance genes (ARGs) and associated genomic elements (AGEs)] of the entire sequencing run (48 h) and at each sequencing time point (6 in total). The start of each workflow is represented by a green circle, while arrows indicate the next step in the process. Only estimated processing times (hours, h; minutes, m) for major workflows (e.g., gDNA isolation, MAG assembly, and polishing) are shown. Figure 1 was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en). Workflow (a) processing times were estimated based on manual processing of up to 6 stool samples, while workflow (b) processing times were estimated based on processing of a 1542.6 Mb fastq file (see Methods for computational requirements).

Culture-based and Illumina-SMS screening

This study included a subset of 25 frozen stool samples that were previously screened and characterized by culture-based methods and Illumina-SMS. In particular, according to the native culture-based method, 15 stools carried ESC-R-Ent, whereas 10 were negative. In contrast, using the selective pre-enriched approach (gold standard), 22 stools were positive, whereas 3 were negative15.

ESC-R-Ent concentration in native stool [colony forming units per gram of stool (CFU/g); range: 1.78 × 10²–3.97 × 10⁸], colonization status by culture-based (native and pre-enriched) and Illumina-SMS [native (n = 25; 7 positives / 18 negatives); pre-enriched (n = 21; 14 positives / 7 negatives)] are shown in Supplementary Data 1. Overall, compared to the gold standard, the sensitivity and specificity of native culture-based, native Illumina-SMS, and pre-enriched Illumina-SMS were 75.9% and 100%, 59.5% and 100%, 78.3% and 75%, respectively.

Strain whole-genome sequencing (WGS)

For the purpose of this study, 26 ESC-R-Ent isolated from the 22 blaCTX-M-/blaDHA-positive stool samples after pre-enrichment were further characterized to generate complete genome assemblies using both Illumina and Nanopore platforms (Supplementary Table S1). In addition, these strains were used as reference sequences to validate the AGEs of the blaCTX-M/blaDHA genes inferred by Nanopore-SMS (see below).

As shown in Supplementary Table S1, the recovered strains were mostly E. coli (n = 24) of various sequence types (STs), including ST131 (n = 3), ST1193 (n = 2), ST648 (n = 1), and ST38 (n = 1) clones. Most of the ESC-R-Ent carried only blaCTX-Ms (n = 23, of which 15 blaCTX-M-15), while the remainder carried both blaCTX-Ms and blaDHA-1 (n = 2) or blaDHA-1 alone (n = 1); these genes were associated with MGEs (i.e., plasmids, n = 18) or the chromosome (n = 9).

Nanopore-SMS: native vs. pre-enriched SMS

From the 25 stools, 50 paired samples were subjected to native (n = 25) and selective pre-enriched (n = 25) Nanopore-SMS. The overall sequencing output (48 h run) generated a mean of 213,677.2 and 335,932.6 reads (sequencing depth) with a read length range of 748.1–2198-bp and 1042–3663.9-bp for the native and pre-enriched SMS, respectively (Supplementary Fig. S1; Source Data file). After preprocessing (adapter trimming and human read decontamination), a mean of 205,101.3 and 328,411.9 reads were observed, corresponding to ~4% and ~2.2% read loss, with a read length range of 643–2295-bp and 930.1–3568.9-bp, respectively.
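The read-loss percentages quoted above follow directly from the reported mean read counts. The short sketch below reproduces that arithmetic; the function name is ours, and the calculation is applied to the reported means only (per-sample losses are not given in the text and may differ).

```python
def read_loss_percent(raw_reads: float, kept_reads: float) -> float:
    """Percentage of reads removed between the raw and preprocessed sets."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return 100.0 * (raw_reads - kept_reads) / raw_reads


# Mean read counts reported in the text for the complete 48 h run.
native_loss = read_loss_percent(213_677.2, 205_101.3)
enriched_loss = read_loss_percent(335_932.6, 328_411.9)

print(f"native: {native_loss:.1f}%, pre-enriched: {enriched_loss:.1f}%")
```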

For the final read dataset used in downstream analyses (preprocessed reads), in comparison to native, pre-enriched Nanopore-SMS resulted in significantly greater sequencing depth (205,101.3 vs. 328,411.9 reads; p < 0.001), longer mean read length (1355.4-bp vs. 2437.5-bp; p < 0.0001), mean read length N50 (2807-bp vs. 4819.5-bp; p < 0.0001), and better mean quality score (Q-Score; 13.7 vs. 14.4; p < 0.0001), respectively (Supplementary Fig. S1).
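For readers less familiar with the read length N50 reported above, the following minimal sketch shows one common way to compute it: the length L such that reads of length L or longer contain at least half of all sequenced bases. The read lengths are hypothetical and are not taken from this study.

```python
def n50(read_lengths):
    """Return the N50: the read length at which the cumulative sum of
    lengths (longest first) reaches at least half of the total bases."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    ordered = sorted(read_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical read lengths in bp (illustration only).
toy_reads = [500, 800, 1200, 2500, 3000, 4000]
toy_mean = sum(toy_reads) / len(toy_reads)
toy_n50 = n50(toy_reads)
print(f"mean = {toy_mean:.0f} bp, N50 = {toy_n50} bp")
```

Because long reads contribute disproportionately to the total number of bases, the N50 is typically larger than the mean read length, as is also the case for the values reported above.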

Performance of Nanopore-SMS screening of blaCTX-M/blaDHA genes

The preprocessed reads and metagenome-assembled genomes (MAGs) from native and pre-enriched Nanopore-SMS were used to detect blaCTX-Ms/blaDHAs along with their AGEs (i.e., plasmids or chromosomes).

Specifically, for downstream analyses, we used high-quality MAGs that were polished in 3 different steps (metaFlye 5X, Racon 4X, and Medaka 1X), which overall contained fewer fragmented genes [as assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO)] than the original metaFlye assembly (Supplementary Table S2). Furthermore, the final polished MAGs possessed blaCTX-Ms/blaDHAs and their AGEs with a low frequency of nucleotide variants (i.e., single nucleotide polymorphisms, SNPs) as opposed to a single polishing step, supporting the identification of the genes described below (Supplementary Table S3). Overall, compared to the gold standard, the sensitivity and specificity to detect blaCTX-M or blaDHA genes for native Nanopore-SMS were 61.1% and 100%, whereas for the pre-enriched Nanopore-SMS, they were 81.5% and 75%, respectively (Supplementary Data 1). One false-positive sample (S1-TAN-01) was detected with the pre-enriched SMS, which corresponded to a blaCTX-M-202-like read. In particular, this single read (~ 3 kb) partially aligned to a small region of a ~94 kb blaCTX-M-15-positive plasmid in a K. pneumoniae strain (Supplementary Data 1) that was not detected by culture-based methods (i.e., no isolation of CTX-M-producing strains).
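As a reminder of how such performance measures are defined against a gold standard, the sketch below computes sensitivity and specificity from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; the observations underlying the percentages reported above are listed in Supplementary Data 1 and are not reproduced here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of gold-standard positives that the test also calls positive."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no gold-standard positives")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of gold-standard negatives that the test also calls negative."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no gold-standard negatives")
    return true_neg / total


# Hypothetical counts (NOT the study data).
sens = sensitivity(true_pos=9, false_neg=1)
spec = specificity(true_neg=4, false_pos=1)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

With few gold-standard negatives, a single false positive shifts the specificity considerably, which is worth keeping in mind when interpreting estimates from small sample sets.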

In the native Nanopore-SMS-positive samples (n = 8), blaCTX-Ms could be detected directly from reads (n = 8) and MAGs (n = 1; S1-ESP-03), whereas no blaDHAs were detected in samples confirmed to possess bacterial hosts carrying them (i.e., S1-CAM-02, S1-IND-02, S1-SRL-01). In contrast, in the pre-enriched Nanopore-SMS-positive samples (n = 18), blaCTX-Ms or blaDHAs were detected in both reads and MAGs (n = 14) or only in reads (4 blaCTX-Ms) (Supplementary Data 1).

Identification of the blaCTX-M and blaDHA genetic environment by Nanopore-SMS

Using Nanopore-SMS, the location of blaCTX-Ms or blaDHAs could be assigned to a plasmid (n = 9) or chromosomal (n = 4) genetic environment (Supplementary Data 1 and Supplementary Table S1).

In 2 native Nanopore-SMS-positive samples, a plasmid location could be inferred: a single blaCTX-M-117-like read in sample S1-NGR-02 aligned with 97.7% identity and 5% coverage to the IncFIB(H89-PhagePlasmid)-type plasmid p2-S1-NGR-02-A, while the blaCTX-M-15-positive MAG of S1-ESP-03 mapped with 99.9% identity and 100% coverage to the IncFII-type plasmid p1-S1-ESP-03-A (Supplementary Data 1 and Fig. 2).

Fig. 2: Circular BLAST alignments of blaCTX-M-/blaDHA-positive reads or MAG contigs to reference plasmids.

In the figure, we show 9 Nanopore-SMS samples where the location of blaCTX-M/blaDHA genes could be inferred from a reference plasmid sequence (Supplementary Data 1). In each circle, we show the sample name in the center, the reference sequence as the inner circle, and the comparison sequence(s) as the outer circles with corresponding colors. Below each alignment, both the reference plasmid and the comparison sequences (MAG contig or reads) are shown along with their corresponding length in base pairs (bp). The alignment identity and coverage for each comparison are shown below. For each visualization, we show blaCTX-M/blaDHA genes in red and other ARGs in blue; replicon sequences are shown in gray. Guanine cytosine (GC) content and skew legend are shown in the upper right corner.

In 8 of the pre-enriched Nanopore-SMS-positive samples, blaCTX-M-15 could be linked to IncFII-type (n = 3; 99.9-100% identity and 44–100% coverage), one hybrid plasmid of IncFII(pRSB107)- and IncFIB(AP001918)-type (p1-S1-IND-02-A; 99.9% identity and 61% coverage), and to an IncB/O/K/Z-type plasmid (p1-S1-KEN-03-A; 100% identity and 99% coverage) (Supplementary Data 1 and Fig. 2).

Other blaCTX-M variants such as blaCTX-M-55 and blaCTX-M-176-like were associated with an IncFII- and IncFIB(AP001918)-type (pN16EC0879-1; 99.9% identity and 44% coverage) and with IncFIB(H89-PhagePlasmid)-type (p2-S1-NGR-02-A; 97–97.5% identity and 8–9% coverage) plasmids, respectively. Furthermore, the blaDHA-1 gene was found co-localized with blaCTX-M-27 in a single IncFII/IncFIB(AP001918)/IncFIA-type hybrid plasmid (p1-S1-CAM-02-A; 99.9% identity and 99% coverage). Lastly, in 3 samples (S1-IVC-02, S1-THA-01, S1-KEN-07), the blaCTX-Ms were harbored in the chromosome of ST10, ST131, ST1193-like E. coli strains [average nucleotide identity (ANI) range: 99.83%–99.99%], while blaDHA-1 was inserted in the chromosome of an ST394 E. coli (ANI: 99.99% identity) (Supplementary Data 1 and Supplementary Fig. S2).

Time required to detect blaCTX-M/blaDHA and their associated genetic environments (AGEs)

To determine the minimum time required to detect blaCTX-Ms/blaDHAs and their AGEs, the complete sequencing run (48 h) was analyzed at 6 individual time points (T1h, T3h, T6h, T12h, T24h, and T48h).

At each time point, sequencing depth increased for both native and pre-enriched Nanopore-SMS, as shown by the increase in total ARGs detected (Fig. 3a; Source Data file). However, significantly more ARGs were identified by the pre-enriched Nanopore-SMS at all sequencing time points [T1h to T48h; p < 0.001].

Fig. 3: Antimicrobial resistance gene (ARG) counts in both native (n = 25) and pre-enriched (n = 25) samples and overall presence of blaCTX-M/blaDHA reads at 6 different sequencing time-points (T1h to T48h).

In (a), box plots represent the differences in ARG counts between Nanopore-SMS types (native vs. pre-enriched). The median is represented by a line in the center of the box, while the 25th and 75th percentiles correspond to the lower and upper bounds of the box, respectively. The lower and upper whiskers extend the box and represent data points outside the interquartile range (1.5 times). Circles represent random data points, and arrows represent extreme outliers not shown in the plot [S1-INA-02 (pre-enriched); 56, 68, and 75 ARGs at T12h, T24h, and T48h timepoints, respectively]. The non-parametric Wilcoxon signed-rank test (two-sided) was used to compare groups (native vs. pre-enriched) at each time point. Statistical significance between groups is indicated by significance level (Holm corrected for multiple comparisons) noted with an asterisk (***, p < 0.001). The corrected and exact p-values for the time points 1, 3, 6, 12, 24, and 48 h are 0.000672, 0.000672, 0.000642, 0.000642, 0.000642, and 0.000672, respectively. In (b), the stacked bar plots represent the cumulative sum of blaCTX-M- and blaDHA-type reads identified at 6 different sequencing time points. The number of reads positive for blaCTX-M or blaDHA genes is shown within the bar plots. The percentage of positive samples at each time point is shown above the bar plots as indicated in the plot legend in the center. Source data are provided as a Source Data file.
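The Holm correction named in the legend above can be illustrated with a minimal, dependency-free sketch of the step-down procedure. The raw p-values below are hypothetical; the study's own values are those listed in the legend, and the software used for the original analysis is described in the Methods, not assumed here.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment.

    The i-th smallest raw p-value (i = 0, 1, ...) is multiplied by (m - i),
    capped at 1, and forced to be non-decreasing along the sorted order.
    Results are returned in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, e.g. one test per sequencing time point.
raw_p = [0.001, 0.02, 0.03, 0.04]
adjusted_p = holm_adjust(raw_p)
print(adjusted_p)
```

The monotonicity step explains why several adjusted p-values can be identical even when the raw values differ.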

Similarly, 20 reads corresponding to blaCTX-M genes were rapidly detected within T1h of sequencing, while blaDHA reads (n = 8) were detected within T3h. Moreover, already at T3h, 50% of native and 61.1% of pre-enriched Nanopore-SMS-positive samples had yielded blaCTX-M or blaDHA reads (Fig. 3b; Source Data file). These proportions subsequently rose to 62.5% and 83.3% at T12h and to 75% and 88.9% at T24h, respectively.

Finally, to fully identify the plasmid-associated blaCTX-M or blaDHA genes, a minimum of 1X coverage (reads mapping a reference sequence) was achieved between T12h and T24h of sequencing (Fig. 4a; Source Data file). In contrast, 1X coverage was accomplished between T1h and T3h for chromosomally-associated blaCTX-M and blaDHA-1 genes (Fig. 4b; Source Data file).

Fig. 4: Distribution of mapping coverage of plasmid and chromosome reads to reference sequences at different sequencing time-points.

Considering all reads, in Figure (a), we show box plots that represent the mean mapping coverage at multiple time points (T1h to T48h) per sample [n = 11 plasmid mappings per time point (2 each for S1-NGR-02 and S1-ESP-03); black dots] to reference plasmid sequences (Fig. 2 and Supplementary Data 1). Similarly, in Figure (b), the box plots represent the mean mapping coverage at multiple time points per sample (n = 4 chromosome mappings per time point) to reference chromosomal sequences (Supplementary Fig. S2). For both figures, the median is represented by a line in the center of the box, while the 25th and 75th percentiles correspond to the lower and upper bounds of the box, respectively. The lower and upper whiskers extend the box and represent data points outside the interquartile range (1.5 times). The coverage values are shown as log2(X + 1) transformed; the 1X coverage (not log2 transformed) is marked with a red line (Source data are provided as a Source Data file).
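To make the log2(X + 1) scale in the legend above easier to read, the sketch below shows how a mean coverage value maps onto it. The definition of mean coverage as mapped bases divided by reference length is a common convention that we assume here for illustration, and the plasmid size and base counts are hypothetical.

```python
import math


def mean_coverage(mapped_bases: int, reference_length: int) -> float:
    """Mean depth of coverage: total mapped bases over reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return mapped_bases / reference_length


def log2_plus_one(coverage: float) -> float:
    """log2(X + 1) transform; keeps zero coverage defined (maps to 0)."""
    return math.log2(coverage + 1)


# Hypothetical example: 188 kb of reads mapped to a 94 kb plasmid.
cov = mean_coverage(mapped_bases=188_000, reference_length=94_000)
print(f"{cov:.1f}X -> {log2_plus_one(cov):.3f} on the log2(X + 1) scale")

# 1X coverage corresponds to log2(2) = 1 after the transform.
one_x_transformed = log2_plus_one(1.0)
```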

Microbiota composition: native vs. pre-enriched Nanopore-SMS

To understand the differences in microbiota composition using both Nanopore-SMS approaches (native and pre-enriched), species-level classification of long reads was performed (Fig. 5a; Source Data file).

Fig. 5: Relative and differential abundance of species in the native (n = 25) and pre-enriched (n = 25) Nanopore-SMS complete sequencing run (T48h).

In (a), the relative abundance plot shows the top 10 species common to all native and pre-enriched samples (shown in pairs); other species that are not in the top 10 are shown in gray. In (b), we show the log2 fold change of the top 10 differentially abundant species in pre-enriched vs. native (comparison) samples ranked by p-value [two-sided Wald test (DESeq2 default) adjusted for multiple comparisons using the Benjamini-Hochberg correction; alpha was set to 0.01 cutoff]; individual log2 fold values are shown above the colored circles. Species marked with an arrow correspond to species in the top 10 relative abundance plot (a). Source data are provided as a Source Data file.

In the total sequencing run (T48h), the top 10 species composition shared by both Nanopore-SMS approaches consisted of commensal Gram-positive (e.g., Enterococcus spp., Bifidobacterium adolescentis) and Gram-negative (e.g., E. coli, Phocaeicola spp.) bacteria. Notably, a clear compositional difference in the mean proportion of E. coli relative to other bacteria was observed between the two Nanopore-SMS approaches (3.1% and 47%, native and pre-enriched samples, respectively) (Fig. 5a; Source Data file).

Differential species abundance analysis showed that E. coli and Enterococcus spp. were significantly more abundant (5.69 and 7.36–7.99 log2 fold, respectively) in the pre-enriched than in the native SMS approach (Fig. 5b; Source Data file).

Other differences in the microbiota composition between both Nanopore-SMS methods were evident throughout the sequencing run (T1h to T48h). No statistically significant differences in the mean number of observed species between native and pre-enriched Nanopore-SMS (e.g., ~2762 vs. 2063 species at T48h, respectively) were found (Fig. 6a; Source Data file). In contrast, less bacterial diversity (lower Shannon diversity index, SDI) within samples was observed in the pre-enriched Nanopore-SMS at each sequencing time point (p < 0.0001) than in the native Nanopore-SMS (Fig. 6a; Source Data file). Finally, differences in the clustering of the Nanopore-SMS type were also noticeable, as shown by the non-metric multidimensional scaling (NMDS) ordination of the Bray-Curtis dissimilarity matrix [permutational multivariate analysis of variance using distance matrices (Adonis); p < 0.001 (p = 1.00E-04)] (Fig. 6b; Source Data file).
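The contrast described above, a similar number of observed species but lower Shannon diversity after pre-enrichment, can be illustrated with a small sketch of the two measures. The species counts are hypothetical, and the use of the natural logarithm for the SDI is an assumption on our part (it is a common default); the tools used in the study are described in the Methods.

```python
import math


def shannon_index(counts):
    """Shannon diversity index H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical read counts for the same 4 species in a paired sample.
native_counts = [25, 25, 25, 25]    # even community
enriched_counts = [85, 5, 5, 5]     # one species dominates after enrichment

h_native = shannon_index(native_counts)
h_enriched = shannon_index(enriched_counts)
bc = bray_curtis(native_counts, enriched_counts)
print(f"SDI native = {h_native:.3f}, SDI pre-enriched = {h_enriched:.3f}, "
      f"Bray-Curtis = {bc:.2f}")
```

Both toy profiles contain the same 4 observed species, yet the dominated profile has a markedly lower SDI, mirroring the pattern seen when E. coli reads dominate the pre-enriched samples.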

Fig. 6: Alpha and beta diversity measures for native (n = 25) and pre-enriched (n = 25) Nanopore-SMS.

In (a), the observed (left) and Shannon (i.e., Shannon diversity index, SDI; right) diversity measures are shown per sequencing time point (T1h to T48h). In both cases, the data points are represented by circles colored by Nanopore-SMS type (native or pre-enriched). For observed, the y-axis represents the number of observed species, while for Shannon, the SDI. Differences in the observed number of species between paired samples from native and pre-enriched groups were tested with a pairwise t test (two-sided), while differences in the SDI between paired samples were tested with the non-parametric Wilcoxon signed-rank test (two-sided). Statistical significance between groups is indicated by the significance levels (Holm-corrected for multiple comparisons) noted with an asterisk (ns not significant; ****, p < 0.0001). For the t test, the exact and corrected p-values for the time points 1, 3, 6, 12, 24, and 48 h are 0.0969, 0.0774, 0.0765, 0.0774, 0.0756, and 0.0765, respectively. For the Wilcoxon signed-rank test, the exact and corrected p-values for the time points 1, 3, 6, 12, 24, and 48 h are 5.37E-07, 4.76E-07, 5.37E-07, 3.58E-07, 5.37E-07, and 3.58E-07, respectively. In (b), a non-metric multidimensional scaling (NMDS) ordination of the Bray-Curtis dissimilarity matrix is shown (2 dimensions). Paired native and pre-enriched samples are colored by Nanopore-SMS type, while sequencing time points are indicated by the unique symbols shown in the legend. The stress value of the ordination is shown in the plot. Source data are provided as a Source Data file.

Discussion

The capacity of the Illumina-SMS for the detection of pathogens or clinically-relevant ARGs (e.g., those encoding ESBLs and/or carbapenemases) in stool has been demonstrated in numerous studies (e.g., refs. 10,11,12,13,14,15). However, the Nanopore-SMS promises to be a more viable alternative due to its lower instrument cost, long-read technology capable of identifying AGEs, and shorter sequencing times with the ability to detect ARGs in real-time24.

Feasibility of Nanopore-SMS as a colonization screening tool

In this study, the native Nanopore-SMS achieved sensitivity and specificity of 61.1% and 100%, respectively (Supplementary Data 1). However, we were able to increase the detection sensitivity by selectively targeting the resistance mechanisms of interest, which in this case were the blaCTX-M and blaDHA genes carried by ESC-R-Ent. In fact, a short (6 h) pre-enrichment step in the presence of cefuroxime [3 µg/mL] followed by Nanopore-SMS resulted in improved sensitivity (81.5%) at the expense of reduced specificity (75%), a phenomenon consistent with the results of our previous study using the Illumina-SMS15. Moreover, as with PCR-based methods1, the occurrence of false-positive results is a common challenge for SMS-based methods, where the detection of ARGs does not necessarily indicate an association with viable bacteria25. For this reason, complementary culture-based methods remain the gold standard in the clinical setting and should be used to validate SMS screening results.

Only a few studies using the Illumina-SMS have implemented a culture-based pre-enrichment to increase the detection of ARGs and/or pathogens in stool15,26,27,28. However, a pre-enrichment approach has not been explored with Nanopore-SMS. In fact, many small Nanopore-SMS studies have used native representative clinical stool samples (e.g., for studying the resistome from diseased individuals16,19,20,21,29), which may overestimate the feasibility of Nanopore-SMS to detect pathogens and/or ARGs.

Nevertheless, it is important to note that clinical stool samples may be associated with higher concentrations of bacteria in the stool (e.g., > 10⁷ CFU/g) than in healthy people13,15,30,31. We emphasize that in the present study, for the detection of blaCTX-Ms/blaDHAs from reads, the concentration of bacteria in the native stool ranged from 10² to 10⁸ CFU/g for Nanopore-SMS-positive samples (Supplementary Data 1). However, an accurate genetic location of blaCTX-Ms/blaDHAs from MAGs was only determined in one native Nanopore-SMS-positive sample (S1-ESP-03) with a concentration of 3.97 × 10⁸ CFU/g, whereas the pre-enriched Nanopore-SMS detected plasmids and chromosomes from a starting minimum load of 1.86 × 10² CFU/g (Fig. 2 and Supplementary Fig. S2). Therefore, to perform an amplification-free Nanopore-SMS screening aimed at detecting low levels of antibiotic-resistant bacteria (carrying ARGs) and their AGEs in stool, a selective pre-enrichment is beneficial, as native Nanopore-SMS requires a higher bacterial concentration in stool7.

Nanopore-SMS can detect plasmid- or chromosomally-harbored blaCTX-M and blaDHA genes

One of the critical aspects of intestinal colonization of ESC-R-Ent is not only the association with increased risk of infection1, but also the epidemiological impact of acquiring pathogens carrying important ARGs in MGEs that have the strong ability to disseminate via conjugation to further bacterial species1,2,32,33,34.

With the pre-enriched Nanopore-SMS approach, we were able to detect blaCTX-M-/blaDHA-carrying plasmids of various complexities (Fig. 2). In 4 of 9 plasmids, complete plasmid structures were determined by MAG, which corresponded to hybrid- [IncFII/IncFIA/IncFIB(AP001918)], IncB/O/K/Z-, and IncFII-type plasmids, while the rest (n = 5) resulted in partial plasmid reconstructions of similar replicon-type. Importantly, such replicon-type plasmids (e.g., IncB/O/K/Z and IncFII) are hyperepidemic and well-known to be carried by E. coli or other closely-related species (e.g., Klebsiella pneumoniae, Escherichia ruysiae), which sometimes may be associated with carbapenem resistance, intestinal colonization after traveling to endemic regions, and/or urinary tract infections (UTIs)34,35,36. Furthermore, with the use of pre-enriched MAGs, we were able to accurately determine the chromosomal location of one blaDHA-1 and 3 blaCTX-Ms in ST394, ST10, ST1193-like, and ST131 E. coli strains (Supplementary Fig. S2). Such hyperepidemic E. coli lineages are known to be associated with human extraintestinal infections (e.g., UTIs) and may carry further chromosomal- or plasmid-associated ARGs encoding carbapenem or colistin resistance traits2,37,38 (Supplementary Fig. S2). To the best of our knowledge, strain resolution at the chromosomal ST level has not been achieved in previous Nanopore-SMS studies using stool samples without the need for Illumina short reads18. Therefore, this approach (pre-enriched Nanopore-SMS) is not only suitable for the detection of hyperepidemic plasmids but also for the potential identification of clinically relevant chromosomal ARGs (e.g., AmpCs or OXA β-lactamases) in Gram-negatives.

We note that with the implementation of a 3-step MAG polishing, higher-quality sequence data for blaCTX-Ms/blaDHAs and their AGEs were obtained (Supplementary Table S3). In particular, fewer SNPs and a lower frequency of deletions/insertions were noted, especially with the pre-enriched SMS MAGs (Supplementary Table S2)39,40. Therefore, the Nanopore-SMS, especially with pre-enriched samples, can be used as an intestinal colonization surveillance tool with the ability to identify epidemiologically important pathogens/MGEs and enable rapid implementation of infection control measures to prevent outbreaks with high clinical and economic impacts8,9. In addition, the use of high-quality polished MAGs could allow for high-resolution SNP analyses, which are important for studying the evolution of outbreaks and for tracking them41.

Important preanalytical considerations in Nanopore-SMS

Regardless of the Nanopore-SMS approach utilized (native or pre-enriched), factors such as genomic DNA (gDNA) isolation, gDNA input (concentration) during library preparation, systematic long-read errors, and many others may be major drawbacks of Nanopore-SMS affecting detection sensitivity41,42. For instance, the effect of different methods used to isolate stool gDNA on SMS performance is well studied [e.g.,43,44,45]. In fact, an optimized stool gDNA isolation method specifically for long-read SMS-based studies has recently been developed, but its implementation (DNA extraction timing: 8 h)29 requires longer TATs46.

In our study, the stool gDNA isolation protocol (<2 h long using a commercial kit; see Methods) resulted in mean read lengths of 1355.4-bp and 2437.5-bp for native and pre-enriched Nanopore-SMS, respectively. Such outputs were sufficient to detect ARGs, their genetic location, and the pathogen of interest (E. coli) (Supplementary Data 1, Fig. 2 and Supplementary Figs. S1, S2). However, other gDNA isolation methods or commercial kits (including automation) must be compared in future studies as they may result in improvements in sample processing (i.e., faster execution times) and sequencing performance (e.g.,45,47).

Notably, both Nanopore-SMS approaches were implemented using the newest Nanopore R10.4.1 flow cell and the rapid barcoding kit (RBC; kit 14 chemistry). This new generation kit, specifically the RBC, requires a low gDNA input (50 ng) for the library preparation as opposed to previous versions (~ 400–1500 ng; e.g., RBC, native, ligation-based) used in earlier studies15,17,19,20,21,47.

We have found, as for the Illumina-SMS, that a low input gDNA library preparation may also be sufficient for Nanopore-SMS studies. However, a pre-enriched rather than native Nanopore-SMS approach may be used to compensate for potential loss in resolution (i.e., to detect pathogens/ARGs of interest), improving sequencing depth and quality scores. Although some previous flow cell versions (e.g., R9.4.1) and compatible chemistries may be discontinued in late 2024 (store.nanoporetech.com/), further and continuous comparative studies are needed to validate new methodologies (e.g., on-device adaptive enrichment17), upcoming Nanopore chemistries and development of SMS-specific kits.

The short Nanopore-SMS TAT makes it promising for the clinical context

Nanopore-SMS has the potential to be used for rapid and comprehensive identification of pathogens and associated AGEs in the clinical laboratory in the near future46. For example, in a study by Yee R et al., Nanopore-SMS was implemented on clinical samples (swabs), identifying a blaKPC-possessing K. pneumoniae in 2.3 minutes of sequencing4,18. In another study, Nanopore-SMS was used to characterize the resistome and microbiota in the stool of preterm infants, successfully detecting pathogenic bacteria (K. pneumoniae, Enterobacter cloacae) in 1 h of sequencing19.

Consistent with the above results, with the implementation of pre-enriched Nanopore-SMS, we were able to identify both plasmid- and chromosomally-associated blaCTX-M and blaDHA genes at 1X coverage within T1h, with all genomic locations being determined at 1X coverage at T24h and T3h, respectively (Fig. 4). As for the blaCTX-Ms and blaDHAs, the implementation of pre-enriched Nanopore-SMS resulted in their fast identification at T1h and T3h of sequencing, respectively (Fig. 3b). As discussed above, the use of a pre-enriched Nanopore-SMS approach resulted not only in increased detection of blaCTX-Ms/blaDHAs and their genetic environments, but also in the detection of more ARGs in all time-points compared to the native SMS (Fig. 3a). We hypothesize that the results of the pre-enriched SMS may in fact be due to the selective pressure of the pre-enrichment itself (i.e., cefuroxime), an observation consistent with our previous Illumina-SMS results and those of other groups15,27,28. On the other hand, these results indicate that a pre-enriched Nanopore-SMS approach alone may not be sufficient to fully study the resistome and obtain a representative sample of all ARGs present in the stool.

The two Nanopore-SMS approaches also showed fundamental differences in the microbiota composition output. We have previously shown that using the pre-enrichment Illumina-SMS resulted in more E. coli classified reads than the native Illumina-SMS15. Consistent with that study, the effect of pre-enrichment Nanopore-SMS was also evident in the larger proportion of E. coli classified reads and the differential abundance compared to native Nanopore-SMS (47% vs. 3.1% and 5.69 log2 fold, respectively) (Fig. 5; Source Data file).

Furthermore, despite no significant differences in the observed number of species between the two Nanopore-SMS methods, both varied in species composition as the pre-enriched SMS samples were significantly less species diverse (Fig. 6). This dissimilarity in both ARGs and species composition may also be explained by the effect of the specific selective pre-enrichment implemented15,26,27,28. Therefore, as anticipated above, a pre-enrichment coupled with a Nanopore-SMS approach should only be used for high-resolution targeted screening purposes (e.g., detection of ARGs and AGEs) and not to infer the overall resistome or microbiota composition in stool samples.

Compatibility with the clinical laboratory of the future

In this study, native stool samples were ready for sequencing in ~ 3 h compared to ~ 9 h for pre-enriched samples (Fig. 1). This was followed by a maximum of 48 h of sequencing and ~ 4:13 h of bioinformatic analyses, including: (i) read preprocessing (~ 6 min, m), (ii) MAG generation and polishing [~ 4 h: ~ 3 h MetaFlye assembly with 5 polishing iterations; ~ 20 m Racon polishing (4 iterations); ~ 40 m Medaka polishing (1 iteration)], (iii) ARG and AGE screening, and (iv) read and MAG characterization (~ 7 m total).
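The timings above can be tallied with a minimal sketch such as the following. It assumes that all steps run strictly one after another and that the ~ 7 m covers screening and characterization together; the function names (fmt, analysis_minutes, turnaround_minutes) are hypothetical and introduced only for illustration.

```python
# Durations in minutes, as reported in the text above.
PREP_MIN = {"native": 3 * 60, "pre-enriched": 9 * 60}
MAG_STEPS_MIN = {
    "metaFlye assembly (5 polishing iterations)": 3 * 60,
    "Racon polishing (4 iterations)": 20,
    "Medaka polishing (1 iteration)": 40,
}
OTHER_STEPS_MIN = {
    "read preprocessing": 6,
    "screening and characterization": 7,
}


def fmt(minutes: int) -> str:
    """Format a duration in minutes as h:mm."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}:{rest:02d} h"


def analysis_minutes(with_mags: bool = True) -> int:
    """Bioinformatic time, optionally skipping MAG generation and polishing."""
    total = sum(OTHER_STEPS_MIN.values())
    if with_mags:
        total += sum(MAG_STEPS_MIN.values())
    return total


def turnaround_minutes(sample_type: str, sequencing_h: int, with_mags: bool = True) -> int:
    """Sample preparation + sequencing + analysis (assumption: no overlap between steps)."""
    return PREP_MIN[sample_type] + sequencing_h * 60 + analysis_minutes(with_mags)


print(fmt(analysis_minutes()))                                      # 4:13 h
print(fmt(turnaround_minutes("pre-enriched", 6, with_mags=False)))  # 15:13 h
```

Under these assumptions, a read-only analysis of a pre-enriched sample sequenced for 6 h would take roughly 15 h from stool to result, most of which is pre-enrichment and sequencing rather than computation.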

Since our results show that native stool samples are not ideal for screening the resistome, pre-enriched stool is so far a compromise that guarantees a better detection of ARGs (Fig. 3a)15. Moreover, this higher detection of ARGs occurs early enough during sequencing that it is possible to perform sequencing runs for only 6 h (as opposed to 48 h), which in this study was sufficient to detect ~ 72% of all pre-enriched SMS-positive samples (Fig. 3b). Importantly, given that 6 h pre-enrichment prior to sequencing may prolong TATs, shorter culture pre-enrichment times (e.g., 3-4 h) should be explored as they may also result in ARG detection improvements1.

The generation of high-quality MAGs for the purpose of detecting ARGs and their AGEs was possible with the 3-step polishing protocol implemented in this study (Fig. 1). Such a polishing approach (i.e., with the Racon and Medaka tools) is a well-established methodology to improve metagenome assemblies39. Although further optimization could reduce the total time (~ 4 h) required to perform all polishing steps (e.g., 5 metaFlye polishing iterations in ~ 1 h after assembly), the generation of high-quality MAGs allowed for the accurate identification of plasmids and chromosomes containing few sequence variants (e.g., SNPs, insertions, deletions) (Fig. 2 and Supplementary Table S3). On the other hand, if the goal is to identify only ARGs directly from MAGs (e.g., blaCTX-Ms/blaDHAs), then further polishing after the initial metaFlye assembly is not necessary, as the quality of these genes is not improved (Supplementary Table S3). Moreover, if MAGs are not required, ARGs can be derived directly from the Nanopore reads (e.g., using a KMA-based approach), as was done in this study, saving ~ 4 h of analyses (Figs. 13).

It is therefore evident that Nanopore-based SMS is a technology that has the potential to be optimized and tailored to the clinical laboratory for the detection of pathogens, ARGs, and their AGEs in stool10,23,24. Given the short TAT requirements of the clinical laboratory, speed in both sequencing and data processing is necessary46. In this context, further optimizations to reduce culture-based pre-enrichments and computational time (e.g., MAG polishing) should be addressed in future studies. Moreover, the cost of Nanopore sequencing (e.g., the portable MinION device and kit used in this study), as well as the low computational requirements to perform downstream analyses (e.g., the mid-level workstation used in this study), are important and attractive considerations for laboratories wishing to evaluate such technologies48. Finally, for laboratories and researchers interested in implementing rapid Nanopore-SMS pipelines for screening clinical specimens, further studies using diseased stool specimens (e.g., patient screening) compared to the gold-standard culture-based methods should be conducted as a final proof-of-concept prior to implementation in real-world scenarios.

In conclusion, the implementation of the latest Nanopore V14 chemistry and R10.4.1 flow cells in SMS-based studies has the potential to be used for intestinal colonization screening of multidrug-resistant bacteria of clinical interest. As a proof-of-concept, we showed that Nanopore-SMS could rapidly (as fast as 1 h of sequencing) detect blaESBL and blapAmpC genes, their AGEs, and, sometimes, also their bacterial host present in the stool. As a result, this makes the Nanopore-SMS approach a promising and potent tool adaptable to clinical and epidemiological investigations. However, to ensure more sensitive detection using amplification-free Nanopore-SMS approaches, a culture-based selective pre-enrichment must be implemented. In this context, a strategic decision to target ARGs of clinical interest (e.g., blaCTX-Ms/blaDHAs) must be made prior to screening with Nanopore-SMS. Further optimization of stool gDNA isolation techniques and implementation of rapid bioinformatic analyses are also essential for potential clinical applications.

Methods

The study was conducted in accordance with the research guidelines of the Declaration of Helsinki and the research requirements of the University of Bern. Ethical approval was obtained from the Ethikkommission des Kantons Bern (https://www.gsi.be.ch/de/start/ueber-uns/kommissionen-gsi/ethikkommission.html) (Approval number: 2020-01683).

Study characteristics

This analysis is part of an ongoing study on intestinal colonization in Swiss expats (data.snf.ch/grants/grant/192514). All participants provided informed written consent.

Stool samples from 25 healthy Swiss volunteers living abroad that were previously characterized by culture-based methods, PCR, Illumina draft WGS, and Illumina-SMS were included in this study15. Volunteers’ demographics and clinical characteristics, culture-based stool processing (i.e., ESC-R-Ent screening and isolation), antimicrobial susceptibility tests (ASTs), and CFUs/g of stool are described in detail in our aforementioned study15.

Sample selection

A total of 25 stool samples, 22 of which had previously been characterized as culture-based pre-enrichment positive for ESC-R-Ent, were included in this study (Fig. 1)15. Based on the strain WGS results (see below), 22 stool samples carrying strains (total, n = 26) associated with blaCTX-Ms (n = 23), blaDHAs (n = 1), or co-localized blaCTX-Ms/blaDHAs (n = 2) in plasmids or chromosomes were selected (n = 18 and n = 9, respectively) (Supplementary Table S1). The 22 stool samples (and their corresponding 26 ESC-R-Ent strains) were chosen to: (i) assess the feasibility of Nanopore-SMS to detect blaCTX-M and blaDHA genes and (ii) determine the AGE of these genes (i.e., plasmids or chromosomes).

Notably, the above 25 stool samples corresponded to 14 and 7 Illumina-SMS pre-enriched-positive and negative samples, respectively; 4 samples (S1-IVC-02, S1-NGR-02, S1-ESP-03, S1-SRB-02) not previously tested with Illumina-SMS were also included.

Screening result definitions

A sample was defined as culture-based positive when an isolated ESC-R-Ent was confirmed by ASTs and Illumina draft strain WGS (i.e., the presence of blaCTX-Ms and/or blaDHAs), whereas a sample was defined as Illumina-SMS-positive when blaCTX-Ms and/or blaDHAs were identified with ≥ 60% identity and ≥ 60% coverage15. In the present study, a sample was defined as Nanopore-SMS-positive when blaCTX-Ms and/or blaDHAs were identified with ≥ 60% identity and ≥ 60% coverage in ≥ 1 read or in a MAG contig (see below).
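The Nanopore-SMS positivity rule can be expressed as a short sketch. The Hit record and the function names are hypothetical and serve only to illustrate the thresholds stated above; they do not correspond to the output format of any specific tool.

```python
from dataclasses import dataclass

MIN_IDENTITY = 60.0   # percent, as defined in the text
MIN_COVERAGE = 60.0   # percent, as defined in the text
TARGET_PREFIXES = ("blaCTX-M", "blaDHA")


@dataclass(frozen=True)
class Hit:
    """One ARG hit on a single read or on a MAG contig (hypothetical record)."""
    gene: str
    identity: float   # percent identity to the reference gene
    coverage: float   # percent of the reference gene length covered
    source: str       # "read" or "contig"


def is_target_hit(hit: Hit) -> bool:
    """True if the hit is a blaCTX-M or blaDHA gene passing both thresholds."""
    return (
        hit.gene.startswith(TARGET_PREFIXES)
        and hit.identity >= MIN_IDENTITY
        and hit.coverage >= MIN_COVERAGE
    )


def is_nanopore_sms_positive(hits: list[Hit]) -> bool:
    """A sample is positive if at least one read or MAG contig carries a target hit."""
    return any(is_target_hit(hit) for hit in hits)


sample_hits = [
    Hit("blaTEM-1B", 99.8, 100.0, "read"),
    Hit("blaCTX-M-15", 97.2, 64.0, "read"),
]
print(is_nanopore_sms_positive(sample_hits))  # True
```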

Sequencing and generation of strain WGS

A total of 26 ESC-R-Ent strains previously characterized by Illumina draft WGS were also sequenced on the Nanopore platform to generate complete genome assemblies as previously described15.

In brief, gDNA was extracted using the Invitrogen™ PureLink™ Microbiome DNA Purification Kit (Thermo Fisher Scientific) according to the manufacturer's recommendations, followed by quantity and quality assessment by NanoDrop™ and Qubit™ 3 (Thermo Fisher Scientific). Illumina strain WGS was performed on a NovaSeq 6000 system (2 × 150-bp paired-end output) by Eurofins Genomics (eurofinsgenomics.eu/), while Oxford Nanopore strain WGS was conducted on a MinION™ Mk1B (Oxford Nanopore Technologies) sequencer using the rapid barcoding library prep (SQK-RBK004) and FLO-MIN 106D R9 flow cells, as described previously34,49. The resulting raw reads were preprocessed with default parameters using Trimmomatic v0.36 (github.com/usadellab/Trimmomatic) for Illumina data and Porechop v0.2.4 (github.com/rrwick/Porechop) for Nanopore data. Nanopore reads were further quality filtered with Filtlong v0.2.1 (github.com/rrwick/Filtlong) (parameters: minimum length, 1000-bp; target bases, 1 billion). Complete and circular genome assemblies were generated with Unicycler v0.4.8 (github.com/rrwick/Unicycler) using the hybrid pipeline (i.e., using preprocessed Illumina and Nanopore reads).

Preparation of native and pre-enriched stool gDNA

Native stool gDNA isolations were performed with the kit described above, but according to the stool sample protocol (Publication number MAN0014266; revision A.0) using ~ 200 mg of stool material. The same native gDNA isolations used in our previous study (i.e., for PCR and native Illumina-SMS) were reanalyzed using Nanopore-SMS (i.e., native Nanopore-SMS)15.

The pre-enriched stool gDNA isolations used in this study were optimized to increase gDNA concentration to account for the amplification-free library preparation (see below) used for Nanopore-SMS (i.e., pre-enriched Nanopore-SMS). Stool aliquots (~ 50–100 µg) that were stored at − 80 °C were briefly thawed and pre-enriched for 6 h in 10 mL Luria-Bertani broth supplemented with cefuroxime (30 µg disks; BD BBL™ Sensi-Disc™) at 36 ± 1 °C as previously done15,50,51,52.

Following the 6 h pre-enrichment, an 8 mL aliquot was centrifuged for 10 min (14,000 × g) with subsequent removal of the supernatant. The resulting pellet was resuspended in 800 µL of lysis buffer S1 and transferred to a bead tube (step 1a). All subsequent steps were done according to the stool protocol of the Invitrogen™ PureLink™ Microbiome DNA Purification Kit (Thermo Fisher Scientific) described above. After gDNA isolation, samples were further purified with magnetic beads using the CleanNGS kit (CleanNA) following the manufacturer's recommendations (single tube protocol; manual revision v7.00). Overall, gDNA isolation per sample took approximately 1 h and 15 min, followed by 30 min of magnetic bead cleanup (Fig. 1).

Nanopore-SMS

Library preparations were done using the Nanopore SQK-RBK114-24 kit using 50 ng of input gDNA (native and pre-enriched SMS) following manufacturer recommendations (protocol version: RBK_9176_V114_REVJ_27NOV2022). The prepared libraries were loaded on R10.4.1 flow cells and sequenced on a MinION™ Mk1B device in 4 separate batches (up to 18 samples per flow cell). Sequencing time was set to 48 h with a minimum read length of 200-bp, read filtering equal to quality score (Q-Score) of 9, and read splitting “ON”. Data acquisition was done with MinKNOW v23.04.5 and base calling with Guppy v6.5.7 using the high-accuracy (400-bps) setting (Fig. 1).
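To illustrate what the minimum read length and Q-score settings imply for an individual read, the following sketch recomputes a mean read Q-score from a FASTQ quality string. It assumes Phred+33 encoding and that the mean is taken over per-base error probabilities; the exact computation performed by MinKNOW is not described here, and the function names are hypothetical.

```python
import math


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean read Q-score, averaged over per-base error probabilities (assumption)."""
    if not quality:
        return 0.0
    error_probs = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_run_filters(sequence: str, quality: str,
                       min_length: int = 200, min_qscore: float = 9.0) -> bool:
    """Apply the run settings from the text: length >= 200 bp and Q-score >= 9."""
    return len(sequence) >= min_length and mean_qscore(quality) >= min_qscore


# '5' encodes Q20 and '$' encodes Q3 in Phred+33.
print(passes_run_filters("A" * 300, "5" * 300))  # True
print(passes_run_filters("A" * 300, "$" * 300))  # False (Q-score too low)
print(passes_run_filters("A" * 100, "5" * 100))  # False (read too short)
```

Averaging error probabilities rather than raw Q-values means that a few very low-quality bases can pull a read below the threshold even when most bases are of high quality.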

Nanopore read preprocessing and sequencing time point extraction

For each sequencing sample, raw Nanopore reads were extracted at 6 different sequencing time points (i.e., T1h, T3h, T6h, T12h, T24h, T48h) with OnTime v0.2.3 (github.com/mbhall88/ontime) followed by trimming of adapters with Porechop v0.2.4 using default parameters. Before downstream analyses, reads were decontaminated from host reads by mapping them to the human reference genome GRCh38.p14 [RefSeq: GCF_000001405.40] with minimap2 v2.26-r1175 (github.com/lh3/minimap2) and samtools v1.18 (github.com/samtools/) with the arguments “-ax map-ont” and “fastq -n -f 4”, respectively. Finally, for the complete sequencing run (i.e., T48h), summaries of the read statistics were generated using NanoStat v1.4.0 (github.com/wdecoster/nanostat) with default parameters (Fig. 1).
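Conceptually, extracting a time point means keeping only the reads that started within the first N hours of the run. The sketch below illustrates this idea on FASTQ header lines; it is not the OnTime implementation, and it assumes that each header carries a start_time=<ISO 8601 timestamp> field, as typically written by the basecaller. The function names and the example headers are hypothetical.

```python
import re
from datetime import datetime, timedelta

START_TIME = re.compile(r"start_time=(\S+)")


def read_start(header: str) -> datetime:
    """Parse the start_time field of a FASTQ header (assumed ISO 8601 format)."""
    match = START_TIME.search(header)
    if match is None:
        raise ValueError(f"no start_time field in header: {header!r}")
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))


def reads_up_to(headers: list[str], hours: float) -> list[str]:
    """Keep headers of reads that started within `hours` of the earliest read."""
    if not headers:
        return []
    starts = [read_start(header) for header in headers]
    cutoff = min(starts) + timedelta(hours=hours)
    return [header for header, start in zip(headers, starts) if start <= cutoff]


headers = [
    "@read1 start_time=2023-06-01T10:00:00Z",
    "@read2 start_time=2023-06-01T10:30:00Z",
    "@read3 start_time=2023-06-01T12:59:00Z",
    "@read4 start_time=2023-06-01T20:00:00Z",
]
for time_point in (1, 3, 6, 12, 24, 48):
    print(f"T{time_point}h: {len(reads_up_to(headers, time_point))} reads")
```

Because each time point is cumulative, the T48h subset corresponds to the complete run.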

Generation of MAG assemblies

Preprocessed reads belonging to T48h (i.e., the complete run) were assembled with metaFlye v2.9.2 with the arguments “--nano-raw --meta --iterations 5” to generate MAGs53. The resulting MAGs were improved with 4 rounds of Racon v1.5.0 polishing with default parameters54. The Racon-polished MAGs were polished one more time with Medaka v1.8.0 (github.com/nanoporetech/medaka) using “medaka_consensus” and the model argument “-m r1041_e82_400bps_hac_g632”, the model closest to our Guppy basecaller version (see above). The quality of the resulting 3-step polished MAGs (i.e., metaFlye 5X, Racon 4X, and Medaka 1X) was estimated using BUSCO v5.7.1 (parameters: “-m genome --lineage_dataset bacteria_odb10”) specifically to quantify the presence of fragmented genes40.
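The order of the assembly and polishing steps can be sketched as a list of shell commands generated in Python. This is an illustrative plan only: the file names, the output directory, the thread count, and the use of minimap2 to produce the overlaps required by Racon are assumptions not stated in the text, and the commands are printed rather than executed.

```python
def polishing_plan(reads: str, outdir: str = "asm", racon_rounds: int = 4,
                   threads: int = 8) -> list[str]:
    """Return the shell commands for metaFlye -> Racon (x4) -> Medaka (x1)."""
    commands = [
        f"flye --nano-raw {reads} --meta --iterations 5 "
        f"--out-dir {outdir} --threads {threads}"
    ]
    draft = f"{outdir}/assembly.fasta"  # Flye writes its assembly to this file
    for round_number in range(1, racon_rounds + 1):
        overlaps = f"{outdir}/racon{round_number}.paf"
        polished = f"{outdir}/racon{round_number}.fasta"
        # Assumption: read-to-draft overlaps for Racon are computed with minimap2.
        commands.append(f"minimap2 -x map-ont -t {threads} {draft} {reads} > {overlaps}")
        commands.append(f"racon -t {threads} {reads} {overlaps} {draft} > {polished}")
        draft = polished  # each round polishes the output of the previous one
    commands.append(
        f"medaka_consensus -i {reads} -d {draft} -o {outdir}/medaka "
        f"-m r1041_e82_400bps_hac_g632 -t {threads}"
    )
    return commands


for command in polishing_plan("T48h.fastq"):
    print(command)
```

Each Racon round takes the previous round's output as its draft, so the Medaka step receives the fourth Racon-polished assembly.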

ARGs, replicon and multilocus sequence typing (MLST) screening

Preprocessed reads from all time points and MAGs were screened for ARGs with the command line version of ResFinder v.4.3.2 (bitbucket.org/genomicepidemiology/resfinder/src/master/), which conveniently implements k-mer alignment with KMA v1.3.19 when reads are provided55,56. The arguments “--acquired” and “--ifq” or “--ifa” were used for reads and MAGs, respectively. Both the minimum template identity and the template length coverage were set to 60%, as previously done15. In order to consider all possible ARG hits, especially single reads containing blaCTX-Ms or blaDHAs, the ResFinder argument “--nanopore” was not used, as it automatically sets KMA arguments (“-ont -md 5”) that are too stringent for the purpose of this study.

For the read-based screening, reads corresponding to blaCTX-Ms and blaDHAs were counted and extracted from the ResFinder KMA output file (*.frag.gz) using bash commands, while for MAG-based screening, contigs carrying blaCTX-Ms and blaDHAs were extracted from the MAGs with Geneious Prime v2023.0.4. Furthermore, the extracted reads and MAG contigs were screened with PlasmidFinder v2.0 (parameters: 50% minimum template identity and 60% template length coverage) and MLST v2.0 (parameter: E. coli number 1 configuration)57,58. Lastly, the ANI of MAG contigs positive for blaCTX-Ms or blaDHAs to strain WGS reference sequences (E. coli chromosomes) was estimated with FastANI v1.33 (github.com/ParBLiSS/FastANI) using default parameters.
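As an illustration of the read counting and extraction step, a Python equivalent of such bash commands might look like the sketch below. It assumes that each line of the KMA *.frag.gz file is tab-separated, with the matched template name in the sixth column and the read header in the seventh; this layout should be checked against the KMA version in use. The function names and the file path are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable

TARGETS = ("blaCTX-M", "blaDHA")


def count_target_reads(lines: Iterable[str]) -> tuple[Counter, dict[str, list[str]]]:
    """Count reads per target gene family and collect their read identifiers."""
    counts: Counter = Counter()
    read_ids: dict[str, list[str]] = {target: [] for target in TARGETS}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        template, read_header = fields[5], fields[6]  # assumed column layout
        for target in TARGETS:
            if template.startswith(target):
                counts[target] += 1
                read_ids[target].append(read_header.split()[0])
    return counts, read_ids


def count_target_reads_in_file(frag_path: str) -> tuple[Counter, dict[str, list[str]]]:
    """Same as count_target_reads, reading directly from a gzipped frag file."""
    with gzip.open(frag_path, "rt") as handle:
        return count_target_reads(handle)


# Usage (hypothetical path):
# counts, read_ids = count_target_reads_in_file("sample_T1h.frag.gz")
```

The collected read identifiers can then be used to pull the corresponding reads from the FASTQ file for replicon and MLST screening.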

Variant analyses of polished MAGs

The nucleotide sequences of blaCTX-Ms and blaDHAs were manually extracted from individual 3-step polished MAGs based on the ARG screening results (see above) using Geneious Prime. Similarly, blaCTX-M- and blaDHA-positive plasmids or chromosomes extracted from contigs (see above) of each individual polished MAG were used. Both genes (blaCTX-Ms/blaDHAs) and AGEs (plasmids and chromosomes) were individually mapped to reference nucleotide sequences (see below) using Snippy v4.4.5 (parameter: “--ctgs”) (https://github.com/tseemann/snippy) to identify sequence variants (i.e., SNPs).

Comparison of reads and MAGs to strain WGS

The extracted reads and MAG contigs were aligned to reference sequences (strain WGS; plasmids or chromosomes) or to the nucleotide collection database (GenBank + EMBL + DDBJ + PDB + RefSeq sequences) with minimap2 (parameters: “-ax map-ont”) and QualiMap (parameters: “bamqc -bam”) (http://qualimap.conesalab.org/) or by BLAST (Web BLAST; megablast algorithm), respectively (Fig. 1). Circular alignments were visualized with BLAST Ring Image Generator v0.95 (github.com/happykhan/BRIG) and annotated for style with Inkscape v1.0.1 (gitlab.com/inkscape).

Microbiota analyses

Read classification at the species level for all sequencing time-points was done with Kraken2 v2.1.1 (standard Kraken2 database build on: 03/28/2022) with arguments “--use-names --report-zero-counts --report”59. Kraken reports from all samples were merged with the Python scripts kraken-multiple.py and kraken-multiple-taxa.py (github.com/npbhavya/Kraken2-output-manipulation). The python-merged reports were combined into a single dataset with only bacterial species considered for downstream analyses in R programming language v4.2.2. Species relative abundance and alpha diversity (observed and SDI) analyses were conducted in Phyloseq v1.42.0 in R, considering species with at least > 1 read count. Differential species abundance was estimated with DESeq2 v1.38.3 in R with default parameters using pre-enriched vs. native Nanopore-SMS as contrast (Fig. 1).
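For readers unfamiliar with the alpha diversity metrics, the sketch below computes the observed number of species and a diversity index from per-species read counts. It assumes that SDI denotes the Shannon diversity index with natural logarithm; the read count threshold is left as a parameter, and the function name and example counts are hypothetical. It is not the Phyloseq implementation.

```python
import math


def alpha_diversity(species_counts: dict[str, int], min_count: int = 1) -> tuple[int, float]:
    """Return (observed species, Shannon index) for one sample.

    Species with fewer than `min_count` reads are ignored (threshold is an assumption).
    """
    kept = [count for count in species_counts.values() if count >= min_count and count > 0]
    total = sum(kept)
    if total == 0:
        return 0, 0.0
    proportions = [count / total for count in kept]
    shannon = -sum(p * math.log(p) for p in proportions)
    return len(kept), abs(shannon)  # abs() avoids returning -0.0 for one species


# Hypothetical counts: one sample dominated by a single species, one evenly spread.
dominated = {"Escherichia coli": 9700, "Bacteroides fragilis": 200, "Klebsiella pneumoniae": 100}
even = {"Escherichia coli": 3334, "Bacteroides fragilis": 3333, "Klebsiella pneumoniae": 3333}
print(alpha_diversity(dominated))
print(alpha_diversity(even))
```

Both hypothetical samples have the same observed number of species, yet the dominated sample has a much lower Shannon index, which mirrors how two methods can agree on richness while differing in diversity.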

Before beta diversity analyses, species read counts were normalized with the Trimmed Mean of M-values (TMM) method to account for sequencing depth differences with edgeR v3.40.2 R package. The NMDS ordination of the Bray–Curtis dissimilarity matrix was performed with Phyloseq. The R package tidyverse v2.0.0 was used for all data wrangling described above, while ggpubr v0.6.0 was used to generate publication-quality plots. Additional style annotations were done in Inkscape.
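The Bray–Curtis dissimilarity underlying the ordination can be illustrated with a minimal sketch on two samples. The function name and the example abundances are hypothetical; TMM normalization is omitted here, and the values are simply assumed to be comparable between samples.

```python
def bray_curtis(sample_a: dict[str, float], sample_b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i) over all species."""
    species = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(s, 0.0) - sample_b.get(s, 0.0)) for s in species)
    denominator = sum(sample_a.get(s, 0.0) + sample_b.get(s, 0.0) for s in species)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical normalized abundances for two samples.
native = {"Escherichia coli": 3.0, "Bacteroides fragilis": 7.0}
pre_enriched = {"Escherichia coli": 6.0, "Bacteroides fragilis": 4.0}
print(bray_curtis(native, pre_enriched))  # 0.3
```

A value of 0 indicates identical composition and a value of 1 indicates no shared species.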

Statistical analyses

Statistical tests were performed using the R language stats v4.2.2 and rstatix v0.7.2 packages. Paired t-tests were conducted for datasets with a normal distribution and equal variance with rstatix (t_test, paired = TRUE). The non-parametric Wilcoxon signed-rank test was used to compare two paired groups from non-normally distributed datasets with rstatix (wilcox_test, paired = TRUE). In the case of multiple testing, p-values were corrected with the Holm method with rstatix. The Adonis test was used to explain Bray-Curtis dissimilarity distance by sample type (native and pre-enriched Nanopore-SMS) and sequencing time point (T1h to T48h) with the vegan v2.6-4 R package [parameters: “permutations = 10000”, “set.seed(1718)”]. Data normalization (default: median of ratios) and adjusted p-value rankings (default method: Benjamini-Hochberg; cutoff: p < 0.01) were done automatically by DESeq2 with default parameters for the species differential abundance analyses. Significance levels are represented by an asterisk (*) as shown in the figures, which correspond to the following p-values: *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
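The Holm correction for multiple testing can be illustrated with a short step-down sketch. This is a plain-Python illustration of the method, not the rstatix code; the function name and the example p-values are hypothetical.

```python
def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm step-down adjustment, returned in the original order of the input."""
    n_tests = len(pvalues)
    order = sorted(range(n_tests), key=lambda index: pvalues[index])
    adjusted = [0.0] * n_tests
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by n, the next by n - 1, and so on.
        candidate = min(1.0, (n_tests - rank) * pvalues[index])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```

In this example the largest raw p-value (0.04) would receive an adjusted value of 0.04 on its own, but it is raised to 0.06 so that the adjusted values remain monotonic.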

Sensitivity and specificity for the detection of blaCTX-M and blaDHA genes were calculated using the results of the gold standard (culture-based selective pre-enrichment), as previously described15.
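For clarity, sensitivity and specificity relative to the gold standard follow the usual definitions, sketched below. The function name and the counts in the usage example are hypothetical placeholders and are not results of this study.

```python
def sensitivity_specificity(true_pos: int, false_neg: int,
                            true_neg: int, false_pos: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Positives and negatives are defined by the gold standard
    (culture-based selective pre-enrichment).
    """
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one gold-standard positive and one negative")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(true_pos=8, false_neg=2, true_neg=5, false_pos=0)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # sensitivity = 80%, specificity = 100%
```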

Computational requirements

Nanopore sequencing and bioinformatic analyses described above (summarized in Fig. 1) were performed on a mid-level workstation running Linux Ubuntu 22.04.4 LTS, equipped with a 16-core Intel Core i9-12900K 3.2 GHz processor, 64 GB DDR5 4800 MHz memory, a 5 TB NVMe SSD drive, and an Nvidia RTX 3080 Ti 12 GB GPU.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.