Characterization of antibiotic resistomes by reprogrammed bacteriophage-enabled functional metagenomics in clinical strains

Functional metagenomics is a powerful experimental tool to identify antibiotic resistance genes (ARGs) in the environment, but the range of suitable host bacterial species is limited. This limitation affects both the scope of the identified ARGs and the interpretation of their clinical relevance. Here we present a functional metagenomics pipeline called Reprogrammed Bacteriophage Particle Assisted Multi-species Functional Metagenomics (DEEPMINE). This approach combines and improves the use of T7 bacteriophage with exchanged tail fibres and targeted mutagenesis to expand phage host-specificity and efficiency for functional metagenomics. These modified phage particles were used to introduce large metagenomic plasmid libraries into clinically relevant bacterial pathogens. By screening for ARGs in soil and gut microbiomes and clinical genomes against 13 antibiotics, we demonstrate that this approach substantially expands the list of identified ARGs. Many ARGs have species-specific effects on resistance; they provide a high level of resistance in one bacterial species but yield very limited resistance in a related species. Finally, we identified mobile ARGs against antibiotics that are currently under clinical development or have recently been approved. Overall, DEEPMINE expands the functional metagenomics toolbox for studying microbial communities.

Article https://doi.org/10.1038/s41564-023-01320-2
Metagenomics allows the exhaustive analysis of microbial communities, including species that cannot be cultivated in laboratory conditions. By extracting genomic data from environmental samples, researchers gain knowledge on the species compositions and functionality of the microbiome in a range of natural environments 1. In particular, functional metagenomics is devoted to screening metagenomic DNA for the presence of genes that encode specific molecular functions [2][3][4]. Cloning and expressing fragmented metagenomic DNA in a bacterial host can reveal previously undescribed proteins. Applications of functional metagenomics include the identification of enzymes, exploring bioactive agents and screening for antibiotic resistance genes residing in the environment [5][6][7][8]. The libraries typically contain millions of DNA fragments, corresponding to a total coverage of 5-100 Gb, the size of thousands of bacterial genomes 7,9,10.
Although functional metagenomics can potentially be useful for several research areas, in its present form the methodology is far from perfect, limiting its applicability. Given the enormous size of the plasmid libraries, efficient introduction of these libraries into a bacterial host is of central importance. However, this process, typically performed by electroporation, conjugation or conventional bacteriophage transduction, is cumbersome and is only efficient for a limited range of laboratory strains 11,12. This limitation has far-reaching consequences on the applicability of functional metagenomic screens and the generality of conclusions that can be drawn 13,14. For example, it hinders screening for biotechnologically or clinically relevant genes that are functional in only specific bacterial species 12,15,16. In particular, most metagenomic screens for antibiotic resistance genes (ARGs) rely heavily on the use of laboratory strains of Escherichia coli as bacterial hosts 5,17,18. Therefore, ARGs that do not provide resistance in these strains but do so in other clinically relevant pathogens remain undetectable. Indeed, previous studies indicate that the impact of antibiotic resistance mutations on resistance phenotypes depends on the bacterial host's genetic background 19. Additionally, metagenomic screens in multiple host bacteria could provide valuable information on interspecies functional compatibility and mobility of ARGs 20.
In this paper, we present Reprogrammed Bacteriophage Particle Assisted Multi-species Functional Metagenomics (DEEPMINE), which provides a solution to these problems (Fig. 1a). DEEPMINE is based on a previous work that aimed to extend the host range of T7 phage particles for DNA transduction by exchanging the tails between different types of bacteriophages 21. DEEPMINE employs such modified bacteriophage transducing particles to deliver large metagenomic plasmid libraries into a range of bacterial species. Additionally, we applied directed laboratory evolution to increase the efficiency of such library delivery 22. Using this approach, we performed metagenomic screens in clinically relevant bacterial pathogens from the Enterobacteriaceae family. We identified several previously unreported ARGs with species-specific effects on antibiotic susceptibility. Additionally, we studied a set of antibiotics that have only recently been approved for clinical use or are in late-stage clinical development, and show that these new antibiotics are just as prone to resistance formation as old antibiotics after decades of clinical use (Extended Data Table 1).

DNA library delivery by reprogrammed bacteriophage particles
We first tested whether hybrid T7 bacteriophage particles with exchanged tail proteins are suitable tools to deliver functional metagenomic plasmid libraries into bacterial cultures. In brief, we created metagenomic libraries to obtain environmental and clinical resistomes 23, including (1) river sediment and soil samples from seven antibiotic-polluted industrial sites in the close vicinity of antibiotic production plants in India (that is, anthropogenic soil microbiome) 24,25, (2) faecal samples from 10 European individuals who had not taken any antibiotics for at least 1 yr before sample donation (that is, gut microbiome) and (3) samples from a pool of 68 multidrug-resistant bacteria isolated in healthcare facilities or obtained from strain collections (that is, clinical microbiome; see Methods, Fig. 1a and Supplementary Table 1).
DNA fragments ranging from 1.5 to 5 kb in size were shotgun cloned into a low-copy cloning plasmid capable of replication in selected orders of the class Gammaproteobacteria 26 (see Methods). The plasmid DNA carries a packaging signal sequence that allows translocation of the plasmid into the T7 bacteriophage independent of the T7 genome (Fig. 1a). Each constructed library contained 3-5 million DNA fragments, corresponding to a total coverage of 25 Gb (that is, the size of ~5,000 bacterial genomes). The resulting plasmid libraries were packaged in two previously characterized hybrid T7 phage particles that display tail fibre proteins from Salmonella phage ΦSG-JL2 and Klebsiella phage K11 21. The three metagenomic libraries were transduced into Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 and K. pneumoniae NCTC 9131, both of which are known bacterial targets of these two hybrid T7 bacteriophage particles 21. In parallel, we electroporated the libraries into the model bacterium E. coli K12 (Methods). Finally, we analysed whether transduction by T7 phage particles introduces any bias into the size and composition of the libraries (Methods).
Strikingly, both the ΦSG-JL2 and K11 tail-displaying hybrid T7 bacteriophage particles delivered the plasmid libraries into their targeted bacterial strains at least as efficiently as electroporation does into the laboratory E. coli model strain (Fig. 1b and Supplementary Table 2). In particular, the maximum number of plasmids delivered to the host bacteria was at least two orders of magnitude higher by transduction than by electroporation (Extended Data Fig. 1a and Supplementary Table 2).
Additionally, long-read deep sequencing shows that both the average DNA fragment sizes and the fragment diversities of the libraries delivered by T7 phage particles are comparable to those of the library delivered by electroporation into E. coli (Fig. 1c,d and Supplementary Table 3). This indicates that transduction by reprogrammed bacteriophage particles has no serious distorting effect on the size and diversity of the delivered metagenomic libraries. Finally, we sequenced the plasmid content of 38 isolated individual bacterial clones after phage transduction. Reassuringly, co-transduction of two plasmids into the same cell, a phenomenon that results in false positive hitchhiker hits in a screening campaign, was detected in only 5% of the cells, while co-transformation of two plasmids into the same cell by electroporation occurred in 10% of the cells (Extended Data Fig. 1b). Overall, these results indicate that certain T7 transducing bacteriophage particles with exchanged tail fibres are suitable delivery vehicles for functional metagenomics.

Directed evolution optimizes DNA library delivery
Our next goal was to extend our approach to additional bacterial pathogen species. Transduction efficiencies of most hybrid phage particles are well below the threshold (>10^7 transductants per ml) required for the delivery of entire functional metagenomic libraries into the target bacterial cells 21. Moreover, the delivery of such libraries requires the use of high concentrations of the transducing phage particles. In such cases, replicative phage contamination, a common issue of transducing bacteriophage particle generation 27, kills a large fraction of the target cells (Extended Data Fig. 1c).
To overcome these two problems, we set up a directed evolution experiment to genetically modify the tail fibre regions in the T7 phage particles. Specifically, we aimed to select for point mutations in the host-range-determining regions (HRDRs) of the phage tail fibres that alter host specificity 28,29. To this end, we first selected three tail fibres (Escherichia phage T7, Salmonella phage ΦSG-JL2 and Salmonella phage Vi06) with especially broad host ranges 21. Then, we identified potential HRDRs in the tail fibre gene gp17 of Salmonella phage ΦSG-JL2 and vi06_43 of Salmonella phage Vi06. The identification was based on sequence homology to four HRDRs in the receptor binding domain (RBD) of the well-characterized T7 and T3 phage tail fibre gene gp17 (Methods and Supplementary Table 4) 28,29. Next, we introduced randomly distributed mutations within and in the vicinity of the HRDRs of tail fibre genes derived from ΦSG-JL2, Vi06 and T7 phages using a high-frequency site-directed mutagenesis method called DIvERGE (Fig. 2a and Methods) 22. Compared with other mutagenesis protocols, DIvERGE has the advantage of introducing random mutations along multiple DNA sites simultaneously, and can cover relatively long DNA segments, potentially beyond the predicted HRDRs 22.
Using a transduction optimization protocol 21, we next selected phage tail variants with an improved capacity to deliver plasmid [...] (Fig. 2b and Supplementary Table 4). Simultaneously, as a positive control, we selected the T7 phage tail library with the same protocol in the presence of a phage-resistant E. coli model strain (BW25113ΔtrxAΔwaaR) with deficient cell wall-embedded lipopolysaccharide receptors of T7-like phages 30,31.
As a result of directed evolution, DNA transduction efficiency was improved by one to seven orders of magnitude in all three pathogenic bacterial strains tested (Fig. 2b). With Shigella sonnei HNCMB 25021, the transduction efficiency reached the level suitable for the delivery of entire metagenomic plasmid libraries (Fig. 2b). [...] Table 5). Reassuringly, the transduction of the three metagenomic libraries into Shigella sonnei HNCMB 25021 by this T7 phage tail variant resulted in functional metagenomic libraries that are as large and diverse as the library achieved by electroporation in the E. coli K12 strain (Extended Data Fig. 4 and Supplementary Tables 2 and 3). Overall, we found that directed evolution of the phage tail improves the delivery of metagenomic libraries into previously untapped bacterial strains compared with the delivery of the same libraries by electroporation.

Involving multiple pathogenic hosts expands the ARG repertoire
Our next goal was to improve sampling of the bacterial antibiotic resistome through functional metagenomics in multiple bacterial hosts. To this end, we screened the above-described three metagenomic libraries (soil, gut, clinical) in three pathogenic bacterial hosts (Salmonella enterica LT2, K. pneumoniae NCTC 9131 and Shigella sonnei HNCMB 25021) and in E. coli BW25113. The screens were performed on solid agar in the presence of one of 13 selected antibiotics covering five major antibiotic classes (Extended Data Table 1). [...] Table 1). All studied antibiotics, including CEF 32, have demonstrated activity against Gram-negative pathogens. Of note, APS has been used in veterinary medicine for over a decade but is currently in clinical trials to treat systemic Gram-negative infections in humans 33.
The obtained resistance-conferring plasmids were pooled and sequenced with a modified dual-barcoded shotgun expression library sequencing pipeline (Extended Data Fig. 5 and Methods; see also ref. 34 ). The protocol avoids PCR amplification of resistance-conferring DNA fragments, thus preserving the original composition of the samples. By aligning the obtained DNA sequences to antibiotic resistance genes in relevant databases 35,36 , we found that 84% of the 571 fragments displayed sufficient sequence similarity (Methods) to known resistance genes (Supplementary Table 6). As many of the detected ARGs were isolated on several different DNA fragments, ARGs were clustered at 95% identity and coverage to reduce sequence redundancy in the dataset 37 . To quantify the reproducibility of the pipeline, we repeated the full protocol (one library delivery, screening and sequencing) with K. pneumoniae. Reassuringly, 83.3% of the ARGs were isolated in both biological replicates (Fig. 3a).
In total, 114 ARGs were detected, many of which were present in multiple DNA fragments (Supplementary Tables 6 and 7). The analysis also revealed substantial differences in the identified ARG repertoires across the four examined host bacterial species. In particular, when the analysis was restricted to E. coli as the bacterial host, 43% of the total 114 ARGs remained undetected (Fig. 3b-d and Extended Data Fig. 6). This indicates that DEEPMINE allows a more comprehensive sampling of the bacterial resistomes through the use of multiple host bacteria. Efflux pumps, their corresponding transcriptional regulators and antibiotic-inactivating enzymes were common among the detected ARGs (Fig. 3c and Extended Data Fig. 7a). A substantial fraction of the ARGs isolated from the gut, soil and clinical microbiomes originated from Proteobacteria, which are phylogenetically close relatives of the host bacterial species in our screens (Extended Data Fig. 7b).
Then, we determined whether the ARGs detected in our screen are prone to horizontal gene transfer in nature. ARGs that have been mobilized in the past in human-associated environments may pose a higher health hazard as they have the potential to become widespread among human pathogens 38. To investigate this issue, we generated a mobile gene catalogue on the basis of identification of nearly identical genes that are shared by distantly related bacterial genomes 37,39,40. Specifically, we carried out the pairwise alignment of 2,794 genomes of phylogenetically diverse human-related bacterial species (Supplementary Table 8). This dataset was extended with a sequence database of 27,939 natural plasmids derived from diverse environments (ref. 41, Methods). ARGs carried by plasmids were especially likely to be transferred between bacterial species, with a 91% agreement between the two datasets on mobile ARGs (Supplementary Table 7).
Remarkably, ARGs present in multiple DNA fragments in our screen were more frequently subjected to horizontal gene transfer in nature compared with ARGs that are only present in a single DNA fragment (Fig. 3e).
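The association between fragment multiplicity and mobility is reported with a two-sided Fisher's exact test (Fig. 3e). As a minimal sketch of how such a test can be computed for a 2×2 table, the following uses only the Python standard library. The function name and the example counts are illustrative assumptions; they are not the study's code or data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, ARGs on multiple vs single fragments,
    and columns HGT vs non-HGT (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts for illustration only (NOT the values behind Fig. 3e).
example_p = fisher_exact_two_sided(30, 20, 15, 49)
```

A small P value from such a table would indicate that mobility and presence on multiple fragments are not independent; the actual counts are given in Supplementary Table 6.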

Species-specific activity of ARGs across bacterial species
Next, we asked how the variation in the detected ARG repertoires across the four bacterial hosts can be explained. The first hypothesis was that certain ARGs remain undetected due to stochastic plasmid loss. This can happen during transduction of the metagenomic library into its new hosts or during the screening process. Alternatively, the transferred ARGs may not be functionally compatible with the physiology of all bacterial hosts 20, in which case several ARGs would provide resistance in specific bacterial species only. While the first hypothesis is certainly relevant, several lines of evidence indicate substantial differences in the resistance phenotype of ARGs across bacterial species.
To test these hypotheses, we first examined how DNA fragments that provide antibiotic resistance in E. coli shape antibiotic susceptibility in the other three host bacterial species. We analysed a representative set of 13 resistance-conferring DNA fragments derived from our screens by measuring the levels of antibiotic resistance they provide across the bacterial hosts. As certain ARGs have been detected in multiple antibiotic screens, we studied 20 antibiotic-DNA fragment combinations in total (Fig. 4a). In seven out of the 20 studied cases, the DNA fragment provided no changes in resistance level in at least one of the three other bacterial species (using a twofold change in minimum inhibitory concentration (MIC) as a cut-off). Therefore, on average, only 80% of the functional ARGs overlapped between the pairs of E. coli and the other three species. Additionally, we observed a substantial, up to 256-fold variation in the resistance level provided by the specific DNA fragments (Fig. 4a and Supplementary Table 9). Efflux pumps, transcriptional regulatory proteins and antibiotic-modifying enzymes alike displayed such major variation in resistance levels across the studied bacterial species (Fig. 4a).
Finally, we re-investigated all resistance-conferring DNA fragments detected in the metagenomic screens. We pooled the corresponding plasmids and re-introduced the resulting pre-selected plasmid library into each of the four native bacterial host species. We subsequently performed new antibiotic selection screens with this library on solid agar, as previously described. To control for stochastic plasmid loss during transduction, we sequenced the new plasmid library before and after antibiotic selection. Of the ARGs, 70% (80 out of 114) were represented by at least one plasmid in all four bacterial host species after transduction, but before antibiotic selection (Supplementary Table 10).
After antibiotic selection, 63 of these ARGs were found to confer resistance in at least one of the four bacterial host species (Supplementary Table 10). Notably, 16 out of the 17 ARGs lost during antibiotic selection were encoded by only a single resistance-conferring DNA fragment (Extended Data Fig. 8a). After adjusting the overlaps with the accuracy of the screen (Extended Data Fig. 8b), on average, 70% of the ARGs overlapped between pairs of species (Fig. 4b and Extended Data Fig. 8c). In total, only ~46% of the ARGs (~29 out of 63) provided resistance in all four bacterial host species (Extended Data Fig. 8d). Clearly, future work on larger metagenomic datasets should reveal the exact biochemical, cellular and phylogenetic features that shape the species-specificity profiles of ARGs.
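The pairwise overlaps described above are based on Jaccard similarity between the sets of functional ARGs in two host species. The following is a minimal sketch of the plain (unadjusted) coefficient; the adjustment for screen accuracy is described in the Methods and is not reproduced here. The gene identifiers and host sets below are hypothetical placeholders, not results from the screens.

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Unadjusted Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(set_a), set(set_b)
    union = a | b
    # Returning 0.0 for two empty sets is a convention chosen for this sketch.
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(args_by_host):
    """Jaccard similarity for every unordered pair of hosts."""
    return {
        (h1, h2): jaccard(args_by_host[h1], args_by_host[h2])
        for h1, h2 in combinations(sorted(args_by_host), 2)
    }


# Hypothetical functional ARG sets per host (placeholder identifiers).
functional_args = {
    "E. coli": {"arg1", "arg2", "arg3"},
    "K. pneumoniae": {"arg2", "arg3", "arg4"},
    "S. enterica": {"arg1", "arg2", "arg3", "arg4"},
}
overlaps = pairwise_jaccard(functional_args)
```

A coefficient of 1 would mean that two hosts share exactly the same set of functional ARGs, and lower values indicate more host-specific activity.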
Together, these results indicate that ARGs, when transferred to new bacterial hosts, frequently have species-specific effects on antibiotic susceptibility.
Fig. 3 (caption, partial) | e, Number of mobile (depicted as HGT to denote the detection of involvement in horizontal gene transfer) and non-mobile (depicted as non-HGT to denote the lack of involvement in horizontal gene transfer) ARGs present on multiple and single contigs in the metagenomic libraries (two-sided Fisher's exact test, P = 1.058 × 10^−5, n = 114; Supplementary Table 6).

Potential resistance to recently developed antibiotics
Next, we estimated how prone the 'recent' antibiotics are to ARG mobilization compared to the 'old' antibiotics. We found that the overall numbers of ARGs do not differ significantly between the two antibiotic groups (Fig. 5a, Table 1), regardless of the microbiomes that were considered (Extended Data Fig. 9a). Moreover, when the analysis was restricted to ARGs with established horizontal gene transfer events, this result held (Fig. 5b and Extended Data Fig. 9b). As expected, the resistance mechanisms largely overlap between 'old' and 'recent' antibiotics belonging to the same drug classes (Fig. 5c), suggesting that cross-resistance could be prevalent. CEF, a fifth-generation cephalosporin that has recently been approved for the treatment of hospital- and community-acquired pneumonia 42,43, highlights this point. Both the overall frequency of ARGs (for example, β-lactamases) and the frequency of mobile ARGs were exceptionally high against CEF (Table 1), even when compared to those of 'old' β-lactam antibiotics with decades of clinical use (Fig. 5a-c). Indeed, extended-spectrum β-lactamases (ESBLs) generally hydrolyse ceftobiprole 44, hence its clinical utility against Gram-negative multidrug-resistant pathogens producing such ESBLs is limited 45.
A notable exception to this trend is APS, an antibiotic in clinical trials for application in humans. Only a single ARG was detected against this antibiotic in the gut resistome and none in the pooled collection of clinical isolates (Supplementary Table 7). However, in agreement with extensive use of APS in veterinary medicine for decades, multiple ARGs against APS were detected in the soil microbiome (Fig. 5c). The identified ARGs are mostly aminoglycoside acetyltransferases that are functionally compatible in multiple pathogenic hosts (Table 1, Supplementary Table 7 and Fig. 5c). This suggests that these genes could pose a clinical risk in the future. In agreement with this expectation, one of these aminoglycoside acetyltransferases, AAC(3)-IV, has already been detected in APS-resistant clinical bacteria 46. Overall, DEEPMINE could be a useful tool to predict ARGs currently only detectable in non-human-associated microbiomes with potential health implications.

Discussion
In this work, we introduce DEEPMINE, an approach that broadens the range of host bacterial species applicable in functional metagenomics. Previous work showed that bacteriophage host range can be broadened by exchanging the tail fibre of the E. coli phage T7 or by generating random mutations in the T7 tail-fibre-encoding genes 21. DEEPMINE employs such reprogrammed bacteriophage transducing particles with exchanged and/or mutagenized tail fibres to deliver large metagenomic plasmid libraries into a range of bacterial species (Fig. 1). The main advantage of DEEPMINE over existing techniques for functional metagenomics, such as electroporation or conjugation, is its higher efficiency. In particular, we found that DEEPMINE is more suitable for introducing small-insert (1.5 kb-5 kb) metagenomic plasmid libraries to the selected bacterial hosts than electroporation (Fig. 1 and Extended Data Fig. 1) 4,47. While conjugation is frequently used to deliver libraries with large insert sizes (10 kb-40 kb) that typically contain 10^4-10^5 clones, it is very challenging to obtain more than 10^6-10^7 transconjugants with this technique 48,49. On the other hand, a small-insert (1.5 kb-5 kb) metagenomic library such as the one used in this study usually requires the delivery of more than 10^8 plasmids to achieve sufficient coverage.
Using our approach, we performed 156 metagenomics screens with all possible combinations of 13 antibiotics, three metagenomic libraries (isolated from soil, gut and clinical microbiomes) and four related Enterobacteriaceae species. We demonstrate that studying multiple host species substantially expands the identified bacterial resistome; 43% of the non-overlapping ARGs remain undetected when only a single species (E. coli) is considered (Fig. 3). Accordingly, DEEPMINE allows the identification of ARGs that provide resistance only in specific clinically relevant pathogens. Indeed, we identified a large set of ARGs against recently developed antibiotics with potential to become future health risks (Fig. 5). On the basis of these results, we anticipate that DEEPMINE will be a useful tool to predict the future dissemination of ARGs for which there is a growing general interest 6,16,37,38,50. However, the current limitation of DEEPMINE is that it takes considerable time and resources to engineer suitable phage particles to enable host bacteria of interest to be used for functional metagenomics. In summary, our work provides a deeper insight into the forces that shape the mobile resistome. Future work should expand the metagenomic libraries involved to classify mobility and functional compatibility of the detected ARGs in a more comprehensive manner and test in a broader range of clinical isolates.
Fig. 4 (caption, partial) | b, Adjusted Jaccard similarity coefficients that represent the overlaps of functional ARG sets between pairs of host species after controlling for measurement noise (see Methods and Extended Data Fig. 8). Numbers in brackets represent 95% confidence intervals (Methods).
Volunteer participants were selected on the basis of strict criteria that (1) they had not taken any antibiotics for at least 1 yr before sample donation and (2) they were in good health. These requirements are standard in the field and ensure a bias-free comparison of the antibiotic resistomes in the healthy human gut microbiome. Informed consent was obtained from all participants. Soil and river sediment sample collection from around the cities of Hyderabad and Lucknow was approved by the National Biodiversity Authority (NBA), India (application number: NBA/Tech Appl/9/1822/17/18-19/3535). No statistical methods were used to pre-determine sample sizes, but our sample sizes are similar to those reported in previous publications 18,51,52. Samples were not allocated to experimental groups. Samples for each individual experiment were handled by one person in charge. Data collection and analysis were not performed blind to the conditions of the experiments. No data were excluded from the analysis. Unless otherwise stated, when using a kit, we followed the manufacturer's instructions.

Plasmid construction for DEEPMINE
A custom plasmid was created from the pZE21 expression vector (Supplementary Table 11) for compatibility with the T7 transduction and the sequencing pipelines. Specifically, the replication origin was switched from ColE1 to p15A, and the packaging signal of the T7 bacteriophage was introduced (enzymes and primers used are listed in Supplementary Table 11). Subsequently, the pZE21_p15A vector was amplified by PCR using a mixture of primers containing 10-nt-long random barcodes (Supplementary Table 11), followed by digestion and self-ligation.

Sample collection and construction of metagenomic libraries
For the gut microbiome library, we collected faecal samples from 10 unrelated, healthy individuals with no history of taking antibiotics in the year before sample donation. For the anthropogenic soil microbiome, samples were collected from highly antibiotic-contaminated industrial areas in India 53. Metagenomic DNA from the gut and soil samples was extracted using the DNeasy PowerSoil kit (Qiagen, 47016). Genomic DNA of clinical bacterial isolates (Supplementary Table 1) was isolated using the Sigma GenElute bacterial genomic DNA kit (Sigma, NA2110-1KT).
From each sample, 40 µg of extracted DNA was digested with MluCI enzyme (NEB, R0538L) (10 min, 37 °C), followed by inactivation (20 min, 85 °C). The quantity of the MluCI enzyme was varied to obtain DNA in the target size range of 1-5 kbp. DNA was isolated with pulsed-field gel electrophoresis (Sage Science, PB02901) with a 0.75% agarose gel cassette and low-voltage 1-6 kbp marker S1 cassette definition. The metagenomic DNA fragments were ligated into the pZE21_p15A plasmid at the EcoRI site using a 3:1 mass ratio of insert:vector. Pure ligation mixture was electroporated into 40 µl of either E. coli MegaX (Invitrogen, C640003) or E. coli 10G ELITE (Lucigen, 60080-2) cells. Following 1 h of incubation at 37 °C, transformants were plated onto Luria Bertani (LB) agar plates containing 50 µg ml^−1 kanamycin in 10^1×, 10^2× and 10^3× dilutions for colony forming unit determination. The rest of the recovered cells were grown overnight on LB agar plates supplemented with kanamycin. The next day, plasmids were isolated. Insert size distribution was estimated by PCR amplification of relevant plasmid regions from 10-20 randomly selected clones. The average insert size was determined to be 2-3 kbp.
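The total sequence coverage of a library follows from the number of independent clones and the insert size. The sketch below shows this arithmetic; the function names are illustrative, and the default genome size of 5 Mb is an assumption implied by the text's equivalence of 25 Gb to ~5,000 bacterial genomes. The example uses the upper ends of the reported ranges (5 million fragments, 5 kb inserts), which is also an assumption made for illustration.

```python
def library_coverage_bp(n_clones, mean_insert_bp):
    """Total cloned sequence in base pairs: clones x mean insert length."""
    return n_clones * mean_insert_bp


def genome_equivalents(coverage_bp, genome_size_bp=5_000_000):
    """Coverage expressed as a number of genomes (5 Mb genome size is an assumption)."""
    return coverage_bp / genome_size_bp


# Upper ends of the reported ranges, used for illustration only.
coverage = library_coverage_bp(n_clones=5_000_000, mean_insert_bp=5_000)
print(coverage / 1e9, "Gb")            # 25.0 Gb
print(genome_equivalents(coverage))    # 5000.0 genome equivalents
```

With smaller clone counts or the 2-3 kbp average insert size measured here, the same calculation gives a proportionally lower coverage.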

Measuring transduction efficiency
Transduction efficiencies were measured as previously described 21 .
In brief, target bacterial cells were grown to OD600 ~0.5 (250 r.p.m. at 37 °C), followed by 15-min-long incubation on ice, during which dilutions of the transducing phage particles were prepared with tenfold dilution steps. Then, 50 µl of target cells were mixed with 50 µl of phage particles from each dilution. Plates were incubated at 37 °C at 180 r.p.m. for 1 h. Samples were then spotted on antibiotic-supplemented agar plates. Transductant forming units per ml (t.f.u. ml^−1) were calculated on the basis of colony counts.
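As a minimal sketch of the final step, a titre in t.f.u. per ml can be obtained from a colony count, the dilution factor of the spotted sample and the volume spotted. The spot volume is not stated in the text, so the value in the example is a hypothetical placeholder, as are the function names. The 10^7 threshold is the one quoted in the Results for delivery of entire metagenomic libraries.

```python
LIBRARY_DELIVERY_THRESHOLD = 1e7  # transductants per ml, as quoted in the Results


def tfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Transductant forming units per ml of undiluted phage stock.

    dilution_factor is the fold dilution of the counted sample (e.g. 1e4),
    plated_volume_ml is the volume that gave rise to the counted colonies.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return colonies * dilution_factor / plated_volume_ml


def suitable_for_library_delivery(tfu):
    """True if the titre exceeds the >10^7 transductants per ml threshold."""
    return tfu > LIBRARY_DELIVERY_THRESHOLD


# Hypothetical reading: 12 colonies in a 10 µl spot of the 10^4-fold dilution.
titre = tfu_per_ml(colonies=12, dilution_factor=1e4, plated_volume_ml=0.01)
ok = suitable_for_library_delivery(titre)
```

Counting the dilution that yields well-separated colonies, rather than a confluent spot, keeps the estimate reliable.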

Assembly of transducing particles containing the metagenomic libraries
E. coli K12 BW25113 cells containing phage-tail-encoding plasmids were electroporated with 30 ng of each plasmid library in five parallels to achieve suitable colony numbers, then plated on antibiotic-containing LB agar plates and grown overnight. Following growth, cells were stored in 20% glycerol at −80 °C. Next, frozen cells containing the library were grown in 40 ml LB supplemented with kanamycin (50 µg ml^−1) and streptomycin (100 µg ml^−1) by shaking at 230 r.p.m. at 37 °C until OD600 0.7. Cells were cooled down on ice, centrifuged at 2,000 × g (4 °C, 10 min) and resuspended in LB medium. Then, the T7∆(gp11-12-17) bacteriophage was used to infect cells at MOI 2-3. Following 2 h of incubation (100 r.p.m. at 37 °C), cells were treated with 2% chloroform and vortexed. The mixture was then centrifuged and the supernatant was collected.
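The multiplicity of infection (MOI) is the ratio of infecting phage particles to target cells, so the volume of phage stock needed for a given MOI can be estimated as follows. This is a sketch only: the cell number and phage titre in the example are hypothetical placeholders, since neither is stated in the text, and the function name is illustrative.

```python
def phage_volume_ml(total_cells, moi, phage_titre_per_ml):
    """Volume of phage stock (ml) that supplies moi phage particles per cell."""
    if phage_titre_per_ml <= 0:
        raise ValueError("phage_titre_per_ml must be positive")
    return total_cells * moi / phage_titre_per_ml


# Hypothetical inputs: 1e10 cells in the culture, stock at 1e11 particles per ml.
low = phage_volume_ml(total_cells=1e10, moi=2, phage_titre_per_ml=1e11)
high = phage_volume_ml(total_cells=1e10, moi=3, phage_titre_per_ml=1e11)
print(f"add {low:.2f}-{high:.2f} ml of phage stock for MOI 2-3")
```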

Delivery of the metagenomic libraries by transducing phage particles and by electroporation
Overnight cultures of the corresponding bacterial strains were diluted to OD600 0.1 in 50 ml LB medium and grown at 230 r.p.m. at 37 °C until OD600 0.5. Next, we added 20 ml of library-containing transducing particles to the cells, followed by 1 h of incubation under the same conditions. Next, cells were centrifuged at 2,200 × g for 10 min at 4 °C, resuspended in 1-5 ml LB medium, plated on LB + kanamycin 50 and grown overnight. The next day, cells were collected and stored with glycerol at −80 °C. Of each library, 50 ng was electroporated into E. coli K12 BW25113 in five parallels. Cells were recovered in SOC medium for 1 h at 37 °C, plated on LB + kanamycin 50 plates and grown overnight. The next day, cells were collected and stored in 20% glycerol at −80 °C.

Phage tail mutagenesis
To locate the HRDRs of the tail fibre genes, we used pairwise sequence alignment, where the recently identified HRDRs of gp17 of T3 coliphage 29

Selection of mutant phage tails with improved transduction efficiency
To select for tail mutants with improved delivery capacity, we applied a transduction optimization protocol. Finally, from each phage stock, 10 µl was dropped onto the top agar in 1- to 10¹⁰-fold dilutions.

Site-directed mutagenesis of phage-tail-encoding plasmids
For functional metagenomic library delivery, the mutation identified in the T7 gp17 V544G phage tail variant was introduced into plasmid MGP4240 21 by using whole plasmid amplification with primers carrying the corresponding mutation, followed by DpnI (Thermo Fisher, ER1701) treatment to eliminate the original, methylated template plasmid DNA and subsequent gel electrophoresis, gel extraction and self-ligation. The plasmids were then electroporated into E. coli BW25113 cells. Transformants carrying the desired constructs were identified by PCR and validated via sequencing.

Functional selection of antibiotic resistance
Functional selections for resistance were performed on Mueller Hinton Broth (Sigma, 90922) agar plates containing a concentration gradient of a given antibiotic (adapted from ref. 54 ). Antibiotics were purchased from Sigma or MedChem Express. The number of plated cells covered at least 10× the size of the corresponding metagenomic library. Plates were incubated at 37 °C for 24 h. For each functional selection, a control plate was prepared with the same number of cells containing the empty plasmid (that is, the plasmid without a cloned DNA fragment in the multiple cloning site) that showed the inhibitory zone of the antimicrobial compound for the cells without any resistance plasmid. The resistant clones from the libraries were isolated by washing together the sporadic colonies from the plate region (distal to the inhibition zone and containing higher antibiotic concentration), defined by visual inspection in comparison to the inhibition zone from the control plate. Half of the culture suspended in LB was used for plasmid isolation (GeneJET plasmid miniprep kit; Thermo Fisher, PLN70-1KT), and the rest was frozen with glycerol and stored at −80 °C.

Sample preparation for sequencing
The obtained resistance-conferring plasmids were sequenced with a hybrid sequencing pipeline (Extended Data Fig. 5) based on ref. 34 . Long-read sequencing identifies the metagenomic DNA fragments (inserts) and the two 10-nt-long random barcodes pre-cloned up- and downstream (uptag and downtag, respectively) of each metagenomic DNA fragment. Aliquots of plasmid DNA preparations obtained from each screen were pooled in an equimolar ratio. Genomic DNA contamination was removed from the mixture by Lambda-exonuclease and Exonuclease-I double digestion. The resulting sample was cleaned (DNA Clean and Concentrator-5, Zymo Research kit) and quantified. Next, the plasmid mixture was linearized by adding 5 U of SrfI restriction endonuclease (NEB, R0629S) for every 1 µg of plasmid DNA (1 h at 37 °C, followed by inactivation at 65 °C for 20 min), and DNA was quantified using the Qubit dsDNA broad-range assay kit (Thermo Fisher, Q33266) before Oxford Nanopore long-read sequencing. In parallel, multiplexed short-read deep sequencing was applied to each functional metagenomic plasmid DNA preparation (before pooling) to associate nanopore contigs with screening samples (Extended Data Fig. 5). To this end, we amplified the up- and downtag barcodes on the plasmid preparations of each selection experiment separately, using Illumina-specific forward and reverse primer pairs. Each primer pair contained P5 and P7 adapter sequences, respectively, and 8-nt-long barcodes for multiplexing and plasmid annealing sites (Supplementary Table 11). We performed PCR using Phusion high-fidelity DNA polymerase (Thermo Fisher, F530S) with the following reaction mixture: 15 ng of template plasmid DNA, 4 µl 5× GC buffer, 0.2 µl Phusion high-fidelity DNA polymerase, 0.6 µl DMSO (dimethyl sulfoxide), 0.2 mM dNTPs, 0.5 µM each of forward and reverse primers and water in a final volume of 20 µl.
The following thermocycler conditions were used: 95 °C for 5 min, 30 cycles of 95 °C for 30 s + 59 °C for 30 s + 72 °C for 5 s, 72 °C for 7 min. Following concentration measurement of each PCR reaction, we mixed the samples in a 1:1 mass ratio. Next, we isolated the 137-bp-long fragment mixture from 0.75% agarose gel.

Nanopore sequencing
Libraries were prepared by using a ligation sequencing kit (Oxford Nanopore Technologies, SQK-LSK109) with 1 µg plasmid DNA. The DNA was end-prepped with the NEBNext FFPE Repair (M6630S) and Ultra II End Prep kit (E7546S), purified using Agencourt AMPure XP (Beckman Coulter, A63882), and the adapter was then ligated using NEBNext Quick T4 DNA ligase (E6056S). Finally, the adapted library was purified by Agencourt AMPure, quantified using Qubit 3.0, mixed with ONT running buffer and loading beads, primed with FLO-MIN106 9.4.1 SpotON flow cell attached to a MinION device and run for 72 h. Guppy algorithm (v8.25) with high-accuracy config settings was used for basecalling. Raw reads were filtered on the basis of quality value (QC ≥ 7) and length (4,000-8,000 bp) using NanoFilt v2.7.1 55 . Reads were mapped to the reference sequence with minimap2 (v2.17) 56 ; SAM files were converted to sorted BAMs; the insert sequences were extracted, and barcodes were identified and added to the read/insert names applying samtools tview (1.11-9-ga53817f) subcommand 57 ; individual FASTQ files were created using SEQTK (v0.13.2) 58 ; consensus sequences were generated using SPOA (v4.0.2) 59 with the following parameters: -l 0 -r 0 -g -2. Finally, the raw consensus inserts were polished using the relevant set of insert sequences by minimap2 and racon (v1.4.19) 56 to create the final consensus inserts with at least 100× coverage. Delivered metagenomic DNA fragment lengths and diversities were determined by using long-read deep sequencing right after electroporation into E. coli BW25113 and transduction into Salmonella enterica subsp. enterica serovar Typhimurium str. LT2, K. pneumoniae NCTC 9131 and S. sonnei HNCMB 25021. Shannon alpha diversity indices (H) were calculated on the basis of the frequency of each of the contigs of all hosts using the vegan R package (2.5-7) 60 .
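For reference, the Shannon alpha diversity index is H = −Σ pᵢ ln pᵢ, where pᵢ is the relative frequency of contig i in a library. The analysis itself used the vegan R package; the snippet below is only an equivalent illustration in Python, using the natural logarithm (the default of vegan's `diversity` function) and made-up contig counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over contigs with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one contig must have a positive count")
    h = 0.0
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Made-up read counts per contig in two hypothetical libraries.
even_library = [250, 250, 250, 250]   # all contigs equally frequent
skewed_library = [970, 10, 10, 10]    # one contig dominates
print(shannon_index(even_library), shannon_index(skewed_library))
```

A library in which all contigs are equally frequent reaches the maximum H = ln(number of contigs), whereas a library dominated by a few contigs yields a lower value.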

Illumina sequencing
Pooled sequencing libraries were denatured with 0.1 M NaOH, diluted to 12 pM with HT1 hybridization buffer (Illumina) and mixed with 40% PhiX Control v3 (Illumina) sequencing control library. Denatured sequencing pools were loaded onto MiSeq Reagent kit V2-300 (Illumina) and 2 × 70 bp sequence reads were generated with an Illumina MiSeq instrument with custom read 1, read 2 and index 1 sequencing primers spiked in the appropriate cartridge positions (12, 14 and 13, respectively) at a final concentration of 0.5 µM.

Host ranges of the ARGs encoded by the functional metagenomic DNA contigs
Resistant plasmid pools collected from the metagenomic screen were mixed and re-transformed or re-electroporated into the four hosts. Selection experiments were performed on gradient agar plates as described previously (see 'Functional selection of antibiotic resistance' above). Resistant colonies were collected and following plasmid preparation, barcodes on the plasmids were sequenced by Illumina sequencing (Supplementary methods). For calculating the overlaps between functional ARG sets across species, we first estimated the accuracy of the screen by comparing the results to that of the MIC measurements of the 13 selected resistance-conferring DNA fragments. On the basis of these comparisons, we estimated the true positive, false positive, true negative and false negative rates of the screen. Next, we calculated an adjusted Jaccard index for each species pair, which takes into account the screen's accuracy as follows. For each species, we replaced the original vector of presence/absence of detected resistance instances with a new vector where the original presence (absence) values were randomly kept with a probability equal to the positive (negative) predictive value (that is, the proportion of true positives among all positive cases and the proportion of true negatives among all negative cases). The procedure was repeated 50,000 times, and the medians and 95% confidence intervals of the Jaccard indices between pairs of species were calculated.
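The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: a value that is not kept is assumed to be flipped to the opposite state, the 95% interval is taken as the 2.5th and 97.5th percentiles of the resampled indices, and the function names and example vectors are hypothetical.

```python
import random
import statistics


def jaccard(a, b):
    """Jaccard index of two presence/absence vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0  # empty union: defined as 0 here


def resample(calls, ppv, npv, rng):
    """Keep each call with probability PPV (presence) or NPV (absence).

    Assumption: a call that is not kept is flipped to the opposite state.
    """
    out = []
    for call in calls:
        keep_probability = ppv if call else npv
        out.append(bool(call) if rng.random() < keep_probability else not call)
    return out


def adjusted_jaccard(a, b, ppv, npv, n_iter=50_000, seed=0):
    """Return (median, lower, upper) of the Jaccard index over resampled vectors."""
    rng = random.Random(seed)
    values = sorted(
        jaccard(resample(a, ppv, npv, rng), resample(b, ppv, npv, rng))
        for _ in range(n_iter)
    )
    lower = values[int(0.025 * (n_iter - 1))]
    upper = values[int(0.975 * (n_iter - 1))]
    return statistics.median(values), lower, upper


# Made-up detection vectors for two host species (1 = resistance detected).
species_1 = [1, 1, 1, 0, 0, 1, 0, 1]
species_2 = [1, 0, 1, 0, 1, 1, 0, 0]
print(adjusted_jaccard(species_1, species_2, ppv=0.9, npv=0.8, n_iter=2_000))
```

With a positive and negative predictive value of 1, every call is kept and the adjusted index equals the measured Jaccard index.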

Resistance levels in the bacterial hosts
We measured how DNA fragments that provide antibiotic resistance to E. coli influence susceptibility in Shigella sonnei HNCMB 25021, K. pneumoniae NCTC 9131 and Salmonella enterica subsp. enterica serovar Typhimurium str. LT2. For this purpose, we used a representative set of 13 plasmids that were isolated in our antibiotic selection screens. For each strain, the provided resistance levels (that is, the MIC) were measured with a standard 12-step microdilution method in 96-well plates, and the MIC fold change was determined by comparing them to the MIC of the corresponding empty vector harbouring control strains. MICs were determined on the basis of cell growth (OD 600 ) after 24 h incubation (37 °C, 180 r.p.m.).
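As an illustration of how a MIC and its fold change can be read from such a dilution series, the following sketch takes the MIC as the lowest tested concentration at which OD600 stays below a growth cut-off. The cut-off value (`growth_threshold`) and the example readings are hypothetical and are not taken from the study; the two-fold criterion in the usage example follows the definition of resistance given in the legend of Extended Data Fig. 8.

```python
def mic_from_growth(concentrations, od600, growth_threshold=0.1):
    """Return the lowest concentration whose OD600 is below the growth cut-off.

    growth_threshold is a hypothetical cut-off; the text does not state one.
    Returns None if growth is observed at every tested concentration.
    """
    if len(concentrations) != len(od600):
        raise ValueError("one OD600 reading is needed per concentration")
    for concentration, od in sorted(zip(concentrations, od600)):
        if od < growth_threshold:
            return concentration
    return None


def mic_fold_change(mic_with_plasmid, mic_empty_vector):
    """MIC of the plasmid-carrying strain relative to the empty-vector control."""
    return mic_with_plasmid / mic_empty_vector


# Made-up 6-step example (the study used 12 steps), concentrations in µg per ml.
concentrations = [0.5, 1, 2, 4, 8, 16]
od_empty_vector = [0.80, 0.70, 0.05, 0.04, 0.04, 0.03]
od_with_plasmid = [0.90, 0.90, 0.80, 0.80, 0.60, 0.05]

fold = mic_fold_change(
    mic_from_growth(concentrations, od_with_plasmid),
    mic_from_growth(concentrations, od_empty_vector),
)
print(fold, fold > 2)  # more than a two-fold change counts as resistance
```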

Sequencing data analysis and functional annotation of ARGs
Each consensus insert sequence from nanopore sequencing was associated with screening samples (host, resistome, antibiotic) by combining the Nanopore and Illumina datasets through the unique uptag and downtag barcodes with a custom R script. To identify ARGs in the metagenomic contigs, two parallel approaches were used: (1) Open Reading Frame (ORF) prediction with prodigal 61 , followed by annotation with BLASTP search against CARD 35 and ResFinder 36 databases, with coverage >50 bp at e-value < 10⁻⁵ and (2) BLASTX search with the same parameters but without ORF prediction to decrease the risk of truncated ORFs due to frame-shifting sequencing errors. To remove low-fidelity sequencing data from the dataset, metagenomic DNA fragments supported by <10 consensus insert sequences in the nanopore dataset and <9 reads in the Illumina uptag and downtag barcode dataset were filtered out. If a metagenomic DNA fragment contained more than one predicted ARG, ARGs known to act on an antibiotic class (based on CARD and ResFinder reference databases) other than the one we used in the selection experiment were filtered out. ARG sequences having at least 95% identity and coverage on the DNA sequence level were collapsed into ARG clusters 37 . Each cluster was represented by the closest hit to known ARGs in the CARD 35 and ResFinder 36 databases (Supplementary Table 6). Donor organisms from which the assembled DNA contig sequences originated were identified by nucleotide sequence similarity search using the DNA contigs as query against the NCBI Reference Prokaryotic database (RefProk, downloaded 21 March 2021) with a threshold e-value of 10⁻¹⁰ . The taxonomic hierarchy (kingdom, phylum, class, order, family, genus, species) was acquired using the taxonomizr package in R (v0.8.0).
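The barcode-based association of nanopore inserts with screening samples can be sketched as follows. The original analysis used a custom R script; this Python version is only an illustration, and its data layout, function name and example barcodes are hypothetical. It also makes two interpretive assumptions: an insert is retained only if it meets both support thresholds (at least 10 nanopore consensus insert sequences and at least 9 Illumina reads), and both the uptag and the downtag must reach the Illumina threshold in a given sample.

```python
def assign_inserts(nanopore_inserts, illumina_counts, min_nanopore=10, min_illumina=9):
    """Map each well-supported insert to the screening samples containing its barcodes.

    nanopore_inserts: {insert_id: {"uptag": str, "downtag": str, "support": int}}
    illumina_counts: {(host, resistome, antibiotic): {barcode: read_count}}
    Both layouts are hypothetical and chosen only for this illustration.
    """
    assignments = {}
    for insert_id, info in nanopore_inserts.items():
        if info["support"] < min_nanopore:
            continue  # too few nanopore consensus insert sequences
        samples = [
            sample
            for sample, counts in illumina_counts.items()
            if counts.get(info["uptag"], 0) >= min_illumina
            and counts.get(info["downtag"], 0) >= min_illumina
        ]
        if samples:
            assignments[insert_id] = sorted(samples)
    return assignments


# Made-up inserts and barcode read counts.
inserts = {
    "insert_A": {"uptag": "ACGTACGTAC", "downtag": "TTGACCAGTA", "support": 25},
    "insert_B": {"uptag": "GGGTTTAAAC", "downtag": "CCATGGATCC", "support": 4},
}
reads = {
    ("E. coli", "soil", "antibiotic_1"): {"ACGTACGTAC": 40, "TTGACCAGTA": 37},
    ("K. pneumoniae", "gut", "antibiotic_2"): {"ACGTACGTAC": 3, "GGGTTTAAAC": 50},
}
print(assign_inserts(inserts, reads))
```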

Mobilization of the isolated ARGs
To create the mobile gene catalogue (that is, a database of recently transferred DNA sequences between bacterial species 40 ), we downloaded 1,377 genomes of diverse human-related bacterial species from the Integrated Microbial Genomes and Microbiomes database as done previously 40 and 1,417 genomes of Gram-negative ESKAPE pathogens from the NCBI RefSeq database (Supplementary Table 8). Using NCBI blastn 2.10.1+ 62 , we searched the nucleotide sequences shared between genomes belonging to different species. The parameters for filtering the NCBI blastn 2.10.1+ blast results were the following: minimum percentage of identity, 99%; minimum alignment length, 500; maximum alignment length, 20,000. The blast hits were clustered by cd-hit-est 4.8.1 63,64 , with sequence identity threshold of 99%. We predicted the ORFs on the blast hits with prodigal v2.6.3 61 , keeping only those longer than 500 nt. Then, to generate the mobile gene catalogue, we compared them with the merged CARD 3.1.0 35 and ResFinder (d48a0fe) 36 databases using diamond v2.0.4.142 65 . Finally, natural plasmid sequences were identified by downloading 27,939 complete plasmid sequences from the PLSDB database (v2020-11-19) 41 .
Then, representative sequences of the isolated 114 ARG clusters were BLASTN searched both in the mobile gene catalogue and in natural plasmid sequences, with an identity and coverage threshold of 90%. ARGs that were present in the mobile gene catalogue and/or in natural plasmid sequences were considered as mobile.
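The hit-filtering criteria and the final mobility call described above can be expressed compactly. The snippet below is a sketch only: the `Hit` record and its field names are hypothetical and do not correspond to a specific blastn output format, and the thresholds are the ones stated in the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Minimal, hypothetical representation of one pairwise blastn hit."""
    query_species: str
    subject_species: str
    identity: float   # percentage of identical positions
    length: int       # alignment length in nucleotides


def is_recent_transfer(hit, min_identity=99.0, min_length=500, max_length=20_000):
    """Keep hits shared between different species that meet the stated thresholds."""
    return (
        hit.query_species != hit.subject_species
        and hit.identity >= min_identity
        and min_length <= hit.length <= max_length
    )


def is_mobile(in_mobile_gene_catalogue, on_natural_plasmid):
    """An ARG is considered mobile if found in the catalogue and/or on a natural plasmid."""
    return in_mobile_gene_catalogue or on_natural_plasmid


# Made-up hits for illustration.
hits = [
    Hit("Escherichia coli", "Klebsiella pneumoniae", 99.8, 1_850),
    Hit("Escherichia coli", "Escherichia coli", 100.0, 3_000),       # same species
    Hit("Escherichia coli", "Acinetobacter baumannii", 97.5, 900),   # identity too low
    Hit("Escherichia coli", "Klebsiella pneumoniae", 99.9, 45_000),  # alignment too long
]
print([is_recent_transfer(h) for h in hits])
```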

Statistical analysis
Statistical analysis was performed using R (v4.1.1). The parametric two-sample t-test was used to assess the differences between the means of the groups of samples. Fisher's exact test was used to determine significant associations between two variables. Shannon alpha diversity index was used to characterize the diversity of DNA contigs in the libraries using the vegan package (v2.5-7) in R 66 . Data distribution was assumed to be normal, but this was not formally tested.
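To make explicit what the two-sided Fisher's exact test computes for a 2 × 2 table, a dependency-free sketch is given below. It is not the R routine used in the study: it sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one, and the small relative tolerance used for this comparison is an implementation choice. The example table is made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one.
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Made-up contingency table.
print(fisher_exact_two_sided(8, 2, 1, 5))
```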

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Illumina reads and Nanopore contigs for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB54063 (https://www.ebi.ac.uk/ena/browser/view/PRJEB54063). Source data are provided with this paper.
Extended Data Fig. 1 | Characterization of functional metagenomic library delivery by hybrid T7 bacteriophage transducing particles into the target clinical strains. a, Comparison of electroporation and transduction efficiencies. The figure shows that the maximum number of plasmids delivered to the hosts is at least two orders of magnitude higher by transduction than by electroporation in all 3 pathogenic host species (centre and error bars represent mean and standard error; n = 3 biologically independent experiments). Data available in Supplementary Table 2. b, PCR amplified metagenomic inserts from transduced cells. Following transduction and electroporation, the metagenomic DNA fragments were PCR amplified by using plasmid-specific primers at both sides of the metagenomic DNA fragment and subsequently sequenced by capillary sequencing. This experiment differentiates monoclonal cells (single PCR product and DNA sequence) from those that were co-transduced by two or more plasmids (double bands on the gel and mixed signal in the capillary sequence). PCR was repeated in case of each host-library pair with similar results. c, During the generation of transducing bacteriophage particles, a large portion of phages remains replicative and kills the bacterial cells used for phage generation. Therefore, with increasing phage concentration, transduction efficiency does not grow as would be expected, but declines. The figure shows the transduction efficiency of the T7 transducing phage particle harbouring the T7 phage tail (black line) on Shigella sonnei HNCMB 25021 at different dilutions (see Methods). The red dashed line shows the expected increase in transduction efficiency without any detectable killing effect of replicative phages. Data available in Supplementary Table 2.

Extended Data Fig. 2 | Distributions and transduction efficiencies of the most enriched mutations in the T7 and the ΦSG-JL2 tail fibre displaying hybrid T7 bacteriophage particles when selected on E. coli ΔwaaR model strain. a, The mutant T7 gp17 HRDRs usually carry specific combinations of mutations, 28% of which have been described as adaptive mutations previously. Heatmap representing the number of cases a mutation occurs in the 50 sequenced T7 phage tail HRDRs. Adaptive mutations according to (Huss et al. 28 ) are indicated with a red dot. The frequent combination of specific adaptive mutations indicates the potential of DIvERGE to find host-specificity altering mutations with high efficiency. Data available in Supplementary Table 4. b,c, Distribution of detected mutations across the mutagenized phage tail fibre genes Escherichia phage T7 gp17 and Salmonella phage ΦSG-JL2 gp17. Predicted HRDRs are distinguished via colorized regions as in (Yehl et al. 29 ) with the T3 bacteriophage. d, Transduction efficiencies of the mutant T7 (grey) and ΦSG-JL2 (yellow) phage tails as compared to their wild type counterparts with E. coli K12 BW25113 ΔtrxAΔwaaR LPS deficient strain. Y axis shows the number of transduced cells in 1 mL. Centre and error bars represent mean and standard error (n = 3 biologically independent experiments). Note that we did not observe any enriched mutant Salmonella phage Vi06 gp43, indicating that Salmonella phage Vi06 tail fibre binds to a cell surface receptor other than LPS. Data available in Supplementary Table 4.

Extended Data Fig. 3 | The effect of T7 gp17 V544G mutation on replicative phage contamination and on the transduction of the metagenomic plasmid. a, Schematic representation of transducing phage particle generation with T7 gp17 WT . During the first step of the process, the T7 bacteriophage lacking its tail fibre encoding genes in its genome but otherwise displaying the wild type T7 tail fibre infects the E. coli BW25113 cell carrying the metagenomic plasmid and a phage tail expressing plasmid. The infection results in the production of phage particles carrying either the metagenomic plasmid (transducing phage particle) or the phage genome (replicative phage) according to Yosef et al. If the phage tail encoded by the phage tail expressing plasmid (and therefore, displayed on the generated T7 particles) can efficiently infect E. coli, the replicative phage continuously accumulates during the process, since the phage genome containing phage particle can initiate a new reproduction cycle. b, The number of metagenomic plasmids that are delivered in Shigella sonnei HNCMB 25021 by the T7 phage particles harbouring gp17 WT (blue) or gp17 V544G (green) tail fibres (two-sample one-sided t-test, P = 0.01944. Centre and error bars represent mean and standard error, n = 3 biologically independent experiments). Data is available in Supplementary Table 5. c, Replicative phage contamination measured by plaque formation of the T7 transducing phage particles harbouring the gp17 WT (blue) or gp17 V544G (green) tail fibres (see Methods). Plaque assay was carried out both with E. coli BW25113 and with S. sonnei HNCMB 25021 (two-sample two-sided t-test, P = 0.000168 and P = 0.013476 when applied with E. coli and S. sonnei, respectively. Centre and error bars represent mean and standard error; n = 3 biologically independent experiments). Data is available in Supplementary Table 5. d, Replicative phage contamination measured by transduced S. sonnei HNCMB 25021 colony numbers with T7 phage particles harbouring the gp17 WT (blue) or gp17 V544G (green) tail fibres. Lower amount of replicative phage in the T7 gp17 V544G transducing particle sample is indicated by the increasing colony numbers even at high concentrations of the transducing particle. Notably, unlike in Supplementary Fig. 5c, replicative phage activity is detected at the highest concentration of the transducing particle in this experiment. (n = 2 biological replicates. Centre and error bars represent mean and standard error.) Data available in Supplementary Table 5. e, Transduction efficiencies of the T7 phage particles harbouring gp17 WT (blue) or gp17 V544G (green) tail fibres in E. coli BW25113 (two-sample two-sided t-test, P = 0.00553, n = 3 biologically independent experiments. Centre and error bars represent mean and standard error). Data is available in Supplementary Table 5. f, Schematic representation of the assumed transducing phage particle generation scheme with T7 gp17 V544G . The decreased transduction efficiency for E. coli BW25113 abolishes the reproduction of the replicative phage after the first infection cycle. Note that the first infection cycle is carried out by the T7 gp17 WT . In sum, the inefficient infection of E. coli by the mutant phage tail results in a lower amount of replicative phages.

… 34 )) with a modification that avoids PCR amplification of resistance-conferring metagenomic DNA fragments, and therefore, preserves the original composition of the samples (Methods). The workflow consists of the following steps. First, all the functional metagenomic plasmids obtained from the screens were pooled and then linearized using SrfI restriction endonuclease. SrfI has an eight base-pair-long recognition sequence to minimize the digestion of the metagenomic insert. The linearized plasmids are then subjected to Nanopore long-read sequencing (Methods). Long-read sequencing identifies the metagenomic DNA fragment (insert) and the two 10 nucleotide long random barcodes pre-cloned up- and downstream (Uptag and Downtag, respectively) of each metagenomic DNA fragment (Methods). In parallel, before pooling the metagenomic plasmids from each screen, multiplexed short-read deep sequencing was applied to read out the plasmid-encoded unique barcodes on each side of the metagenomic fragments in each functional metagenomic screen. Specifically, the Uptag and Downtag sequences were PCR amplified with barcoded Illumina sequencing compatible primers (BC). Following Illumina sequencing and demultiplexing of the samples using the BC barcodes, the Nanopore and Illumina datasets are combined to assign each plasmid (identified by the Up- and Downtags) to a screening batch that is a unique host, antibiotic and library combination.

Extended Data
Article https://doi.org/10.1038/s41564-023-01320-2 Extended Data Fig. 8 | Re-investigation of all resistance-conferring DNA fragments from the metagenomic screens. a, A significantly higher portion of ARGs not being detected to provide a resistance phenotype in any species were present on a single resistance-conferring DNA fragment as compared to ARGs being detected to provide a resistance phenotype in at least one species (Two-tailed Fisher's exact test, P = 0.032, n = 80, Supplementary Table 10). b, Estimated accuracy of the screen based on taking the MIC measurements as a gold standard dataset. Note that we excluded one ARG (QnrB73) from the MIC measurements, as re-introduction of this ARG into each of the four host bacterial species was not confirmed by sequencing of the plasmid library (Source Data File 9). Presence of resistance in the MIC dataset was defined as a more than two-fold change in relative MIC value. False negative hits are those ARGs that were not detected in the screen but showed a resistance phenotype in the MIC measurements. False positive hits are those that did not provide resistance in the MIC measurements but were detected to show a resistance phenotype in the screen. We assumed plasmid hitchhiking as a primary source of false positives.
Data is available in Supplementary Table 9. c, The distribution of adjusted Jaccard similarity coefficients that represent the overlaps of functional ARG sets between pairs of host species after controlling for measurement accuracy using a stochastic approach (Methods). Dashed line, blue line and red lines represent the measured Jaccard similarity coefficient for host species pairs, the median of the adjusted Jaccard similarity coefficients and the lower and upper bounds of the 95% confidence intervals, respectively. d, In total, only ~46% of the ARGs (~29 out of 63) are estimated to provide resistance in all four bacterial host species. Histogram shows the number of ARGs that are estimated to confer resistance in all four host species when taking into account the false positive and false negative rates of the screen by using a stochastic approach (see Methods). Dashed line, blue line and red lines represent the measured Jaccard similarity coefficient for host species pairs, the median of the adjusted Jaccard similarity coefficients and the lower and upper bounds of the 95% confidence intervals, respectively.
Corresponding author(s): Balint Kintses, Csaba Pál
Last updated by author(s): Dec 27, 2022

Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.


Software and code
Data collection: We did not use any code or software to collect data for this study.

Data
Illumina reads and Nanopore contigs for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB54063 (https://www.ebi.ac.uk/ena/browser/view/PRJEB54063). Scripts and other files needed to reproduce the analysis are available at https://github.com/stitam/Apjok-et-al-DEEPMINE-NatMicrobiol.