Introduction

Bacterial antimicrobial resistance has become an increasing threat to the effectiveness of antibiotics in therapeutic applications [1]. Well-established antibiotic-resistance mechanisms include chemical modifications of target enzymes, enzymatic inactivation of antibiotics, and efflux pumps [2]. Less appreciated is the catabolic ability conferring antibiotic mineralization in bacteria, termed antibiotic subsistence. Antibiotic-subsisting bacteria are considered to include a substantial reservoir of antibiotic-resistance determinants, such as the newly discovered flavoenzymes enabling tetracycline resistance through degradation reactions [3]. Because it is thought that resistance can be selected by environments to which antibiotics are eventually released, bacteria that can utilize antibiotics for catabolic energy generation may play an important role in reducing the concentration of antibiotics. Such activity may prevent the evolution of antimicrobial resistance in the environment [4]. Thus, having antibiotic-subsisting bacteria distributed in the environment is an ecological paradox in terms of the transferability of their genetic determinants [5,6,7].

As the earliest human-made synthetic antibiotics [8], sulfonamides remain among the most commonly used antibiotics in agriculture [9], resulting in their widespread distribution in the environment [10]. Since the first identification of sulfonamide-subsisting bacteria in 2008 [11], many follow-up studies have reported a phylogenetically diverse range of bacteria that can subsist on sulfonamides [10]. Although those studies highlight the catabolic capacity of these bacteria, its potential for biotechnological applications has yet to be unlocked. Some attempts have been made to harness this capacity, for example, by augmentation with single sulfonamide-subsisting bacteria [12] or designed microbiomes [13] in sulfonamide-contaminated environments. The presence of bacteria capable of sulfonamide subsistence is a prerequisite for detoxification, but their desired function can be affected by extrinsic factors arising from the background community context, such as interactions between individual members during adaptation [14,15,16]. Tractable experiments under highly controlled conditions, coupled with tools that permit adequate investigation of microbiome structure and function, can overcome this bottleneck, enabling us to move beyond descriptive observations and connect statistical patterns in the assembly of large multispecies microbiomes with empirical findings.

Nevertheless, considering the ecological consequences of antibiotic subsistence, the potential dissemination of the genetic determinant of sulfonamide subsistence to pathogenic microbes is a major concern for environmental applications. Thus far, the flavin-dependent monooxygenase encoded by the sadA gene or its homolog sulX is the only validated enzyme for sulfonamide subsistence [17, 18]. Both genes were reported to be adjacent to mobile elements implicated in horizontal gene transfer [18, 19]. However, because genome sequencing tools and assembly approaches have struggled to resolve the fragmented target genomic regions in which mobile elements are densely nested, the diversity in the organization of sulfonamide metabolic clusters, the variation between sulfonamide-subsisting bacterial species, and the mechanisms that underlie horizontal transfer of these clusters have not been explored.

In this work, we studied the ecological selection that drives microbiomes to perform sulfonamide subsistence. We used wastewater microbiomes originating from six regional pools as inocula and assembled them using a sulfonamide antibiotic as the sole carbon source. As illustrated in Fig. 1, we studied the microbiome assembly for sulfonamide subsistence across hierarchical structures of microbiomes: from communities and individual populations to pathways and genes. At the community level, we found that the assembly of biologically diverse microbiomes selects for the same three families, while allowing for species variability. We propose primary-degrader-mediated determinism as an explanation for the observed patterns. By leveraging short- and long-read sequencing, we resolved complete or near-complete genomes in addition to mobilomes (the collection of all mobile elements) from individual isolates capable of subsisting on sulfadiazine. Our data revealed previously undetected polymorphisms associated with the sulfonamide metabolic gene clusters. More importantly, the specialized capability for sulfonamide subsistence appears to be evolutionarily conserved, showing limited spread beyond the boundary of the Micrococcaceae family.

Fig. 1: Top–down assembly of wastewater microbiomes on sulfadiazine antibiotic as sole carbon source.

a Experimental scheme for the assembly of wastewater microbiomes on a single limiting carbon resource. b Factors that contribute to community-level assembly patterns. The top row refers to the genetic information in specified populations that confers sulfadiazine subsistence. The middle and bottom rows refer to ecological interactions in the community and impacts of initial inocula. c Family-level community structures before (t = 0 day) and after (t = 334 days, 83rd transfer) the passaging experiment based on 16S rRNA gene analysis.

Materials and methods

Sample collection and passaging wastewater microbiomes

We collected samples of wastewater (~500 mL) from aeration tanks in six full-scale municipal sewage treatment plants (STPs) in Hong Kong including Sai Kung STP, Shatin STP, Stanley (STL) STP, Shek Wu Hui (SWH) STP, Tai Po (TP) STP, and Yuen Long STP. The biomass from six wastewater samples was inoculated into ~160 mL of synthetic medium in six 1-L batch reactors. We used mineral source media containing sulfadiazine as sole carbon source as we previously reported [20]. Cultures in reactors were stirred with the same magnetic force for aeration and were allowed to grow for 4 days at room temperature. Passaging was performed by taking biomass from culture in each reactor to use as inoculum in 160 mL of fresh media, and bacteria were allowed to grow again. After the 6th transfer, the concentration of sulfadiazine was gradually increased to 20, 30, 50, 100, 150, and 200 mg/L at the 7th, 10th, 13th, 18th, 30th, and 38th transfers, respectively (Fig. S1). All cultures were passaged 83 times over a time course of 334 days. Biomass was measured by drying at 105 and 550 °C. Cell pellets from starting microbiomes and cultures after the 33rd (134 days), 57th (231 days), and 83rd (334 days) transfers were collected via centrifugation for DNA extraction and sequencing.
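The stepwise dosing described above can be written down as a small lookup, which makes it easy to recover the nominal sulfadiazine concentration for any sampled transfer. This is only a sketch: the function name is hypothetical, it assumes each concentration was held until the next listed step, and it returns `None` for transfers 1 to 6 because their starting concentration is not stated in this section.

```python
from bisect import bisect_right

# (transfer at which the step takes effect, sulfadiazine in mg/L), as listed in the text.
SCHEDULE = [(7, 20), (10, 30), (13, 50), (18, 100), (30, 150), (38, 200)]
TOTAL_TRANSFERS = 83


def sulfadiazine_mg_per_l(transfer):
    """Nominal concentration at a transfer (hypothetical helper).

    Assumes each level is held until the next step. Returns None for
    transfers 1-6, whose concentration is not given in this section.
    """
    if not 1 <= transfer <= TOTAL_TRANSFERS:
        raise ValueError("the passaging experiment ran for 83 transfers")
    starts = [t for t, _ in SCHEDULE]
    i = bisect_right(starts, transfer)  # number of steps already in effect
    return None if i == 0 else SCHEDULE[i - 1][1]


# Nominal concentrations at the three sequenced time points.
for sampled in (33, 57, 83):
    print(sampled, sulfadiazine_mg_per_l(sampled))
```

Under these assumptions the community sampled at the 33rd transfer was growing at 150 mg/L, and those sampled at the 57th and 83rd transfers at 200 mg/L.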

Chemical analysis

The liquid medium after each transfer was filtered by a 0.45-μm nylon membrane, and the total organic carbon concentration in the filtrate was measured by a total organic carbon analyzer (TOC-V CPH, Shimadzu). The mother compound sulfadiazine and its metabolite 2-amino pyrimidine in liquid media were analyzed with ultraperformance liquid chromatography coupled with tandem mass spectrometry (UPLC-MS/MS system, Waters). The analytical method as well as the operation parameters, including mobile phases and elution gradients for the UPLC-MS/MS were the same as those reported in our previous study [20].

Isolation of strains and genomic analysis

Serial dilutions of six assembled communities (one from each initial inoculum microbiome) were spread on agar plates containing the mineral source medium used in the passaging experiment and 50-mg/L sulfadiazine solidified with 1.5% agar and were allowed to grow at room temperature. Individual colonies were picked up and subjected to further purification. Using liquid mineral medium with sulfadiazine as the sole carbon source at room temperature, a pure culture was confirmed as capable of sulfadiazine subsistence when the disappearance of sulfadiazine, the accumulation of the metabolite 2-amino pyrimidine, and the cell growth were all observed.

A total of eight pure cultures of Arthrobacter (six isolated in this study and two isolated in our previous study) were sequenced using both short- and long-read sequencing. For short-read sequencing, DNA was extracted from the collected samples using the Fast DNA Spin Kit for Soil (MP Biomedicals) and sent to Novogene for sequencing on the HiSeq platform (Illumina). For long-read sequencing, total DNA from each isolate was extracted using the DNeasy PowerSoil Kit (Qiagen). Individual libraries were constructed using the SQK-RAD004 Rapid Sequencing Kit (Oxford Nanopore Technologies). Barcoded libraries from eight isolates were pooled and subjected to MinION sequencing using R9.4 flow cells (FLO-MIN106). After base calling using Albacore (v2.1.10), raw reads were generated in fastq format and were trimmed with Porechop (v0.2.3) to remove adapters. Hybrid assembly of short and long reads was performed using Unicycler [21].

A phylogenetic tree of isolates was constructed based on 135 protein sequences encoded by bacterial single-copy genes as reported by Campbell et al. [22] using Anvi’o (v5). The constructed tree was then midpoint rooted using FigTree (v1.4.4) (https://github.com/rambaut/figtree/). Genome annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline [23]. The alignments of the isolated genomes were performed by Mauve [24]. The ICEberg 2.0 [25] was used to detect integrative and conjugative elements (ICEs) in isolated genomes. The oriTfinder [26] was used to detect the oriT sites. The PlasFlow [27] was used to differentiate plasmids and chromosomes in the assembled genomes. The copy number of plasmids was derived by depicting the short-read coverage of individual pure cultures obtained by read mapping. Plasmids were categorized into three different types (conjugative, mobilizable and nonmobilizable) by querying sequence databases of relaxase, type IV coupling protein (T4CP), and the gene cluster for bacterial type IV secretion system (T4SS) using Plas-CAD (https://github.com/pianpianyouche/Plas-CAD).
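The three plasmid categories can be illustrated with a simple decision rule over the presence of relaxase, T4CP, and T4SS hits. The sketch below follows a commonly used convention and is an assumption, not a description of how Plas-CAD is implemented; the function name and boolean inputs are hypothetical.

```python
def classify_plasmid(has_relaxase, has_t4cp, has_t4ss):
    """Assign a mobility class from marker-gene presence (assumed convention).

    conjugative    : relaxase plus a complete mating apparatus (T4CP and T4SS)
    mobilizable    : relaxase present, but T4CP and/or T4SS missing
    nonmobilizable : no relaxase detected
    """
    if has_relaxase and has_t4cp and has_t4ss:
        return "conjugative"
    if has_relaxase:
        return "mobilizable"
    return "nonmobilizable"


# A plasmid with a relaxase hit but no T4CP or T4SS hits.
print(classify_plasmid(has_relaxase=True, has_t4cp=False, has_t4ss=False))
```

Under this rule, a plasmid that carries a relaxase gene but neither T4CP nor a T4SS gene cluster is reported as mobilizable rather than conjugative.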

16S rRNA gene sequencing and analysis

Cell pellets collected at days 0 and 334 were subjected to 16S rRNA gene sequencing. Total DNA of the cell pellets from 2 mL of culture was extracted using a Fast DNA Spin Kit for Soil (MP Biomedicals). The V3V4 region of the 16S rRNA gene was amplified using the primer pair 341F (ACTCCTACGGGAGGCAGCAG) and 806R (GGACTACHVGGGTWTCTAAT), followed by sequencing on the MiSeq platform (Illumina) using a dual-index 300-bp paired-end sequencing protocol at BGI. The sequencing depth was ~100,000 reads for each sample. The raw reads were processed with QIIME 2 [28] and DADA2 [29] to infer Amplicon Sequence Variants (ASVs). The taxonomy of ASVs was assigned by a Naïve Bayes classifier [30] using the 16S rRNA gene database of the Genome Taxonomy Database (GTDB) [31] (https://zenodo.org/record/2541239#.XohOZ9MzYnU). Rare ASVs with a prevalence of 1 across all samples and rare phyla encompassing fewer than 10 unique ASVs in total were pruned from downstream analysis. Diversity and ordination analyses were conducted using the R “phyloseq” package [32]. Differential abundance analysis was conducted using the R “DESeq2” package [33]. The permutational multivariate analysis of variance test was conducted using the R package “vegan” [34]. The Wilcoxon test was performed in R using the command “wilcox.test.”
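The pruning rule for rare ASVs and rare phyla can be sketched as follows. The data layout and function name are hypothetical, prevalence is taken to mean the number of samples in which an ASV has a nonzero count, and the per-phylum ASV tally is assumed to be made before any ASV is removed; the text does not specify either detail.

```python
from collections import Counter


def prune_asvs(counts, phylum_of, min_prevalence=2, min_asvs_per_phylum=10):
    """Drop rare ASVs and ASVs from rare phyla (illustrative sketch).

    counts    : {asv_id: [read count in each sample]}
    phylum_of : {asv_id: phylum name}
    """
    # Unique ASVs per phylum, counted on the unpruned table (assumption).
    asvs_per_phylum = Counter(phylum_of[asv] for asv in counts)
    kept = {}
    for asv, row in counts.items():
        prevalence = sum(1 for c in row if c > 0)  # samples where the ASV occurs
        if prevalence < min_prevalence:
            continue  # seen in only one sample (or none)
        if asvs_per_phylum[phylum_of[asv]] < min_asvs_per_phylum:
            continue  # phylum represented by too few ASVs
        kept[asv] = row
    return kept
```

With the defaults, an ASV detected in a single sample is removed, as is every ASV from a phylum represented by fewer than 10 unique ASVs.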

Metagenomic sequencing, binning, and analysis

We performed short-read metagenomic sequencing on the seeding microbiomes and enriched communities at the 33rd, 57th, and 83rd transfers. Paired-end reads of 150 bp with a sequencing depth of ~10 gigabase pairs of data were generated for each sample. Quality filtration of raw reads was conducted via the KneadData pipeline (https://bitbucket.org/biobakery/kneaddata/wiki/Home) to remove eukaryote reads and low-quality reads (Trimmomatic [35] parameters: SLIDINGWINDOW:3:20 and MINLEN:100). The k-mer signatures of each dataset were computed and compared by sourmash [36]. To facilitate genome binning, we also downloaded raw data of eight metagenomic datasets on microbial communities capable of sulfonamide subsistence [37, 38] from NCBI SRA. In addition, we employed a strategy encompassing single-sample assembly and binning as well as coassembly and cobinning for the same reactor. Briefly, clean reads from each sample were first individually assembled into contigs using metaSPAdes (v3.13) [39]. Because memory requirements exceeded 500 GB of RAM, coassembly of raw reads from the same reactor spanning all time points was conducted in CLC Genomics Workbench (v12.0, QIAGEN Bioinformatics). Contigs with lengths <1000 nt were excluded from downstream analyses. BWA-MEM [40] (v0.7.17) and SAMtools [41] were used for read recruitment analyses.
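To give a feel for what the SLIDINGWINDOW:3:20 and MINLEN:100 settings do, the following is a deliberately simplified sliding-window trim. It is not a reimplementation of Trimmomatic, whose handling of the cut position differs in detail; the function name is hypothetical, and the read is simply cut at the start of the first window whose mean quality drops below the threshold.

```python
def sliding_window_trim(seq, quals, window=3, min_mean_q=20, min_len=100):
    """Simplified quality trim (illustrative, not Trimmomatic's exact logic).

    Scans windows from the 5' end, cuts the read at the start of the first
    window whose mean Phred quality is below min_mean_q, and returns None
    if the surviving read is shorter than min_len.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            cut = start
            break
    trimmed = seq[:cut]
    return trimmed if len(trimmed) >= min_len else None
```

A 150-bp read whose quality collapses after position 120 would therefore survive as a roughly 120-bp read, whereas one that degrades before position 100 would be discarded.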

We used four binning algorithms including CONCOCT [42] (v1.0.0, default parameters), MaxBin 2.0 [43] (v2.2.6, default parameters), MetaBAT 2 [44] (v2.12.1, option “-minContig 2000, --minCV 2”), and the bin refinement module embedded in MetaWRAP [45] (v0.8, default parameters). Comparisons of binning results of assembled contigs derived from four algorithms were performed with anvi’o (v5). The binning process generated 9542 MAGs, which were subjected to quality filtration and dereplication via dRep. We first performed quality filtration of all MAGs obtained from four binning methods. MAGs with a completeness <50% and contamination >10% estimated from CheckM [46] were discarded. Then, we performed the dereplication for the quality-filtered MAG to group them into species-level genome bins (SGBs). MAGs with <5% genetic diversity were grouped into the same SGBs. After filtering rare species with low prevalence (Fig. S2), a total of 335 species-level MAGs (comparable to species-level phylotypes) were employed in subsequent analyses. The relative abundance of each MAG across all samples was calculated by dividing the total length of reads recruited to the MAG by the total length of the raw reads. To avoid the compositionality issue caused by relative abundance, the obtained matrix of relative abundance of MAGs was transformed into total genome count under even sequencing depth of 10 G by normalizing the genome size. Taxonomy of MAGs was assigned by using GTDB-Tk [47] against the GTDB genome database [31]. The count matrix and the taxonomic assignment were used for diversity and ordination analyses using the R package “phyloseq.” We further used GhostKOALA [48] to generate annotations of KEGG Orthology (KO) for each MAG. We used the SIMPER test to identify the most differentially abundant KO.
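The abundance normalization described above amounts to two short calculations. The sketch below is one plausible reading of the text, stated as an assumption: relative abundance is the fraction of sequenced bases recruited to a MAG, and the genome count is that fraction scaled to an even depth of 10 Gbp and divided by the MAG's genome size. Function and parameter names are hypothetical.

```python
EVEN_DEPTH_BP = 10_000_000_000  # even sequencing depth of 10 Gbp, as in the text


def relative_abundance(recruited_bp, total_bp):
    """Fraction of all sequenced bases that were recruited to one MAG."""
    return recruited_bp / total_bp


def genome_count(rel_abundance, genome_size_bp, depth_bp=EVEN_DEPTH_BP):
    """Genome equivalents at an even depth, normalized by genome size (assumed formula)."""
    return rel_abundance * depth_bp / genome_size_bp


# Example: 0.5 Gbp of a 10-Gbp sample maps to a 4-Mbp MAG.
ra = relative_abundance(500_000_000, 10_000_000_000)
print(ra, genome_count(ra, 4_000_000))
```

In this example the MAG accounts for 5% of the sequenced bases, which corresponds to about 125 genome equivalents at 10 Gbp. Dividing by genome size keeps a large genome from appearing more abundant than a small one merely because it recruits more bases per cell.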

The sadA-carrying contigs were identified by BLASTp [49] against the sadA-encoded protein database collected from seven isolates whose sulfonamide subsistence capabilities were experimentally validated. The isolates included Microbacterium lacus SDZm4 (NCBI nr accession, WP_100813237.1) [50], Microbacterium sp. C448 (WP_081766351.1) [51], Microbacterium sp. BR1 (WP_100812327.1) [17], Arthrobacter sp. D2 (OEH61722.1, OEH57813.1) [20], Arthrobacter sp. D4 (OEH63558.1) [20], and Microbacterium sp. (WP_103663397.1) [18]. The identity and length cutoffs for BLASTp were 95% and 70%, respectively. The identified sadA-carrying contigs were aligned to the circular chromosome of Arthrobacter sp. D2 using BLASTn. The alignment results were inspected via Kablammo [52]. Taxonomic classification of contigs was performed by Kraken2 [53] as well as Kaiju [54]. Annotation of contigs was performed using the NCBI Prokaryotic Genome Annotation Pipeline. To classify the taxonomy of sadA-carrying contigs in complex communities, we relied on sequence composition instead of the binning method based on coverage. Fifty of the fifty-three contigs were successfully assigned. We performed hybrid assembly of short and long reads for the community (TP_d134 community) from which the unassigned contigs originated. We also assessed the presence of sadA in 156,279 genomes in GenBank (downloaded in April 2019) by BLASTp, and the protein sequences encoded by sadA were collected as previously described with an E-value cutoff of 10^-5.
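The identity and length cutoffs can be applied to BLASTp hits with a filter like the one below. This is a sketch under stated assumptions: the length cutoff is interpreted as alignment length relative to the reference SadA protein length, the thresholds are treated as inclusive, and the function name is hypothetical. The text does not specify any of these details.

```python
def passes_cutoffs(pident, aln_len, ref_len, min_identity=95.0, min_coverage=70.0):
    """Check one BLASTp hit against the identity and length cutoffs.

    pident  : percent identity reported for the alignment
    aln_len : alignment length in residues
    ref_len : length of the reference SadA protein (assumed denominator)
    """
    coverage = 100.0 * aln_len / ref_len
    return pident >= min_identity and coverage >= min_coverage


# Hypothetical hits as (percent identity, alignment length, reference length).
hits = [(96.0, 300, 400), (94.9, 400, 400), (99.0, 200, 400)]
print([passes_cutoffs(*hit) for hit in hits])
```

Only the first hypothetical hit is retained: the second falls below 95% identity, and the third covers just half of the reference protein.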

For the cooccurrence network analysis, pairwise Spearman’s rank correlations were calculated from the count matrix of MAGs using the “corr.test” function from the R package “psych.” Only robust correlations with a coefficient >0.6 and an adjusted p value < 0.01 were retained. The result was visualized in Gephi (https://gephi.org/). All-subsets regressions were conducted using the “regsubsets” function from the R package “leaps.” A ternary plot was drawn using the R package “ggtern.”
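The edge-selection step of the cooccurrence network can be illustrated in a few lines. The sketch below computes Spearman's rank correlation (with average ranks for ties) for every MAG pair and keeps pairs whose coefficient exceeds 0.6. It is written in Python rather than R for illustration, the function names are hypothetical, and it omits the adjusted p value filter that the analysis also applied.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks; returns 0.0 if either vector is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cooccurrence_edges(counts, min_rho=0.6):
    """counts: {mag_id: [count per sample]} -> list of (mag_a, mag_b, rho)."""
    edges = []
    for a, b in combinations(sorted(counts), 2):
        rho = spearman_rho(counts[a], counts[b])
        if rho > min_rho:  # the p value filter is intentionally left out here
            edges.append((a, b, rho))
    return edges
```

Only positive correlations above the threshold become edges in this sketch, so strongly anticorrelated MAGs are not connected.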

Functional metagenomics and analysis

To select genomic fragments conferring sulfonamide subsistence from assembled communities, we followed an established large-insert functional metagenomic protocol [55] using the CopyControl Fosmid Library Production Kit (Epicentre). Following the manufacturer’s instructions, total DNA extracted from a representative community was sheared and blunt-end repaired. Then, the DNA fragments (~40 kb) were size-selected on agarose gel, followed by recovery and purification. The size-selected, gel-purified DNA fragments constituting the insert sequence library were ligated to the CopyControl pCC1FOS fosmid vector (chloramphenicol resistant) and were individually packaged by replication-deficient phage for subsequent cloning into Escherichia coli strain EPI-300 via phage infection. The Escherichia coli carrying insert sequences were allowed to grow at room temperature for selection on a modified agar plate. The growth medium in the modified agar plate contained the mineral medium used in the passaging experiment, 50-mg/L sulfadiazine as the sole carbon source, and 12.5-mg/L chloramphenicol as the selectable marker for fosmid-carrying Escherichia coli. All transformed clones from the constructed library were selected on sulfadiazine mineral salt agar plates in the presence of chloramphenicol. Identifiable colonies grown independently on the sole supplied carbon source were isolated. We selected eight individual clones grown on the plates for short- and long-read sequencing. Raw reads generated from individual clones were hybrid-assembled using Unicycler with default parameters.

Results and discussion

Assembly patterns at the community level

We first used 16S rRNA gene sequencing to compare community compositions with high coverage and sensitivity. The six inocula derived from different pools of wastewater microbiomes were diverse and taxonomically rich, containing between 633 and 1485 ASVs. Despite different biological starting points, the same three families (Saprospiraceae, Nitrospiraceae, and Micrococcaceae) appeared in the final communities after the passaging experiment (Fig. 1). While Nitrospiraceae was similar in relative abundance, the other two families were inversely correlated. For rare functional niches that are highly specialized, such as hydrogen-oxidizing autotrophic denitrification [56], the same specialist taxa tend to be selected in a persistent and reproducible manner across microbial communities. In all our assembled communities, however, three families always emerged. This suggests that sulfonamide subsistence is not a “narrow” function, and it may be carried out via different machineries, such as the reported sadA-mediated catabolism in Micrococcaceae [17, 18] and the cometabolism in Nitrospiraceae [57]. The Saprospiraceae may also be able to subsist on sulfonamide, but the mechanism is unknown. When we conducted pairwise comparison of final communities (quantified by β diversity), we found a significant reduction in between-community similarities from the family to genus level (Mann–Whitney U test, p = 0.008, Fig. S3), suggesting that different taxa within the same family rose to prominence in the samples. Indeed, as identified by differential analysis of abundance changes after the whole experiment (DESeq2, p < 0.01), not all the ASVs under the same dominant family were significantly enriched (Fig. S4).

To resolve genomes for community members that are responsible for the structure variability, we further performed short-read metagenomic sequencing on communities sampled at days 0, 134, 231, and 334 (0, 33rd, 57th, and 83rd transfer, respectively) considering the changing sulfadiazine concentrations. We uncovered metagenome-assembled genomes (MAGs) that were collapsed into 335 species (pairs of MAGs having an average nucleotide identity (ANI) > 95% [58], Fig. S5), representing a median value of 84% (interquartile range, 10%) of the raw metagenomic reads. In line with 16S rRNA gene analysis, when we grouped MAGs by family-level taxonomy, the same three families were still the abundant groups. However, we also noticed a high fraction of the family SM1A02 belonging to the phylum Planctomycetota coexisted with the three families in the final communities (Fig. S6). We found that the species-level variability in community composition is associated with the initial inoculum. As shown in Fig. 2a, when plotting an NMDS clustering of all MAGs based on the Bray–Curtis dissimilarity of their relative abundances across the communities, similar ordering patterns of the most abundant MAGs were differentiated by the initial inoculum. Pairwise similarity in composition is significantly higher between assembled communities started from the same inoculum but sampled at different time points than communities started from different inoculum but sampled at the same time. In other words, the latter has higher between-community variability in compositions. However, there was no significantly increased degree of similarity between communities inoculated from different initial microbiomes (Wilcoxon test, p > 0.01; Fig. 2b) with longer enrichment time. 
By monitoring the assembly of complex wastewater microbiomes on sulfadiazine as the single limiting carbon source, we observed the occurrence of three families in all assembled communities, as well as substantial species variability associated with the initial inoculum. By assessing the KO annotation frequency and the abundance of individual MAGs, we found that the most differentially abundant KOs between the seed communities and the enriched communities were those involved in quorum sensing, ABC transporters, ABC-2 type transport system ATP-binding protein, and ABC-2 type transport system permease protein.
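The between-community comparison rests on the Bray–Curtis dissimilarity, BC(u, v) = Σ|u_i − v_i| / Σ(u_i + v_i), with similarity taken as 1 − BC. A minimal sketch of grouping pairwise similarities by shared or different inoculum is shown below; the function names and toy abundance vectors are hypothetical, and the sample labels merely imitate the naming used in this study.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both communities are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def split_pairwise_similarities(profiles, inoculum_of):
    """Return (same-inoculum, different-inoculum) lists of 1 - Bray-Curtis."""
    same, different = [], []
    for a, b in combinations(sorted(profiles), 2):
        similarity = 1.0 - bray_curtis(profiles[a], profiles[b])
        if inoculum_of[a] == inoculum_of[b]:
            same.append(similarity)
        else:
            different.append(similarity)
    return same, different


# Toy abundance vectors over two MAGs (made-up numbers).
profiles = {"TP_d134": [6, 4], "TP_d231": [5, 5], "STL_d134": [0, 10]}
inoculum_of = {"TP_d134": "TP", "TP_d231": "TP", "STL_d134": "STL"}
same, different = split_pairwise_similarities(profiles, inoculum_of)
print(same, different)
```

The two resulting lists are the kind of input that a two-sample Wilcoxon test would compare when asking whether communities sharing an inoculum are more similar than communities that do not.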

Fig. 2: Species-level variability in sulfadiazine-subsisting communities is associated with initial inoculum.

a Differentiation in community compositions based on MAG relative abundance. Columns represent individual assembled communities and were organized by their initial inocula. Rows represent individual MAGs and were ordered using ordination methods as reported [68]. b Between-community similarity as quantified by β diversity (1-Bray–Curtis dissimilarity) was significantly higher for communities assembled from the same inoculum than different inocula by the Wilcoxon test.

Identification of primary functional groups

Primary degraders for sulfonamide are pioneers perceived to be specialized in unlocking the bottleneck process in sulfonamide subsistence, secreting 2-amino pyrimidine as an intermediate metabolite for their partner bacteria. This process directs the carbon flow from primary degraders to generalist byproduct utilizers that are unable to degrade sulfadiazine directly but can subsist on the byproducts [38]. We first attempted to identify specialized degraders based on experimental evidence. When we spread six sulfadiazine-subsisting communities assembled from different inocula on agar plates, we reproducibly isolated Arthrobacter colonies belonging to the family Micrococcaceae. Monocultures of six representative Arthrobacter isolates were able to assimilate sulfadiazine and produce 2-amino pyrimidine as the principal metabolite with varying efficiency (Fig. 3). Genomes of all these isolates harbored the sadA gene, which encodes a flavin-dependent monooxygenase [17, 18] responsible for the bottleneck breakdown of sulfonamide through an ipso-hydroxylation reaction mechanism, showing a shared strategy for sulfonamide catabolism in these isolates.

Fig. 3: Identification of the primary degrader for sulfadiazine.

Sulfadiazine assimilation by isolated strains (left), and the simultaneous accumulation of 2-amino pyrimidine as the principal metabolite (right).

To identify other potential primary degraders that might employ novel genetic mechanisms for sulfadiazine subsistence, we applied a functional metagenomic protocol [3, 59] to one representative sulfadiazine-subsisting community by extracting community DNA, shearing it to 30–40-kb fragments, and ligating it into a fosmid vector using single-copy cloning to express the target sequence in Escherichia coli (for details see “Materials and methods”). We found that sulfadiazine-subsisting genes are underrepresented in the pool. Among the eight positive Escherichia coli clones capable of growing on sulfadiazine as the sole carbon source, we were able to recover two circular, intact fosmids from the assembly results, both carrying a ~33-kb insert sequence (Table S1). By comparing sequence composition to reference whole genomes from the GTDB database [31] using a k-mer-based classifier [53], these two insert sequences were resolved to the families Nocardioidaceae and Burkholderiaceae, which are not dominant in the communities. Neither insert sequence carried the sadA gene, indicating the presence of other genetic bases for sulfonamide subsistence in the community.

Next, we sought to link the experimentally validated functionality to MAGs. We found two primary degrader MAGs: one exhibited a high degree of similarity in ANI compared to the eight Arthrobacter isolate genomes (median ANI value of 99.9% for 98.5% of the isolate whole-genome regions), and the other MAG included a contig that entirely covered the Nocardioidaceae fosmid insert sequence with an identical nucleotide sequence. The ability of Nocardioidaceae to degrade sulfonamide antibiotics identified here by functional metagenomics is consistent with previous studies using DNA- and protein-stable isotope probing approaches [60].

Factors in determining microbial community-level assembly

Partitioning of the data by microbial cooccurrence divides all MAGs belonging to significantly enriched families into seven modules (Fig. 4a). Each module has its own taxonomic structure (Fig. S7). When we consider individual modules in terms of inoculum origins, we found inconsistent adaptive trajectories (Fig. 4b). The ordination of assembled communities based on module abundance showed clear separation by the initial inoculum (Fig. S8). When we considered module adaptation under changing experimental conditions, we observed that the community structure evolution followed three major adaptive patterns (Fig. 4c): (1) for longer and stronger selective pressure of sulfadiazine, there was a tendency to select species in modules D, E, and F; (2) species in module C transiently bloomed during our passaging experiment and, (3) species in modules A, B, and G showed restrictive adaptation toward final time point under our changing experimental conditions. By using all subsets regressions (Fig. S9), we found that a simple combination of MAG abundances in each enrichment culture had high predictive power for the compositional variations of the module (Fig. 4d). For modules E and F, variations resulted from the abundance shift of a single species, which is either the primary degrader or a bacterium selected by the primary degrader as revealed by cooccurrence patterns. In module D, two MAGs were identified as the best explanatory factor set. Identification of Top1_f_Saprospiraceae as one of the explanatory factors again implies its important role. A Planctomycetes MAG (Top5_f_SM1A02) and an Actinobacteriota MAG (Top10_f_Nocardioidaceae) coexisted with Top1_f_Saprospiraceae, suggesting a niche partitioning or potential metabolic facilitation between these two MAGs. However, the mechanism needs to be further studied. There is the possibility that the postulated secondary degrader could be a primary degrader. 
In addition to these two major functional groups, populations that can obtain their carbon from dead cells may also coexist with the major groups in the community.

Fig. 4: Detection of determinism in the assembly of sulfadiazine-subsisting communities.

a Partitioning of significantly enriched MAGs into functional modules via cooccurrence. b Adaptive dynamics of individual modules in terms of initial inocula. c Adaptive dynamics of individual modules in terms of changing experimental conditions. d Primary degraders and their partner bacteria predict abundance changes of enriched modules.

Specific catabolic gene cluster for sulfonamides

Having reproducibly isolated Arthrobacter spp. as sulfadiazine primary degraders, we sought to reveal the innate basis for such specialist bacteria to subsist on sulfadiazine. Enabled by ever-expanding tools for genomic sequencing and analysis, the genetic determinant of sulfonamide subsistence was resolved and compared at finer genetic scales for reported specialist bacteria (Table 1). The canonical configuration of the sulfonamide catabolic gene cluster contains three functional genes: the keystone sadA encoding an FMNH2-dependent monooxygenase for bottleneck oxidation of sulfonamide, sadB encoding a monooxygenase for downstream metabolite oxidation, and sadC encoding a flavin reductase. Reports using heterologous expression indicated that the absence of sadA can abolish pathway functionality, while the roles of the latter two genes in sulfonamide catabolism have been shown to be substitutable [17, 18]. Indeed, all isolates harbored sadA, but sadB and sadC were recovered in only nine out of ten isolates (Table S2). Although all Arthrobacter isolates are closely related strains (>99.9% whole-genome ANI with each other), we observed substantial differences in the structure and content of genetic contexts encompassing the sadA gene. These include variation in the genomic locus and copy number of sadA, as well as recombination that could shuffle small chromosomal regions near sadA (Fig. 5a). The sadA gene is located either in chromosomal regions or in extrachromosomal loci such as plasmids. Certain strains, such as Arthrobacter sp. SWH, STL, and D2, carry two copies of sadA that reside ~10 kb away from each other in the chromosome. Interestingly, the proteins encoded by the two cooccurring sadA genes in the chromosome are distinct, sharing ~94% of their whole sequences with an identity of ~79%. For sadA genes located in extrachromosomal loci, multiple copies were found in four strains.
In contrast, the other strains do not contain any sadA genes outside their chromosomes. Homologous alignments of sadA-residing regions from all isolated chromosomes reveal conserved synteny but variable gene contents, which may be caused by multiple recombination events. For example, gain or loss of genes that were located near sadA loci via transposition was observed in two Microbacterium strains (Fig. S10) as well as in four closely related Arthrobacter strains (Fig. S11). By comparing gene annotations of sadA-residing regions, we found that in all isolates, the sadA gene is flanked by regions in which transposable elements are densely nested, which could facilitate the transposition. The observed intraspecies variations could be associated with the potentially distinct sulfonamide-subsisting capacities as shown in Fig. 3. The divergence in the structure and content of the gene cluster encompassing the determinant sadA gene for sulfonamide subsistence gives us a look into the steps that occurred in the evolutionary assembly of this pathway, while the adaptive importance of pathway organization diversity, as well as how this pathway originates, and how it is maintained in bacteria are questions that have yet to be answered.

Table 1 Pure cultures capable of sulfonamide subsistence.
Fig. 5: Structures and contents of sulfadiazine metabolic gene clusters in sulfonamide-subsisting isolates.

a Various copy numbers and conserved synteny (as indicated by the same color block) but variable gene contents of genomic loci encompassing the sadA gene in isolated genomes. “Mobilome” represents the collection of all mobile elements. Strains ST, SK, YL, TP, SWH, STL, D4, and D2 are affiliated with the genus Arthrobacter. Strains CJ77 and BR1 are affiliated with the genus Microbacterium. b The circular chromosome of Arthrobacter sp. D2 includes a 12-kbp integrative and mobilizable element (IME) encompassing the sadA gene. c The sadA-carrying IME of Arthrobacter sp. D2 aligned to circular plasmids from two other Arthrobacter strains.

Constrained transfer of sulfonamide metabolic genes

Once genes encoding beneficial functions have been established in specialist bacteria, they can spread across multiple species within the community via horizontal gene transfer mediated by transformation, conjugation, or bacteriophage transduction [61]. We therefore inspected the collection of all mobile elements (termed the “mobilome”) associated with sadA. We recovered a chromosome-borne integrative and mobilizable element (IME) as well as two nonconjugative plasmids as mobile carriers of sadA. Arthrobacter sp. D2 has a chromosomal locus containing a ~12-kb IME whose coverage value differs from that of other chromosomal regions (Fig. 5b). This sadA-carrying IME is located within a genomic island and encodes a typical DDE-type integrase-mediated recombination apparatus. However, it lacks the intact conjugative module required for autonomous conjugative transfer [62, 63]. Some IMEs are not self-transmissible but can be mobilized by conjugative plasmids or integrative and conjugative elements (ICEs) via recognition and binding at the origin of transfer (oriT) region [26, 64]; the IME of strain D2, however, did not retain any oriT site. The class 1 integron, another important vehicle for disseminating gene cassettes across bacterial species, requires an integrase and a site-specific recombination site (attI) for mobilization [65], yet the class 1 integron residing near sadA in strain D2 likewise did not retain any attI site. Nonetheless, the sadA-carrying IME of strain D2 can function as a hitchhiker and was observed to be picked up by plasmids from two other closely related Arthrobacter strains (Fig. 5c). These two plasmids are mobilizable but nonconjugative: although both contain the relaxase gene, they lack a conjugative module, including the gene for the type IV coupling protein (T4CP) and the gene cluster for the bacterial type IV secretion system (T4SS) [66]. Overall, the above evidence suggests a limited potential for horizontal transfer of the sadA gene via the conjugation machinery.
These results provide a mechanistic explanation for the low probability of horizontal transfer of sadA.
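The reasoning used above to label an element as conjugative, mobilizable, or nonmobilizable can be summarized as a small decision rule. The sketch below is a simplification under stated assumptions, not the annotation procedure used in this study: it treats relaxase, T4CP, and T4SS together as sufficient for autonomous conjugation, and a relaxase or an oriT alone as sufficient for mobilization by a helper element. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementFeatures:
    """Presence or absence of transfer-related features on a mobile element."""
    relaxase: bool
    t4cp: bool   # type IV coupling protein
    t4ss: bool   # type IV secretion system gene cluster
    orit: bool   # origin of transfer


def classify_mobility(features: ElementFeatures) -> str:
    """Assign a coarse transfer category from gene content (assumed rule)."""
    if features.relaxase and features.t4cp and features.t4ss:
        return "conjugative"
    if features.relaxase or features.orit:
        return "mobilizable"
    return "nonmobilizable"


# Plasmid-like case described in the text: relaxase present, T4CP and T4SS absent.
# The oriT status of the plasmids is not stated and is set to False here.
plasmid_like = ElementFeatures(relaxase=True, t4cp=False, t4ss=False, orit=False)

# IME-like case: no conjugative module and no oriT. The relaxase status of the
# IME is not stated in the text and is assumed absent for illustration.
ime_like = ElementFeatures(relaxase=False, t4cp=False, t4ss=False, orit=False)

print(classify_mobility(plasmid_like))
print(classify_mobility(ime_like))
```

A rule of this kind only captures conjugation-dependent routes; transfer by transformation or transduction would need to be assessed separately.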

Wastewater microbiomes assembled in sulfadiazine media are dense and taxonomically rich microbial communities under sulfonamide selection, representing favorable conditions for the horizontal transfer of genes conferring sulfonamide subsistence. By combining short- and long-read sequencing, we validated that sadA was found only in the Micrococcaceae lineage in our assembled sulfadiazine-subsisting communities. A BLAST search revealed that 50 of the 53 sadA-carrying contigs assembled from enrichment cultures aligned to the chromosome of Arthrobacter sp. D2 (Table S3). Given the lengths of the contigs, it remains possible that some of these contigs derive from other lineages following horizontal gene transfer. The remaining three unaligned contigs were relatively short and identical to each other. Further inspection of the community from which the unaligned contig was assembled (the TP_d134 community) resolved two structurally different sadA-carrying contigs. A sequence composition-based taxonomic classifier assigned the genus Microbacterium, affiliated with the Micrococcaceae lineage, as the host of both contigs. Taxonomic classification based on protein-level sequence signature comparison [54] also supports this result. Specifically, in one ~150-kb contig, sadA was found within a set of gene clusters belonging to a Microbacterium bacterium, which was the best hit for this contig in the NCBI nonredundant nucleotide database (Fig. S12a). The other, a 77-kb sadA-carrying contig, was identified as a circular plasmid of Microbacterium based on its genomic composition signatures using a neural network approach [27]. Notably, a conjugative apparatus, including T4CP and T4SS, was not found in its sequence, suggesting that this plasmid is nonmobilizable.
Overall, this analysis shows that sadA has a narrow range of bacterial hosts even under selective conditions that may favor its transfer among bacteria, suggesting a low dissemination potential of sadA within the wastewater microbiome.
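The step of counting how many sadA-carrying contigs align to a reference chromosome can be sketched as a filter over BLAST tabular output (the standard 12-column `-outfmt 6` layout). This is an illustration only: the identity and alignment-length thresholds are assumptions, not the criteria used in this study, and the function name and example rows are hypothetical.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def aligned_contigs(blast6_text: str,
                    min_pident: float = 95.0,
                    min_length: int = 1000) -> set[str]:
    """Return query contig IDs with at least one hit passing both thresholds.

    The default thresholds are illustrative assumptions.
    """
    passing = set()
    reader = csv.reader(io.StringIO(blast6_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST6_FIELDS, row))
        if (float(record["pident"]) >= min_pident
                and int(record["length"]) >= min_length):
            passing.add(record["qseqid"])
    return passing


# Toy hits (hypothetical values, not data from this study).
example_rows = [
    ["contig_1", "ref_chr", "99.2", "5000", "40", "2", "1", "5000", "100", "5099", "0.0", "9000"],
    ["contig_2", "ref_chr", "80.0", "5000", "900", "20", "1", "5000", "100", "5099", "0.0", "3000"],
    ["contig_3", "ref_chr", "99.0", "200", "2", "0", "1", "200", "100", "299", "1e-90", "350"],
]
example_text = "\n".join("\t".join(row) for row in example_rows)
print(sorted(aligned_contigs(example_text)))
```

Contigs that fail such a filter, like the three unaligned contigs discussed above, are the ones that warrant separate taxonomic inspection.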

Because laboratory microcosms and open microbial systems differ physicochemically, for example in sulfonamide concentration, it is natural to ask to what extent sadA can be transferred outside the laboratory. Homology searches against ~156,000 genomes from the NCBI GenBank database [67] revealed that the keystone monooxygenase encoded by sadA rarely has related proteins in other families, displaying a narrow distribution across the publicly available bacterial genome archive (Fig. S12b). Given the paucity of homologous proteins found, it is likely that few bacteria outside the Micrococcaceae lineage have so far acquired sadA via horizontal transfer. Taken together, these results suggest that sadA is unlikely to occur in the genetic contexts of taxa other than the Micrococcaceae lineage, and they support the conclusion that the Micrococcaceae lineage is a specialized degrader of sulfonamides in which sadA could be evolutionarily conserved.
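The narrow taxonomic distribution described above amounts to tallying homology hits by the family of the host genome and asking what fraction falls outside the focal lineage. A minimal sketch follows, assuming each hit has already been paired with a family label; the function name, genome identifiers, and the placeholder family "FamilyX" are hypothetical and do not represent results from this study.

```python
from collections import Counter


def family_distribution(hits: list[tuple[str, str]],
                        focal_family: str) -> tuple[Counter, float]:
    """Tally hits per family and return the fraction outside the focal family.

    `hits` is a list of (genome_id, family) pairs, one per homologous protein.
    """
    counts = Counter(family for _genome_id, family in hits)
    total = sum(counts.values())
    outside = total - counts[focal_family]
    fraction_outside = outside / total if total else 0.0
    return counts, fraction_outside


# Toy input (hypothetical labels, for illustration only).
toy_hits = [
    ("genome_1", "Micrococcaceae"),
    ("genome_2", "Micrococcaceae"),
    ("genome_3", "Micrococcaceae"),
    ("genome_4", "FamilyX"),
]
counts, fraction_outside = family_distribution(toy_hits, "Micrococcaceae")
print(counts.most_common(), fraction_outside)
```

A small fraction outside the focal family is consistent with, but does not by itself prove, rare horizontal transfer, because it also depends on how evenly the genome archive samples each lineage.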

Conclusions

The ability of primary degraders to select bacterial associates represents a mechanistic link between species interactions and microbial community assembly. By combining physiological data and metagenomic analysis, we show that microbiome assembly for sulfonamide subsistence depends not only on the innate properties of the primary degrader but also on the context of the initial inoculum, as well as on interactions between these two factors. Given the low prevalence of sadA exchange in assembled microbiomes under favorable conditions and the low potential for autonomous conjugative transfer indicated by the configuration of the gene clusters, the genetic determinant of sulfonamide subsistence is likely confined to the specialist bacterial lineage.