Abstract
Understanding microbial gene functions relies on the application of experimental genetics in cultured microorganisms. However, the vast majority of bacteria and archaea remain uncultured, precluding the application of traditional genetic methods to these organisms and their interactions. Here, we characterize and validate a generalizable strategy for editing the genomes of specific organisms in microbial communities. We apply environmental transformation sequencing (ET-seq), in which nontargeted transposon insertions are mapped and quantified following delivery to a microbial community, to identify genetically tractable constituents. Next, we use DNA-editing all-in-one RNA-guided CRISPR–Cas transposase (DART) systems for targeted DNA insertion into organisms identified as tractable by ET-seq, enabling organism- and locus-specific genetic manipulation in a community context. Using a combination of ET-seq and DART in soil and infant gut microbiota, we conduct species- and site-specific edits in several bacteria, measure gene fitness in a nonmodel bacterium and enrich targeted species. These tools enable editing of microbial communities for understanding and control.
Data availability
Source data are provided with this paper. Summary data for genomes, plasmids and oligonucleotides used in this study can be found in Table 1 and Supplementary Tables 1 and 3–5. Sequence data for all genomes assembled as part of this study are available at NCBI under bioproject ID PRJNA774280. For accession numbers associated with genomes assembled in previous studies, please see Supplementary Table 1. Genomes and sequences used in the project will also be made available on ggKbase (https://ggkbase.berkeley.edu/). Full plasmid sequences are available in Supplementary Table 3. Raw count data for all experiments, including both metagenome and ET-seq information, are available at https://github.com/SDmetagenomics/ETsuite/tree/master/manuscript_data. VcDART and ShDART plasmids will be made available through Addgene. Plasmids, oligonucleotides and microbial isolates used in this manuscript will also be made available from the authors upon request.
Code availability
Custom R scripts for ET-seq analysis and code used in the construction of figures are available at https://github.com/SDmetagenomics/ETsuite (https://doi.org/10.5281/zenodo.5597397).
References
Steen, A. D. et al. High proportions of bacteria and archaea across most biomes remain uncultured. ISME J. 13, 3126–3130 (2019).
Pascual-García, A., Bonhoeffer, S. & Bell, T. Metabolically cohesive microbial consortia and ecosystem functioning. Philos. Trans. R. Soc. Lond. B Biol. Sci. 375, 20190245 (2020).
Fux, C. A., Shirtliff, M., Stoodley, P. & Costerton, J. W. Can laboratory reference strains mirror ‘real-world’ pathogenesis? Trends Microbiol. 13, 58–63 (2005).
Pukall, R., Tschäpe, H. & Smalla, K. Monitoring the spread of broad host and narrow host range plasmids in soil microcosms. FEMS Microbiol. Ecol. 20, 53–66 (1996).
De Gelder, L., Vandecasteele, F. P. J., Brown, C. J., Forney, L. J. & Top, E. M. Plasmid donor affects host range of promiscuous IncP-1β Plasmid pB10 in an activated-sludge microbial community. Appl. Environ. Microbiol. 71, 5309–5317 (2005).
Musovic, S., Oregaard, G., Kroer, N. & Sørensen, S. J. Cultivation-independent examination of horizontal transfer and host range of an IncP-1 plasmid among Gram-positive and Gram-negative bacteria indigenous to the barley rhizosphere. Appl. Environ. Microbiol. 72, 6687–6692 (2006).
Musovic, S., Klümper, U., Dechesne, A., Magid, J. & Smets, B. F. Long-term manure exposure increases soil bacterial community potential for plasmid uptake. Environ. Microbiol. Rep. 6, 125–130 (2014).
Klümper, U. et al. Broad host range plasmids can invade an unexpectedly diverse fraction of a soil bacterial community. ISME J. 9, 934–945 (2015).
Ronda, C., Chen, S. P., Cabral, V., Yaung, S. J. & Wang, H. H. Metagenomic engineering of the mammalian gut microbiome in situ. Nat. Methods 16, 167–170 (2019).
Brophy, J. A. N. et al. Engineered integrative and conjugative elements for efficient and inducible DNA transfer to undomesticated bacteria. Nat. Microbiol. 3, 1043–1053 (2018).
Farzadfard, F., Gharaei, N., Citorik, R. J. & Lu, T. K. Efficient retroelement-mediated DNA writing in bacteria. Cell Syst. 12, 860–872 (2021).
Vo, P. L. H. et al. CRISPR RNA-guided integrases for high-efficiency, multiplexed bacterial genome engineering. Nat. Biotechnol. 39, 480–489 (2020).
Hsu, B. B., Way, J. C. & Silver, P. A. Stable neutralization of a virulence factor in bacteria using temperate phage in the mammalian gut. mSystems 5, e00013–e00020 (2020).
Hsu, B. B. et al. In situ reprogramming of gut bacteria by oral delivery. Nat. Commun. 11, 5030 (2020).
Sheth, R. U., Cabral, V., Chen, S. P. & Wang, H. H. Manipulating bacterial communities by in situ microbiome engineering. Trends Genet. 32, 189–200 (2016).
Wu, L. R., Chen, S. X., Wu, Y., Patel, A. A. & Zhang, D. Y. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification. Nat. Biomed. Eng. 1, 714–723 (2017).
Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48–53 (2019).
Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019).
Petassi, M. T., Hsieh, S.-C. & Peters, J. E. Guide RNA categorization enables target site choice in Tn7-CRISPR-Cas transposons. Cell 183, 1757–1771 (2020).
Lou, Y. C. et al. Infant gut strain persistence is associated with maternal origin, phylogeny, and functional potential including surface adhesion and iron acquisition. Cell Rep. Med. 2, 100393 (2021).
Picard, B. et al. The link between phylogeny and virulence in Escherichia coli extraintestinal infection. Infect. Immun. 67, 546–553 (1999).
Viladomiu, M. et al. Adherent-invasive E. coli metabolism of propanediol in Crohn’s disease regulates phagocytes to drive intestinal inflammation. Cell Host Microbe 29, 607–619 (2021).
Dogan, B. et al. Inflammation-associated adherent-invasive Escherichia coli are enriched in pathways for use of propanediol and iron and M-cell translocation. Inflamm. Bowel Dis. 20, 1919–1932 (2014).
Leimbach, A., Hacker, J. & Dobrindt, U. E. coli as an all-rounder: the thin line between commensalism and pathogenicity. Curr. Top. Microbiol. Immunol. 358, 3–32 (2013).
Olm, M. R. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat. Biotechnol. 39, 727–736 (2021).
Diamond, S. et al. Mediterranean grassland soil C-N compound turnover is dependent on rainfall and depth, and is mediated by genomically divergent microorganisms. Nat. Microbiol. 4, 1356–1367 (2019).
He, C. et al. Genome-resolved metagenomics reveals site-specific diversity of episymbiotic CPR bacteria and DPANN archaea in groundwater ecosystems. Nat. Microbiol. 6, 354–365 (2021).
Laurenceau, R. et al. Toward a genetic system in the marine cyanobacterium Prochlorococcus. Access Microbiol. 2, acmi000107 (2020).
Adler, B. A. et al. Systematic discovery of salmonella phage-host interactions via high-throughput genome-wide screens. Preprint at https://www.researchgate.net/publication/340988219_Systematic_Discovery_of_Salmonella_Phage-Host_Interactions_via_High-Throughput_Genome-Wide_Screens (2020).
Liu, H. et al. Magic Pools: parallel assessment of transposon delivery vectors in bacteria. mSystems 3, e00143–17 (2018).
Egbert, R. G. et al. A versatile platform strain for high-fidelity multiplex genome editing. Nucleic Acids Res. 47, 3244–3256 (2019).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Kalvari, I. et al. Non-coding RNA analysis using the rfam database. Curr. Protoc. Bioinformatics 62, e51 (2018).
Price, M. N. et al. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557, 503–509 (2018).
Liu, H. et al. Functional genetics of human gut commensal Bacteroides thetaiotaomicron reveals metabolic requirements for growth across environments. Cell Rep. 34, 108789 (2021).
Devon, R. S., Porteous, D. J. & Brookes, A. J. Splinkerettes—improved vectorettes for greater efficiency in PCR walking. Nucleic Acids Res. 23, 1644–1645 (1995).
Barquist, L. et al. The TraDIS toolkit: sequencing and analysis for dense transposon mutant libraries. Bioinformatics 32, 1109–1111 (2016).
Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Chen, L.-X., Anantharaman, K., Shaiber, A., Eren, A. M. & Banfield, J. F. Accurate and complete genomes from metagenomes. Genome Res. 30, 315–333 (2020).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Beghain, J., Bridier-Nahmias, A., Le Nagard, H., Denamur, E. & Clermont, O. ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping. Microb. Genom. 4, e000192 (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2018).
Costello, M. et al. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics 19, 332 (2018).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020); https://www.R-project.org/
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Acknowledgements
We thank M. N. Price for data analysis input, P. Pausch for experimental advice, S. L. McDevitt, E. Wagner and H. Asahara for help with sequencing, B. A. Adler for helpful discussions and T. R. Northen for directional advice. Funding was provided by m-CAFEs Microbial Community Analysis & Functional Evaluation in Soils (m-CAFEs@lbl.gov), a Science Focus Area led by Lawrence Berkeley National Laboratory and supported by the US Department of Energy, Office of Science, Office of Biological & Environmental Research under contract no. DE-AC02-05CH11231. This research was developed with funding from the Defense Advanced Research Projects Agency award no. HR0011-17-2-0043. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government. This material is based upon work supported by the National Science Foundation under award no. 1817593. Support was also provided by the Innovative Genomics Institute at UC Berkeley. J.A.D. is an Investigator of the Howard Hughes Medical Institute. B.E.R. and B.F.C. are supported by the National Institute of General Medical Sciences of the National Institutes of Health under award nos. F32GM134694 and F32GM131654. Y.C.L. was supported by a National Institutes of Health award (no. RAI092531A). A.L.B. was supported by a Miller Basic Science Research Fellowship at University of California, Berkeley. C.H. was supported by a Camille and Henry Dreyfus Foundation postdoctoral fellowship in environmental chemistry. Schematics used in Figs. 1a, 2a, 3b, 4a, 4c, Extended Data Fig. 2, 5a, 5c and 5e were created with BioRender.com.
Author information
Contributions
B.E.R., S.D., B.F.C., R.B., A.M.D., J.F.B. and J.A.D. conceived the work and designed the experiments. B.E.R. led the establishment of microbial communities and development of ET-seq. S.D. led bioinformatics and development of ETSuite. B.F.C. led genetic design and development of DART systems. B.E.R., B.F.C., A.L.B., C.H., M.X., Z.Z., D.C.J.S., K.T., T.K.O., N.K. and R.R. conducted the molecular biology experiments. S.D., A.C.-C., Y.C.L., H.S., C.H., R.S. and S.J.S. developed and conducted bioinformatic analysis. B.E.R., S.D., B.F.C., Y.C.L., R.B., A.M.D., J.F.B. and J.A.D. analyzed and interpreted data.
Ethics declarations
Competing interests
The Regents of the University of California have patents pending related to this work on which B.E.R., S.D., B.F.C., A.M.D., J.F.B. and J.A.D. are inventors. J.A.D. is a cofounder of Caribou Biosciences, Editas Medicine, Scribe Therapeutics, Intellia Therapeutics and Mammoth Biosciences. J.A.D. is a scientific advisory board member of Vertex, Caribou Biosciences, Intellia Therapeutics, eFFECTOR Therapeutics, Scribe Therapeutics, Mammoth Biosciences, Synthego, Algen Biotechnologies, Felix Biosciences, The Column Group and Inari. J.A.D. is a Director at Johnson & Johnson and Tempus and has research projects sponsored by Biogen, Pfizer, AppleTree Partners and Roche. J.F.B. is a founder of Metagenomi. R.B. is a shareholder of Caribou Biosciences, Intellia Therapeutics, Locus Biosciences, Inari, TreeCo and Ancilia Biosciences. All other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Library preparation and data normalization for ET-Seq.
a, ET-Seq requires low-coverage metagenomic sequencing and customized insertion sequencing. Insertion sequencing relies on custom splinkerette adaptors, which minimize non-specific amplification; a digestion step that degrades fragments containing delivery vector; and nested PCR, which enriches for insertion-containing fragments with high specificity. The second round of nested PCR adds unique dual index adaptors for Illumina sequencing. b, The insertion sequencing data are first normalized to reads from internal standard DNA, which is added in equal amounts to all samples and corrects for variation in reads produced per sample. They are then normalized by the relative metagenomic abundances of the community members.
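The two normalization steps in panel b can be illustrated with a minimal Python sketch. It is not the ETsuite implementation (which is written in R). The function name, argument names and all numbers are hypothetical, and the sketch assumes that each step is a simple division.

```python
def normalize_insertion_counts(insertion_reads, spike_in_reads, rel_abundance):
    """Sketch of the two-step normalization described in panel b.

    insertion_reads: dict of organism -> insertion-junction read count (one sample)
    spike_in_reads:  reads assigned to the internal standard DNA in that sample
    rel_abundance:   dict of organism -> relative metagenomic abundance (0-1)
    All names and the exact arithmetic are assumptions made for illustration.
    """
    if spike_in_reads <= 0:
        raise ValueError("internal standard reads must be positive")
    normalized = {}
    for organism, reads in insertion_reads.items():
        abundance = rel_abundance[organism]
        if abundance <= 0:
            raise ValueError(f"no metagenomic abundance for {organism}")
        # Step 1: correct for variation in reads produced per sample.
        per_standard = reads / spike_in_reads
        # Step 2: correct for how abundant the organism is in the community.
        normalized[organism] = per_standard / abundance
    return normalized


# Hypothetical counts for two community members in one sample.
example = normalize_insertion_counts(
    {"org_A": 200, "org_B": 50},
    spike_in_reads=100,
    rel_abundance={"org_A": 0.5, "org_B": 0.25},
)
```

In this toy input, org_A has four times the raw reads of org_B but only twice the normalized value, because it is also twice as abundant.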
Extended Data Fig. 2 Measurement and correction of chimeric reads.
a, The response of chimeric reads, measured as total normalized read counts to insertions into wildtype S. meliloti DNA spiked-in before library preparation, to increasing quantities of donor vector. Plot is log10 scaled on the x and y axes for readability. Dashed lines indicate log-log linear fit to data (R² = 0.86 without correction and R² = 0.92 with correction; n = 7 biological replicates each). b, Frequency of read properties (imperfect insert sequence = single difference in last 5 bp of transposon right end from expected sequence; imperfect host sequence = mismatch in first 3 bp of genomic sequence at transposon genome junction when aligned to host genome) identified as strongly associated with S. meliloti insertions, in which all reads are expected to be chimeric, used as markers for filtering chimeric reads. Box plots indicate median and bound 1st and 3rd quartile, whiskers indicate max/min values (n = 7 biological replicates). Plot is log10 scaled on the y axis for readability. c, Fraction of insertion mapping reads filtered out of each dataset, for each organism/vector (n = 7 biological replicates) following chimera filtering. Box plots indicate median and bound 1st and 3rd quartile, whiskers indicate max/min values. Plot is log10 scaled on the y axis for readability.
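The two read properties defined in panel b can be expressed as simple sequence comparisons. The Python sketch below is illustrative only. The function names and sequences are hypothetical, and the sketch does not reproduce how ETsuite combines the two markers when filtering, which the legend does not state.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def chimera_flags(read_tn_end, expected_tn_end, read_host_start, ref_host_start):
    """Return (imperfect_insert, imperfect_host) for one junction read.

    imperfect_insert: exactly one difference in the last 5 bp of the
                      transposon right end relative to the expected sequence.
    imperfect_host:   any mismatch in the first 3 bp of genomic sequence at
                      the junction relative to the aligned host genome.
    Inputs are hypothetical, pre-extracted, already-aligned sequence strings.
    """
    imperfect_insert = hamming(read_tn_end[-5:], expected_tn_end[-5:]) == 1
    imperfect_host = hamming(read_host_start[:3], ref_host_start[:3]) >= 1
    return imperfect_insert, imperfect_host


# Hypothetical read: one mismatch at the transposon end, perfect host match.
flags = chimera_flags("ACGTACGTAA", "ACGTACGTAC", "GGA", "GGA")
```

A filter built on these flags might discard reads carrying either marker, but that rule is an assumption here and not a statement of the published procedure.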
Extended Data Fig. 3 ET-Seq determined insertion efficiencies for all nine consortium members as a fraction of the entire community.
ET-Seq determined insertion efficiencies for conjugation, electroporation, and natural transformation on the synthetic soil community (n = 3 biological replicates). Values shown are the estimated fraction of the total community population made up by each constituent species' transformed cells. Control samples received no exogenous DNA. Average relative abundance across all samples is indicated in parentheses (n = 18 independent samples).
Extended Data Fig. 4 Benchmarking DART vectors.
a, E. coli WM3064 to E. coli BL21(DE3) conjugation, transposition, and selection schematic (top) and guide RNAs targeting the lacZ α-fragment of recipient BL21(DE3), which is absent from donor WM3064 (bottom). b, d, f, Percent selectable transposed colonies is calculated as the number of colonies obtained with gentamicin selection divided by total viable colonies in absence of selection. b, Insertion-receiving colonies divided into on- and off-targeted. This was calculated by multiplying % selectable colonies for representative guides in d and f (highlighted by grey bars) by the on- or off-target rates (shown in Fig. 2b). c, Transposition with VcDART was tested using three promoters. The variant using the Plac promoter, harvested from pHelper_ShCAST_sgRNA (ref. 16), was also used for Figs. 2–5 and Extended Data Figs. 4b, 5, 6, and 8. d, Efficiencies of VcDART using various promoters. e, Transposition with ShDART was tested with three transcriptional configurations, all using Plac (ref. 16). The configuration originally used for characterization of ShCasTn (ref. 16) was also used for Fig. 2 and Extended Data Fig. 4b. f, Efficiencies of ShDART using various promoters. b, d, f, Crossbar indicates mean and error bars indicate one standard deviation from the mean (n = 3 biological replicates). Guide RNAs ending in ‘NT’ are non-targeting negative control samples.
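The arithmetic behind panels b, d and f can be written out as a short Python sketch. The function names and all numbers are hypothetical, and colony counts are assumed to be already corrected for plating dilution.

```python
def percent_selectable(selected_colonies, total_viable_colonies):
    """Colonies under gentamicin selection as a percentage of total viable colonies."""
    if total_viable_colonies <= 0:
        raise ValueError("total viable colonies must be positive")
    return 100.0 * selected_colonies / total_viable_colonies


def split_on_off_target(pct_selectable, on_target_fraction):
    """Split % selectable colonies into on- and off-target shares (panel b).

    on_target_fraction is the on-target rate (0-1) measured separately;
    the off-target rate is taken as its complement, which is an assumption.
    """
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be between 0 and 1")
    on_target = pct_selectable * on_target_fraction
    off_target = pct_selectable * (1.0 - on_target_fraction)
    return on_target, off_target


# Hypothetical plate counts: 50 selected colonies out of 1,000 viable colonies.
pct = percent_selectable(50, 1000)
on_pct, off_pct = split_on_off_target(pct, 0.5)
```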
Extended Data Fig. 5 Sanger sequencing of VcDART mutants from the synthetic soil microbial community.
a, Representative Sanger sequencing chromatogram of PCR product spanning transposon insertion site at targeted pyrF locus in K. michiganensis and b, in P. simiae mutant colonies following VcDART-mediated transposon integration and selection. Target-site duplications (TSD) are indicated with dashed boxes.
Extended Data Fig. 6 Insertion counts in Ralstonia sp. after metabolic enrichment for P. simiae.
a, Raw number of paired end reads in shotgun sequencing analysis detected as spanning a transposon-genome junction for the P. simiae and Ralstonia sp. genomes in each of three replicate enrichment samples. b, Number of paired end reads detected, normalized to the coverage of each genome within each respective sample. The mean coverage-normalized insert counts for P. simiae and Ralstonia sp. (MeanPsim = 0.1250; MeanRal = 0.0042) were compared and were significantly different (P-value = 0.00058; two-sample t-test).
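The coverage normalization in panel b and the form of a two-sample t statistic can be sketched in Python as follows. The legend does not state whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption. All input values are hypothetical and are not the study's data, and no P-value is computed.

```python
import math
import statistics


def normalize_to_coverage(junction_reads, genome_coverage):
    """Junction-spanning read pairs divided by that genome's coverage in the sample."""
    if genome_coverage <= 0:
        raise ValueError("coverage must be positive")
    return junction_reads / genome_coverage


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err


# Hypothetical example: 10 junction read pairs on a genome at 80x coverage.
norm = normalize_to_coverage(10, 80)

# Hypothetical replicate values for two groups of three samples each.
t_stat = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With three replicates per organism, as in this figure, the statistic would be compared against a t distribution with four degrees of freedom under the pooled-variance assumption.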
Extended Data Fig. 7 Relative abundance of stool sample inoculum and infant gut community used for VcDART editing.
The gut microbiome compositions were obtained by read mapping to 1005 reference genomes from Lou et al. 2021. Bar height represents normalized subspecies relative abundance, and bars are colored by strain.
Extended Data Fig. 8 ET-Seq determined insertion efficiency for the infant gut community.
Insertion efficiency as quantified by ET-Seq for nine microbial species determined to be present by metagenomic sequencing. Experimental samples were conjugated with a donor containing the unguided mariner transposon (pHLL250; n = 3 biological replicates). Control samples did not receive the donor (n = 3 biological replicates). Percentages next to species names indicate their mean relative fraction in the infant gut community, averaged across the 6 biological replicate experiments performed.
Extended Data Fig. 9 Target site locus and strain comparisons for selective enrichment from the infant gut community.
a, Clinically relevant gene clusters targeted by VcDART for selective enrichment included a locus associated with fimbriae biosynthesis (top) and a propanediol utilization gene cluster (bottom). Insets show reads mapped to these loci in E. coli subsp. 2 and subsp. 3, which were assembled from enrichment culture shotgun sequencing data. The right end of the VcDART transposon cargo (green) was assembled, is bridged to the genome, and is supported by paired end read mapping. VcDART target sites (protospacers) are indicated in dark red. b, Dendrogram displaying average nucleotide identity differences between all E. coli genomes analyzed as part of the infant gut community. Strains in black were genomes originally recovered from metagenomic assembly in Lou et al. 2021. Strains in red were assembled from enrichment cultures in this study.
Extended Data Fig. 10 Location of VcDART transposon insertions in isolated E. coli mutant colonies following infant gut community editing.
a, Insertion orientations and locations relative to target site were determined by locus-specific PCR and Sanger sequencing on colonies picked from selective solid medium after editing the infant gut community with VcDART guided by the fimbriae associated locus-targeting guide RNA and b, the propanediol metabolism locus-targeting guide RNA (n = 3 biological replicates).
Supplementary information
Supplementary Tables
Supplementary Tables 1–5.
Source data
Source Data Fig. 1
Numerical Source data for plots.
Source Data Fig. 2
Numerical Source data for plots.
Source Data Fig. 3
Numerical Source data for plots.
Source Data Fig. 4
Numerical Source data for plots.
Source Data Fig. 5
Numerical Source data for plots.
Source Data Extended Data Fig. 2
Numerical Source data for plots.
Source Data Extended Data Fig. 3
Numerical Source data for plots.
Source Data Extended Data Fig. 4
Numerical Source data for plots.
Source Data Extended Data Fig. 6
Numerical Source data for plots.
Source Data Extended Data Fig. 7
Numerical Source data for plots.
Source Data Extended Data Fig. 8
Numerical Source data for plots.
Source Data Extended Data Fig. 10
Numerical Source data for plots.
Cite this article
Rubin, B.E., Diamond, S., Cress, B.F. et al. Species- and site-specific genome editing in complex bacterial communities. Nat Microbiol 7, 34–47 (2022). https://doi.org/10.1038/s41564-021-01014-7
DOI: https://doi.org/10.1038/s41564-021-01014-7