Abstract
Despite longstanding appreciation of gene expression heterogeneity in isogenic bacterial populations, affordable and scalable technologies for studying single bacterial cells have been limited. Although single-cell RNA sequencing (scRNA-seq) has revolutionized studies of transcriptional heterogeneity in diverse eukaryotic systems1,2,3,4,5,6,7,8,9,10,11,12,13, the application of scRNA-seq to prokaryotes has been hindered by their extremely low mRNA abundance14,15,16, lack of mRNA polyadenylation and thick cell walls17. Here, we present prokaryotic expression profiling by tagging RNA in situ and sequencing (PETRI-seq)—a low-cost, high-throughput prokaryotic scRNA-seq pipeline that overcomes these technical obstacles. PETRI-seq uses in situ combinatorial indexing11,12,18 to barcode transcripts from tens of thousands of cells in a single experiment. PETRI-seq captures single-cell transcriptomes of Gram-negative and Gram-positive bacteria with high purity and low bias, with median capture rates of more than 200 mRNAs per cell for exponentially growing Escherichia coli. These characteristics enable robust discrimination of cell states corresponding to different phases of growth. When applied to wild-type Staphylococcus aureus, PETRI-seq revealed a rare subpopulation of cells undergoing prophage induction. We anticipate that PETRI-seq will have broad utility in defining single-cell states and their dynamics in complex microbial communities.
Data availability
Raw data have been submitted to the Gene Expression Omnibus under accession number GSE141018. Source data are also provided for all figures. All of the figures except for Fig. 1 include original data. An overview of all of the experiments is provided in Supplementary Table 4. A count matrix for the three primary PETRI-seq experiments is provided in Supplementary Table 6.
Code availability
Relevant code for this manuscript is available from the corresponding author on request; current PETRI-seq code and protocols are available at https://tavazoielab.c2b2.columbia.edu/PETRI-seq/.
References
Tang, F. et al. mRNA-seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Ramsköld, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
Fan, H. C., Fu, G. K. & Fodor, S. P. A. Expression profiling. Combinatorial labeling of single cells for gene expression cytometry. Science 347, 1258367 (2015).
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 16, 120 (2015).
Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Picelli, S. Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol. 14, 637–650 (2016).
Sheng, K., Cao, W., Niu, Y., Deng, Q. & Zong, C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270 (2017).
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010).
Bartholomäus, A. et al. Bacteria differently regulate mRNA abundance to specifically respond to various stresses. Philos. Trans. R. Soc. A 374, 20150069 (2016).
Moran, M. A. et al. Sizing up metatranscriptomics. ISME J. 7, 237–243 (2013).
de Lange, N., Tran, T. M. & Abate, A. R. Electrical lysis of cells for detergent-free droplet assays. Biomicrofluidics 10, 024114 (2016).
Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).
Hodson, R. E., Dustman, W. A., Garg, R. P. & Moran, M. A. In situ PCR for visualization of microscale distribution of specific genes and gene products in prokaryotic communities. Appl. Environ. Microbiol. 61, 4074–4082 (1995).
Bloom, J. D. Estimating the frequency of multiplets in single-cell RNA sequencing from cell-mixing experiments. PeerJ 6, e5578 (2018).
Okayama, H. & Berg, P. High-efficiency cloning of full-length cDNA. Mol. Cell. Biol. 2, 161–170 (1982).
Kivioja, T. et al. Counting absolute number of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2012).
Yang, S. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol. 21, 57 (2020).
Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. Preprint at bioRxiv https://doi.org/10.1101/303727 (2020).
Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Gentry, D. R., Hernandez, V. J., Nguyen, L. H., Jensen, D. B. & Cashel, M. Synthesis of the stationary-phase sigma factor σs is positively regulated by ppGpp. J. Bacteriol. 175, 7982–7989 (1993).
Almirón, M., Link, A. J., Furlong, D. & Kolter, R. A novel DNA-binding protein with regulatory and protective roles in starved Escherichia coli. Genes Dev. 6, 2646–2654 (1992).
Traxler, M. F. et al. The global, ppGpp-mediated stringent response to amino acid starvation in Escherichia coli. Mol. Microbiol. 68, 1128–1148 (2008).
Chen, H., Shiroguchi, K., Ge, H. & Xie, X. S. Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli. Mol. Syst. Biol. 11, 781 (2015).
Vargas-Garcia, C. A., Ghusinga, K. J. & Singh, A. Cell size control and gene expression homeostasis in single-cells. Curr. Opin. Syst. Biol. 8, 109–116 (2018).
Diep, B. A. et al. Complete genome sequence of USA300, an epidemic clone of community-acquired meticillin-resistant Staphylococcus aureus. Lancet 367, 731–739 (2006).
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic Acids Res. 35, D21–D25 (2007).
Saint, M. et al. Single-cell imaging and RNA sequencing reveal patterns of gene expression heterogeneity during fission yeast growth and adaptation. Nat. Microbiol. 4, 480–491 (2019).
Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).
Abraham, J. M., Freitag, C. S., Clements, J. R. & Eisenstein, B. I. An invertible element of DNA controls phase variation of type 1 fimbriae of Escherichia coli. Proc. Natl Acad. Sci. USA 82, 5724–5727 (1985).
Deutsch, D. R. et al. Extra-chromosomal DNA sequencing reveals episomal prophages capable of impacting virulence factor expression in Staphylococcus aureus. Front. Microbiol. 9, 1406 (2018).
Balasubramanian, S., Osburne, M. S., BrinJones, H., Tai, A. K. & Leong, J. M. Prophage induction, but not production of phage particles, is required for lethal disease in a microbiome-replete murine model of enterohemorrhagic E. coli infection. PLoS Pathog. 15, e1007494 (2019).
Blattman, S. B., Jiang, W., Oikonomou, P. & Tavazoie, S. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Preprint at bioRxiv https://doi.org/10.1101/866244 (2019).
Kuchina, A. et al. Microbial single-cell RNA sequencing by split-pool barcoding. Preprint at bioRxiv https://doi.org/10.1101/869248 (2019).
Brauner, A., Fridman, O., Gefen, O. & Balaban, N. Q. Distinguishing between resistance, tolerance and persistence to antibiotic treatment. Nat. Rev. Microbiol. 14, 320–330 (2016).
Girgis, H. S., Harris, K. & Tavazoie, S. Large mutational target size for rapid emergence of bacterial persistence. Proc. Natl Acad. Sci. USA 109, 12740–12745 (2012).
Franzosa, E. A. et al. Sequencing and beyond: integrating molecular ‘omics’ for microbial community profiling. Nat. Rev. Microbiol. 13, 360–372 (2015).
Lee, T. S. et al. BglBrick vectors and datasheets: a synthetic biology platform for gene expression. J. Biol. Eng. 5, 12 (2011).
Zaslaver, A. et al. A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nat. Methods 3, 623–628 (2006).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modelling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Santos-Zavaleta, A. et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Res. 47, D212–D220 (2019).
Taboada, B., Ciria, R., Martinez-Guerrero, C. E. & Merino, E. ProOpDB: Prokaryotic Operon DataBase. Nucleic Acids Res. 40, D627–D631 (2012).
Fu, G. K., Hu, J., Wang, P. & Fodor, S. P. A. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc. Natl Acad. Sci. USA 108, 9026–9031 (2011).
Tange, O. GNU Parallel 2018 (Ole Tange, 2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Huang, Y., Sheth, R. U., Kaufman, A. & Wang, H. H. Scalable and cost-effective ribonuclease-based rRNA depletion for transcriptomics. Nucleic Acids Res. 48, e20 (2020).
Armour, C. D. et al. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat. Methods 6, 647–649 (2009).
He, S. et al. Validation of two ribosomal RNA removal methods for microbial metatranscriptomics. Nat. Methods 7, 807–812 (2010).
Zhulidov, P. A. et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32, e37 (2004).
Acknowledgements
We thank the members of the Tavazoie laboratory for discussions and comments on early drafts of the manuscript; and P. Sims for suggestions during the early development of PETRI-seq. S.T. is supported by award no. 5R01AI077562 from the National Institutes of Health. S.B.B. is supported by a National Science Foundation Graduate Research Fellowship (no. DGE 16-44869). W.J. is supported by a fellowship from the Jane Coffin Childs Fund.
Author information
Contributions
W.J., S.B.B. and S.T. conceived the study. S.B.B., W.J. and S.T. designed experiments. S.B.B. and W.J. performed experiments and data analysis. P.O. assisted with computational analysis. S.B.B., W.J. and S.T. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Experimental and computational pipelines for PETRI-seq.
a–c, Experimental pipeline for PETRI-seq. PETRI-seq libraries can be prepared in just 2.5 days. (a) Detailed schematic of steps for cell preparation, which is started at the end of day 1 and finished on day 2. (b) Detailed schematic of steps for split-pool barcoding, which is entirely done on day 2. (c) Detailed schematic of steps for library preparation, which can be completed (up to sequencing) on day 3 (or later, if preferred). d, Computational pipeline for PETRI-seq analysis after sequencing. e, Structure of contig elements in read 1 after Illumina sequencing of PETRI-seq. To reduce the length of the sequence, barcodes overlap by one base (indicated by asterisk) with the adjacent linker sequence. f, Representative ‘knee plot’ used to select BCs for further analysis. The threshold line at 25,000 BCs is inclusive to facilitate additional filtering after collapsing PCR duplicates to UMIs. g, Representative histogram of reads per UMI. A threshold line was set for each library. For this library, only UMIs with more than 3 reads were kept for downstream analysis. Threshold line at log₁₀(3). h, Species mixing plot with all BCs containing >0 UMIs for library 1.06SaEc. BCs with fewer than 20 UMIs per cell were removed from further analysis. Line segments at x = 20 and y = 20. i, Distribution of E. coli BCs from species mixing plot in h. BCs above the threshold line were used for further analysis and considered single E. coli cells. Threshold line at log₂(20). j,k, PCAs of E. coli (orange) and S. aureus (blue) BCs from library 1.06SaEc. For calculation of principal components, rRNA operons were omitted and counts were normalized and scaled as described in methods. In j, all S. aureus and E. coli BCs with greater than 20 total UMIs and greater than 0 mRNAs are included (13,786 S. aureus, 1,153 E. coli). In k, only BCs with greater than or equal to 15 mRNA UMIs are included (6,683 S. aureus, 800 E. coli). For 100% of S. aureus BCs, PC1 < 0.05, and for 100% of E. coli BCs, PC1 > 4.
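The two count thresholds described in panels g and h can be sketched as a small filtering step. This is a minimal illustration rather than the published pipeline: the function names and the `(barcode, umi)` key layout are hypothetical, and the cut-offs (more than 3 reads per UMI, at least 20 UMIs per BC) are the values quoted in the legend for this particular library.

```python
from collections import Counter

def filter_umis(reads_per_umi, min_reads_exclusive=3):
    """Keep UMIs supported by more than `min_reads_exclusive` reads (panel g).

    `reads_per_umi` maps a hypothetical (barcode, umi) pair to its read count.
    """
    return {key: n for key, n in reads_per_umi.items() if n > min_reads_exclusive}

def umis_per_barcode(kept_umis):
    """Count the surviving UMIs for each cell barcode (BC)."""
    return Counter(barcode for barcode, _umi in kept_umis)

def select_cells(umi_counts, min_umis=20):
    """Drop BCs with fewer than `min_umis` UMIs (panel h); return kept BCs sorted."""
    return sorted(bc for bc, n in umi_counts.items() if n >= min_umis)

# Toy input: BC1/u2 has exactly 3 reads, so it does not pass the "> 3" filter.
toy = {("BC1", "u1"): 5, ("BC1", "u2"): 3, ("BC2", "u1"): 4}
counts = umis_per_barcode(filter_umis(toy))
```

Because the legend states that a threshold was set for each library, both cut-offs are left as parameters.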
Extended Data Fig. 2 Development and preliminary optimization of PETRI-seq.
a, qPCR after in situ RT with random hexamers shows higher yield of rpsB cDNA from fixation without media (pelleting before) than fixation with media (formaldehyde added to culture) [n = 3 technically independent samples (dots), p = 0.012, 2-sided t-test]. Bars show mean abundance. b, Transcriptome stabilized by RNAprotect after 2-minute spin was highly correlated with transcriptomes stabilized immediately by either RNAprotect or flash freezing. Pearson’s r is reported. c, RNA purified from E. coli cells after 16-hour 4% formaldehyde fixation (‘Fixed Bulk’) was highly correlated with non-fixed RNA (‘Standard Bulk’). 2,617 operons included. Pearson’s r is reported. d, qPCR after in situ RT with rpsB-specific primer (SB10) showed similar yield when cells were resuspended in 50% ethanol (n = 2 technically independent samples). e, qPCR after in situ RT with random hexamers shows improved yield of rpsB cDNA after lysozyme treatment (n = 3 technically independent samples [dots], p = 0.001, 2-sided t-test). Bars show mean abundance. f, qPCR after DNase treatment or incubation with only DNase buffer confirmed in situ DNase treatment efficacy (n = 8 technically independent samples [dots], p = 0.035, 2-sided t-test). Bars show mean abundance. g, qPCR after in situ RT with rpsB-specific primer (SB10) confirmed DNase inactivation, as yield was unchanged (n = 2 technically independent samples [dots]). Bars show mean proportion. h, Gel of 775-bp PCR fragment after 1-hour incubation with DNase-treated cells confirmed DNase inactivation. Right-most lane: DNase was directly added to PCR product. Experiment conducted one time. i, Aggregated PETRI-seq UMIs from DNase-treated and untreated libraries were highly correlated. Pearson’s r reported. j, Bioanalyzer traces of RNA purified after in situ DNase treatment and cell lysis (methods). k, Imaging after E. coli cell preparation. Images for all libraries looked similar (n = 8). 
l, qPCR after bulk RT and ligation (methods) confirmed effective ligation with a 16-base linker. A minor increase (1.5×) in ligation efficiency was detected (p = 0.001, n = 3 technically independent samples [dots], 2-sided t-test). Bars show mean proportion. m, qPCR after in situ RT showed cDNA retention after AMPure purification (n = 4 technically independent samples, p = 0.69, 2-sided t-test). Bars show mean abundance. n,o, Second-strand synthesis yielded more mRNAs and operons per cell (p < 10⁻³⁰⁰, 2-sided Mann-Whitney U) than template switching. 10,000 BCs are included from unoptimized PETRI-seq (Experiment 1.08). Boxplots within violins show interquartile range (black box) and median (white circle).
Extended Data Fig. 3 Quantification of intercellular contamination using E. coli and S. aureus cells.
After defining single E. coli and S. aureus cells (Fig. 2b, Experiment 1.06SaEc), we examined levels of cross-contamination within single cells. Similar analysis for Experiment 2.01 is shown in Extended Data Fig. 7c, d. a, Quantification of S. aureus-aligned UMIs assigned to E. coli cells after standard PETRI-seq alignment (edit distance ≤1). Reads mapping equally well to both species are discarded. Bottom: Scatterplots of E. coli UMIs vs. absolute (left) or percent (right) S. aureus UMIs assigned to each E. coli cell. Top: Cumulative distributions corresponding to scatterplots. b, Quantification of E. coli-aligned UMIs assigned to S. aureus cells after standard alignment. Bottom: Scatterplots of S. aureus UMIs vs absolute (left) or percent (right) E. coli UMIs assigned to each S. aureus cell. Top: Cumulative distributions corresponding to scatterplots. c, mRNAs per E. coli cell in a. d, mRNAs per S. aureus cell in b. e,f, Same analysis as (a,b) but using more stringent alignment (edit distance = 0) to better understand source of contamination. g, mRNAs per E. coli cell in e. h, mRNAs per S. aureus cell in f. i,j, To further understand the impact of alignment on apparent cross-contamination, we used stringent alignment to map UMIs for a library of only E. coli (Experiment 1.10). Total UMIs (i) or percent of UMIs (j) assigned to S. aureus were determined after stringent alignment for a PETRI-seq library prepared with only E. coli. S. aureus UMIs are computational artifacts. E. coli cells include a mean of 0.02% S. aureus aligned UMIs, indicating that the majority of interspecies contamination observed in e is not caused by incorrect alignment. To quantify contamination, we needed to correct percentages of inter-species alignment based on species abundance in the library (25% of UMIs aligned to E. coli, 75% S. aureus) to predict the percent of UMIs in a given single-cell derived from any other cell (whether or not the same species). 
We predict a ‘corrected contamination rate’, or percent of UMIs in a single-cell transcriptome derived from another cell, of 0.19–0.36% \(\left(\frac{0.14}{0.75} = 0.19;\ \frac{0.09}{0.25} = 0.36\right)\).
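The correction above can be written out as a short calculation. It is a sketch of the reasoning rather than the authors' code, and it rests on two assumptions: that contaminating UMIs are drawn from the library in proportion to species abundance, and that 0.14% and 0.09% are the observed other-species percentages in E. coli and S. aureus cells respectively, as the divisors in the legend imply. The function name is hypothetical.

```python
def corrected_contamination(observed_other_species_pct, other_species_share):
    """Scale the visible cross-species contamination up to all contamination.

    Only contaminants from the *other* species are detectable, and they make up
    `other_species_share` of the library, so the total is observed / share.
    """
    return observed_other_species_pct / other_species_share

# E. coli cells: S. aureus UMIs are 75% of the library (assumed mapping of 0.14%).
in_ecoli = corrected_contamination(0.14, 0.75)
# S. aureus cells: E. coli UMIs are 25% of the library (assumed mapping of 0.09%).
in_saureus = corrected_contamination(0.09, 0.25)

print(round(in_ecoli, 2), round(in_saureus, 2))
```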
Extended Data Fig. 4 Further evaluation of PETRI-seq for E. coli and S. aureus in Experiment 1.06SaEc.
a,b,c, Breakdown of total aligned UMIs (a,b) or reads (c) per cell for PETRI-seq exponential GFP- and RFP-expressing E. coli (a), PETRI-seq exponential S. aureus (b), and bulk exponential wild-type E. coli (c). Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction. d, Distributions of mRNA UMIs (left) and operons (right) per S. aureus cell. 13,785 cells are included. 2 cells were omitted as they contained zero mRNAs. Boxplots within violins show interquartile range (black box) and median (white circle). e, Distributions of mRNA UMIs (left) and operons (right) per E. coli cell in five sub-populations, including GFP cells (contain GFP plasmid transcripts), RFP cells (contain RFP plasmid transcripts), ambiguous cells (contain no plasmid transcripts), and either RFP or GFP and ambiguous cells. Three ambiguous cells classified as E. coli in Fig. 2b were omitted as they contained zero mRNAs. Boxplots within violins show interquartile range (black box) and median (white circle). f, Distribution of total RNAs per GFP-containing exponential E. coli cell. 609 cells are included. g, Left, growth curves for PrplN-GFP, Ptet-RFP, and MG1655 (no plasmid) cells with and without aTc. Right, doubling times calculated from the growth curves. Ptet-RFP had a significantly longer doubling time than all other strains/conditions when induced with aTc (n = 4; p = 2.2 × 10⁻⁵, 2.5 × 10⁻⁵, 2.1 × 10⁻⁵, 3.6 × 10⁻⁵, 2.6 × 10⁻⁵ [for each sample moving left to right], 2-sided t-test), which might explain fewer mRNA UMIs in these cells.
Extended Data Fig. 5 Further evaluation of growth phase characterization by PETRI-seq.
a, PCA of Experiment 1.06 (biological replicate of 1.10) shows that PETRI-seq can reproducibly distinguish between stationary and exponential cells by projecting cells onto the principal components calculated from the first library (bottom). 2,724 cells are included. 1,551 cells are left of the threshold (PC1 = 0.34), and 1,173 cells are right of the threshold. mRNA UMIs captured per cell on either side of the threshold line are shown (top). b, PCA as in Fig. 3b, but UMI counts were normalized using sctransform26. c, Expression along PC1 (Fig. 3b, Experiment 1.10) of operons with the most positive or negative PC1 loadings (z-scored moving average, size = 1,000 cells). d, Distribution of mRNA UMIs per cell (Experiment 1.10) on either side of the threshold line in Fig. 3b. Grey cells (without plasmid UMIs) are included. Only cells with greater than 14 mRNA UMIs per cell were included, as cells with fewer were excluded from the PCA. 4,878 cells are left of the threshold, and 2,509 cells are right of the threshold. e,f, Breakdown of total aligned UMIs per cell for Experiment 1.10 for cells above and below the PC1 threshold in Fig. 3b. In e, exponential E. coli (above the threshold) are shown, and in f, stationary E. coli (below the threshold) are shown. Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction.
Extended Data Fig. 6 Additional optimization of PETRI-seq by increasing ligation primer concentration and adding detergent during barcoding.
a, Increasing the concentration of round 3 ligation primers by 4× relative to previous experiments (1.06SaEc and 1.10) increases mRNA UMIs per cell 2.7-fold for GFP-expressing exponential (green) and RFP-expressing stationary E. coli cells (red). Boxplots within violins show interquartile range (black box) and median (white circle). b, Adding detergent (Tween-20) to cells before ligation 1 and after ligation 3 increased mRNA UMIs per cell 1.4-fold relative to original PETRI-seq for wild-type exponential E. coli cells. Boxplots within violins show interquartile range (black box) and median (white circle). c, With 10× more RT primer relative to original PETRI-seq, we observed a shift in the breakdown of sense/anti-sense and mRNA/rRNA UMIs. Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction. Proportions of anti-sense RNAs and sense rRNAs are significantly increased. We hypothesized that any condition effectively increasing the intracellular concentration of RT primers could lead to this undesirable shift. For this reason, detergent was only ever added after RT to avoid further permeabilizing cells and increasing the effective concentration of RT primer. d, Combining detergent treatment and increased ligation primer (for both rounds) resulted in higher mRNA capture for wild-type exponential E. coli cells. Detergent again increased mRNA UMIs per cell (1.5-fold). Boxplots within violins show interquartile range (black box) and median (white circle). e, Optimized PETRI-seq (4× ligation primer, detergent treatment) resulted in S. aureus transcriptomes with a median of 43 mRNA UMIs per cell (left) and 35 operons per cell (right). Boxplots within violins show interquartile range (black box) and median (white circle). f,g, Breakdown of total aligned UMIs per cell for optimized PETRI-seq (Experiment 2.01) for exponential (f) and stationary E. coli (g).
Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of sense rRNA and mRNA alignments. h,i, Distributions of total UMIs per E. coli (h) and S. aureus (i) BCs in Experiment 2.01. Given higher capture, we imposed higher thresholds for distinguishing cells from background than used previously (Extended Data Fig. 1i). E. coli BCs with more than 128 total UMIs (threshold line in h) and S. aureus BCs with more than 32 total UMIs (threshold line in i) were considered cells.
Extended Data Fig. 7 Multiplet frequency and intercellular contamination for optimized PETRI-seq.
a, Species mixing plot for PETRI-seq with 4× ligation primers and no detergent. The multiplet frequency is 0.7%, which is 5-fold higher than the Poisson expectation of 0.14% for 2,423 BCs. b, Species mixing plot for PETRI-seq with 4× ligation primers and detergent (Experiment 2.01). The multiplet frequency is 2.8%, which is 4.7-fold higher than the Poisson expectation of 0.6% for 10,797 BCs. This indicates that compared to no detergent, detergent treatment did not significantly increase multiplet frequency relative to the Poisson expectation. In (a,b), E. coli BCs with > 128 total UMIs and S. aureus BCs with > 32 total UMIs were included. c,d, Quantification of cross-contamination for PETRI-seq with 4× ligation primers and no detergent (c, same experiment as a) or 4× ligation primers and detergent (d, Experiment 2.01 as in b). Scatterplots show the percent of total UMIs for each cell aligned to the incorrect species. Reads were aligned using the stringent alignment (edit distance = 0) described in Extended Data Fig. 3. Top left: Percent of S. aureus UMIs in exponential E. coli cells (based on first round barcode). Top right: Percent of S. aureus UMIs in stationary E. coli cells (based on first round barcode). Bottom left: Percent of E. coli UMIs in S. aureus cells barcoded with exponential E. coli (based on first round barcode). Bottom right: Percent of E. coli UMIs per S. aureus cell barcoded with stationary E. coli (based on first round barcode). As described in Extended Data Fig. 3, we used these inter-species contamination rates to predict a corrected contamination rate (including intra-species contamination). Though higher than the contamination rates observed in the previous species mixing experiment (Extended Data Fig. 3e, f), these rates are comparable to previous findings for eukaryotic scRNA-seq methods23,24 and are not affected by detergent treatment (c vs. d).
Furthermore, we anticipate that contamination could be reduced by additional washing prior to cell lysis (see ‘Future directions for optimization’ in Methods).
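The Poisson expectations quoted in panels a and b can be approximated with a simple barcode-collision model. The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes cells are assigned uniformly at random to 96³ = 884,736 barcode combinations (three rounds of 96 barcodes, a figure not given in this legend) and treats the reported number of BCs as the number of cells. The function name is hypothetical.

```python
import math

def poisson_multiplet_fraction(n_cells, n_barcodes):
    """Expected fraction of occupied barcodes that hold two or more cells.

    Cells per barcode are modelled as Poisson with mean n_cells / n_barcodes.
    """
    lam = n_cells / n_barcodes
    p_occupied = -math.expm1(-lam)      # P(at least one cell) = 1 - exp(-lam)
    p_single = lam * math.exp(-lam)     # P(exactly one cell)
    return (p_occupied - p_single) / p_occupied

N_BARCODES = 96 ** 3  # assumption: three rounds of 96 barcodes

no_detergent = 100 * poisson_multiplet_fraction(2_423, N_BARCODES)
detergent = 100 * poisson_multiplet_fraction(10_797, N_BARCODES)
print(f"{no_detergent:.2f}% {detergent:.1f}%")
```

Under these assumptions the sketch gives values close to the 0.14% and 0.6% in the legend, and dividing the observed 0.7% and 2.8% by them gives the roughly 5-fold excess reported for both conditions.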
Extended Data Fig. 8 Comparison of plasmid-labeled (Experiment 1.10) and RT-labeled (Experiment 2.01) mixed growth stage libraries reveals minimal cross-contamination between E. coli cells barcoded together.
In Experiment 2.01, exponential and stationary cells were prepared separately and then barcoded independently during RT. In contrast, the RFP-expressing stationary cells and GFP-expressing exponential cells barcoded in Experiment 1.10 were combined for fixation and barcoded together, resulting in more opportunity for cross-contamination. Experiment 2.01 is thus a useful reference to quantify this cross-contamination. To account for differences in the capture efficiency for the two experiments, cells were down-sampled to 30 mRNA UMIs. a, PCA for all 4 cell types reveals that the two stationary populations are biologically distinct, possibly because they were grown independently to slightly different ODs, and RFP cells were induced with aTc. In contrast, the two exponential populations appear very similar. b, PC1 was calculated using only the stationary cells from both experiments. Right: The receiver operating characteristic (ROC) shows that PC1 is a strong classifier of the two states. c, PC1 was calculated using only exponential cells from both experiments. Right: The ROC shows that PC1 is a weak classifier of the two exponential states with performance similar to random assignment (Area Under the ROC Curve [AUC]=0.5). d, PC1 was calculated using wild-type exponential cells from Experiment 2.01, GFP-expressing exponential cells from Experiment 1.10, and RFP-expressing stationary cells from Experiment 1.10 in order to quantify cross-contamination between the GFP and RFP cells using the wild-type exponential cells from Experiment 2.01 as a reference. Right: ROC shows that PC1 is a strong classifier of exponential and stationary cells. The probability that the PC1 value of a wild-type exponential cell is lower than the PC1 value of a stationary RFP cell is 99.9% (AUC = 0.999), while the probability that the PC1 value of a GFP exponential cell is lower than the PC1 value of a stationary RFP cell is 99.67% (AUC = 0.9967). 
Thus, for the GFP exponential cells, 23 out of 10,000 cell pairs (1 exponential, 1 stationary) will be incorrectly ranked due to cross-contamination in the GFP cells. Finally, we confirmed that in the original library for Experiment 1.10, the relative representation of UMIs from exponential and stationary cells was roughly equal (50.3% stationary, 45.6% exponential), indicating that the cross-contamination analysis for the GFP exponential population would be reciprocal for the RFP stationary population.
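The reading of AUC as a pairwise ranking probability, and the figure of 23 pairs per 10,000, can be illustrated as follows. This is a minimal sketch with a hypothetical function name and toy PC1 values, not data from the study; only the two AUC values (0.999 and 0.9967) come from the legend.

```python
def auc_pairwise(scores_a, scores_b):
    """Probability that a value from `scores_a` is lower than one from `scores_b`.

    Every (a, b) pair is compared; ties count as half. This equals the area
    under the ROC curve when the score is used to separate the two groups.
    """
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))

# Toy PC1 values: one of the four pairs (3 vs 2) is ranked the wrong way round.
toy_auc = auc_pairwise([1, 3], [2, 4])

# Extra misranked pairs per 10,000 for GFP cells relative to the reference cells.
extra_misranked = round((0.999 - 0.9967) * 10_000)
```

The difference between the two AUC values is the share of exponential/stationary pairs whose ordering is lost in the GFP cells but not in the reference, which is how the legend attributes those pairs to cross-contamination.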
Extended Data Fig. 9 Defining consensus transcriptional states of sub-populations by aggregating single-cell transcriptomes.
a, Correlation between mRNA abundances from 3,547 aggregated wild-type exponential cells (Experiment 2.01) vs. bulk preparation from fixed exponential wild-type E. coli cells. The Pearson correlation coefficient (r) was calculated for 2,150 out of 2,612 total operons, excluding those with zero counts in either library (grey points), or for all 2,612 operons. Bulk library was prepared from the same cells as the PETRI-seq library. b, Bottom: The correlation between the aggregated mRNA counts of single exponential cells (PETRI-seq) and the bulk exponential library increases as more single cells are included. Correlations were calculated from log10(TPM + 1) for each sample. Top: Difference between top curve and bottom curve in plot below, based on best-fit lines (y = ln(x) + b, r > 0.98). c, Correlation between RNA abundances from 4,627 aggregated wild-type stationary cells (Experiment 2.01) vs. bulk preparation from fixed wild-type stationary E. coli cells. The Pearson correlation coefficient (r) was calculated for 2,050 out of 2,612 total operons, excluding those with zero counts in either library (grey points), or for all 2,612 operons. Bulk library was prepared from the same cells as the PETRI-seq library. d, Bottom: The correlation between the aggregated mRNA counts of single stationary cells (PETRI-seq) and the bulk stationary library increases as more single cells are included. Correlations were calculated from log10(TPM + 1) for each sample. Top: Difference between top curve and bottom curve in plot below, based on best-fit lines (y = ln(x) + b, r > 0.98).
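The comparison in panels b and d (Pearson correlation of log₁₀(TPM + 1) between aggregated single cells and bulk) can be sketched as below. The helper names are hypothetical, the counts are toy values, and whether operon-length normalization was applied to the UMI counts in exactly this way is an assumption; the legend only specifies the log₁₀(TPM + 1) transform.

```python
import math

def aggregate(cells):
    """Sum per-operon UMI counts over cells (each row is one cell)."""
    return [sum(column) for column in zip(*cells)]

def tpm(counts, lengths):
    """Transcripts per million: length-normalized counts rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_tpm_correlation(sc_counts, bulk_counts, lengths):
    """Pearson r between log10(TPM + 1) of aggregated single cells and bulk."""
    x = [math.log10(v + 1) for v in tpm(sc_counts, lengths)]
    y = [math.log10(v + 1) for v in tpm(bulk_counts, lengths)]
    return pearson(x, y)

pseudo_bulk = aggregate([[1, 0, 2], [0, 3, 1]])  # two toy cells, three operons
r = log_tpm_correlation(pseudo_bulk, [2, 6, 6], [1000, 1000, 1000])
```

Repeating the calculation on growing random subsets of cells would trace a curve like the ones described in the bottom plots of panels b and d.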
Extended Data Fig. 10 PETRI-seq detects rare transcriptional states and candidate genes with highly variable expression.
a, PCA detects rare transcriptional states among 6,663 S. aureus cells. A small sub-population of 28 cells (red) expressed operons from the φSA3usa phage. b, Distribution of PC1 loadings for all operons included in the S. aureus analysis. Eight operons from the φSA3usa phage have the highest PC1 loadings. c, Map of genomic region33 surrounding φSA3usa in the genome of S. aureus strain USA300. Red arrows indicate phage operons upregulated along PC1. d, Percent of mRNA UMIs mapped to the φSA3usa phage for the 28 cells containing phage UMIs. Three cells are composed of >77% phage transcripts. e, Noise (σ²/μ²) versus mean (μ) for operon expression within an S. aureus population of 6,663 cells. 676 operons are included. The circled operon (red) is SAUSA300_1933-1925, which deviated significantly from the rest of the distribution (z-score = 20.6 [determined by residuals from linear regression (see methods)], p = 10⁻⁹⁴, FDR < 0.01). f,g, Noise (σ²/μ²) versus mean (μ) for operon expression in either exponential (f) or stationary (g) E. coli populations from Experiment 2.01. 1,960 operons are included in (f) and 1,219 operons in (g). Five operons significantly (FDR < 0.01, z-scores determined by residuals from linear regression [see methods]) deviated from the other operons in (f): sip-dctR (z-score = 7.3, p = 3 × 10⁻¹³), murJ (z-score = 6.7, p = 3 × 10⁻¹¹), fimAICDFGH (z-score = 5.4, p = 7 × 10⁻⁸), mdtL (z-score = 4.8, p = 1 × 10⁻⁶), rnhA (z-score = 4.6, p = 4 × 10⁻⁶). fimAICDFGH, which encodes the type I fimbriae system, has been shown previously to exhibit population-level phase variation that is mediated by transcriptional control37. In (e–g), lines at y = −x indicate Poisson noise where σ² = μ. Operon counts were normalized for each cell before plotting. Operons with fewer than 6 raw total UMIs and a mean less than 0.002 after normalization were excluded.
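The noise-versus-mean analysis in panels e–g can be illustrated with a short sketch. For Poisson-distributed counts σ² = μ, so noise = σ²/μ² = 1/μ, which is the line y = −x on log-log axes. The code below computes per-operon mean and noise, fits a straight line in log10-log10 space, and standardizes the residuals. It is a sketch under stated assumptions rather than the published procedure: the methods section is not reproduced here, so the use of population variance, an ordinary least-squares fit via `np.polyfit`, and the function names are all choices made for illustration.

```python
import numpy as np

def noise_vs_mean(norm_counts):
    """Per-operon mean (mu) and noise (sigma^2 / mu^2) across cells.

    norm_counts: (n_cells, n_operons) matrix of per-cell normalized counts.
    """
    x = np.asarray(norm_counts, dtype=float)
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # population variance (assumption)
    return mu, var / mu**2

def residual_z_scores(mu, noise):
    """z-scores of residuals from a linear fit of log10(noise) on log10(mu).

    Operons with zero mean or zero variance must be filtered out beforehand.
    """
    log_mu, log_noise = np.log10(mu), np.log10(noise)
    slope, intercept = np.polyfit(log_mu, log_noise, 1)
    resid = log_noise - (slope * log_mu + intercept)
    return (resid - resid.mean()) / resid.std()

# Toy example: five operons on the Poisson line and one noisy outlier.
mu = np.array([0.01, 0.02, 0.05, 0.1, 0.2, 0.5])
noise = 1.0 / mu          # Poisson expectation: sigma^2 / mu^2 = 1 / mu
noise[3] *= 50            # inflate one operon's noise
z = residual_z_scores(mu, noise)
```

In the toy example the inflated operon receives the largest z-score; converting z-scores to p-values and applying an FDR correction would be a separate step.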
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2, and Tables 1 and 2.
Supplementary Tables 3–5
Supplementary Table 3: 96-well oligonucleotides used for PETRI-seq barcoding. Supplementary Table 4: overview of experiments included in this study. Supplementary Table 5: supplementary statistical data for Fig. 3, Extended Data Fig. 2 and Extended Data Fig. 5.
Supplementary Table 6
Count matrix for experiments 1.06SaEc, 1.10 and 2.01, and Bulk Libraries. Anti-sense operons were excluded. BCs with the prefix SB346 are from experiment 1.06SaEc; 394A from 1.10; and SB442 from 2.01. Bulk libraries for stationary-phase RFP-expressing E. coli cells (SB369) and exponential-phase GFP-expressing E. coli cells (SB371) are also included; reads, rather than UMIs, are reported for bulk libraries. Operon names with the prefix ‘U00096:’ originate from E. coli, whereas operons with the prefix ‘CP000255:’ originate from S. aureus.
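The prefix conventions described for Supplementary Table 6 can be applied programmatically when splitting the count matrix by experiment or species. The sketch below is illustrative only: the prefixes come from the description above, but the lookup helper and the example labels (including any separator after the prefix) are hypothetical and should be checked against the actual file.

```python
# Barcode prefix -> experiment, as listed in the table description.
EXPERIMENT_BY_PREFIX = {"SB346": "1.06SaEc", "394A": "1.10", "SB442": "2.01"}

# Operon-name prefix -> species of origin.
SPECIES_BY_PREFIX = {"U00096:": "E. coli", "CP000255:": "S. aureus"}

def lookup(label, table):
    """Return the value whose prefix matches the start of label, else None."""
    for prefix, value in table.items():
        if label.startswith(prefix):
            return value
    return None

# Hypothetical labels, for illustration only.
experiment = lookup("SB442_cell_0001", EXPERIMENT_BY_PREFIX)
species = lookup("U00096:fimAICDFGH", SPECIES_BY_PREFIX)
```

Labels that match no prefix (for example the bulk libraries SB369 and SB371, which are not in the experiment table here) return `None` and can be handled separately.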
Source data
Source Data Fig. 2
Raw source data.
Source Data Fig. 3
Raw source data.
Source Data Extended Data Fig. 1
Raw source data.
Source Data Extended Data Fig. 2
Raw source data.
Source Data Extended Data Fig. 3
Raw source data.
Source Data Extended Data Fig. 4
Raw source data.
Source Data Extended Data Fig. 5
Raw source data.
Source Data Extended Data Fig. 6
Raw source data.
Source Data Extended Data Fig. 7
Raw source data.
Source Data Extended Data Fig. 8
Raw source data.
Source Data Extended Data Fig. 9
Raw source data.
Source Data Extended Data Fig. 10
Raw source data.
About this article
Cite this article
Blattman, S.B., Jiang, W., Oikonomou, P. et al. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat Microbiol 5, 1192–1201 (2020). https://doi.org/10.1038/s41564-020-0729-6
DOI: https://doi.org/10.1038/s41564-020-0729-6