Cell atlas projects and high-throughput perturbation screens require single-cell sequencing at a scale that is challenging with current technology. To enable cost-effective single-cell sequencing for millions of individual cells, we developed ‘single-cell combinatorial fluidic indexing’ (scifi). The scifi-RNA-seq assay combines one-step combinatorial preindexing of entire transcriptomes inside permeabilized cells with subsequent single-cell RNA-seq using microfluidics. Preindexing allows us to load several cells per droplet and computationally demultiplex their individual expression profiles. Thereby, scifi-RNA-seq massively increases the throughput of droplet-based single-cell RNA-seq, and provides a straightforward way of multiplexing thousands of samples in a single experiment. Compared with multiround combinatorial indexing, scifi-RNA-seq provides an easy and efficient workflow. Compared to cell hashing methods, which flag and discard droplets containing more than one cell, scifi-RNA-seq resolves and retains individual transcriptomes from overloaded droplets. We benchmarked scifi-RNA-seq on various human and mouse cell lines, validated it for primary human T cells and applied it in a highly multiplexed CRISPR screen with single-cell transcriptome readout of T cell receptor activation.
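To illustrate the demultiplexing idea, the following minimal Python sketch groups reads by the composite of a round1 (preindexing) barcode and a round2 (droplet) barcode, so that two cells sharing one droplet are still separated. The record layout, the barcode labels, and the function name are assumptions made for illustration; they are not taken from the scifi-RNA-seq pipeline.

```python
from collections import defaultdict


def demultiplex(reads):
    """Collapse reads into per-cell UMI counts.

    `reads` is an iterable of (round1, round2, umi, gene) tuples
    (a hypothetical record layout). A cell is identified by the pair
    (round1, round2); duplicate UMIs for the same gene are counted once.
    """
    umis_per_cell = defaultdict(lambda: defaultdict(set))
    for round1, round2, umi, gene in reads:
        umis_per_cell[(round1, round2)][gene].add(umi)
    return {
        cell: {gene: len(umis) for gene, umis in genes.items()}
        for cell, genes in umis_per_cell.items()
    }


# Two cells loaded into the same droplet (round2 = "D1") carry different
# round1 barcodes and are therefore resolved as separate profiles.
example_reads = [
    ("A01", "D1", "u1", "CD3E"),
    ("A01", "D1", "u1", "CD3E"),  # PCR duplicate, same UMI
    ("A01", "D1", "u2", "CD3E"),
    ("B07", "D1", "u1", "Actb"),
    ("A01", "D2", "u9", "CD3E"),
]
profiles = demultiplex(example_reads)
```

In this toy input, the droplet barcode alone would merge the first four reads into one profile, whereas the composite barcode yields three separate cells.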
Two versions of the source code underlying this paper are provided as separate GitHub repositories: (1) analysis code for reproducing the results in this paper (https://github.com/epigen/scifiRNA-seq_publication); (2) pipeline code for processing new scifi-RNA-seq datasets (https://github.com/epigen/scifiRNA-seq).
We thank the team of the Biomedical Sequencing Facility at CeMM for assistance with next-generation sequencing and all members of the Bock laboratory for their help and advice. P.D. would like to thank N. Winhofer and K. Winhofer for their support. This work was conducted in the context of two Austrian Science Fund (FWF) Special Research Program grants (FWF SFB F6102; FWF SFB F7001). T.K. was supported by a Lise Meitner fellowship from the Austrian Science Fund (FWF M2403). C.B. is supported by an ERC Starting Grant (European Union’s Horizon 2020 research and innovation program, grant agreement no. 679146).
P.D. and C.B. are inventors on a patent application describing scifi barcoding and the scifi-RNA-seq method. The other authors declare no competing interests.
Peer review information Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Droplet overloading for the Chromium scATAC v.1.0 and v.1.1 Next GEM microfluidic chips.
a-b, Representative microscopy images of droplets (top rows) and histograms showing the number of nuclei per droplet (bottom rows) at different loading concentrations (15,300, 191,000, 383,000, 765,000, and 1,530,000 nuclei per channel) for the Chromium scATAC v1.0 chip (panel a) and for the scATAC v1.1 Next GEM chip (panel b). To obtain these images, lysis reagents were omitted from the cell loading experiment, and a total of 3,265 (scATAC v1.0) or 4,509 (scATAC v1.1 Next GEM) droplets were manually counted. Moreover, the number of beads per droplet (rightmost image and diagram) was visualized and counted based on a loading experiment in which the nuclei suspension was substituted by 1x Nuclei Buffer, while Reducing Agent B was omitted. c, Despite substantial droplet overloading, stable droplet emulsions were obtained for all tested conditions. d, Box plots showing the droplet diameters for the Chromium scATAC v1.0 and scATAC v1.1 Next GEM microfluidic chips at different loading concentrations. For each setup, 100 droplets were evaluated. Box plots depict the interquartile range with marked median and whiskers extending to 1.5 times the interquartile range. e, Histogram showing droplet diameters (as in panel d) pooled across different loading concentrations (500 droplets per platform).
a, Droplet overloading boosts the percentage of droplets filled with nuclei for the scATAC v1.1 Next GEM microfluidic chip. b, Droplet overloading on the scATAC v1.1 Next GEM chip increases the average number of nuclei per droplet in a controlled fashion, while maintaining the desired Poisson-like loading distribution. c, Expected collision rates on the Next GEM chip as a function of the loaded number of cells or nuclei per channel for standard droplet-based scRNA-seq and for scifi-RNA-seq with different numbers of round1 barcodes. The cell/nuclei fill rate was modeled as a zero-inflated Poisson distribution. d-f, Modeling of the microfluidic device loading using alternative distributions (Negative Binomial, Poisson, Zero Inflated Negative Binomial, Zero Inflated Poisson). The number of loaded nuclei is plotted against the number of nuclei per droplet on a linear scale (panel d), logarithmic scale (panel e) and as point estimates (panel f). g, Statistical properties of the distribution of nuclei per droplet across experiments. The relationship between mean and variance that is expected for a Poisson distribution is indicated by gray lines. h, Computational modeling of droplet loading as a zero-inflated Poisson function. i, Posterior probability distributions of lambda and psi sampled using a Markov Chain Monte Carlo (MCMC) analysis. j, Independent estimation of the cell doublet rates using Monte Carlo simulations. Error bars in panels d, e, h, and j indicate three standard deviations around the mean.
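The relationship in panel c between droplet loading and collisions can be sketched in Python under stated assumptions. Here a collision is taken to mean that a nucleus shares its droplet with at least one other nucleus carrying the same round1 barcode, round1 barcodes are assumed to be assigned uniformly at random, and psi is taken to be the weight of the Poisson component of the zero-inflated Poisson distribution. These definitions and the function names are illustrative assumptions and may differ from the model used for the figure.

```python
import math


def zip_pmf(k, lam, psi):
    """Zero-inflated Poisson probability of k nuclei in a droplet.

    psi is the weight of the Poisson component (assumed parameterization);
    the remaining 1 - psi is extra probability mass at zero.
    """
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (1.0 - psi) + psi * poisson if k == 0 else psi * poisson


def collision_rate(lam, psi, n_round1, k_max=100):
    """Fraction of nuclei that share both droplet and round1 barcode.

    A nucleus in a droplet with k nuclei is unique if none of the other
    k - 1 nuclei carries its round1 barcode, which happens with
    probability (1 - 1/n_round1) ** (k - 1). Droplets are weighted by k
    because the rate is expressed per nucleus, not per droplet.
    """
    keep = 1.0 - 1.0 / n_round1
    ks = range(1, k_max + 1)
    mean_k = sum(k * zip_pmf(k, lam, psi) for k in ks)
    unique = sum(k * zip_pmf(k, lam, psi) * keep ** (k - 1) for k in ks)
    return 1.0 - unique / mean_k


# Hypothetical loading: on average 2 nuclei per Poisson-loaded droplet.
rate_no_preindex = collision_rate(lam=2.0, psi=0.8, n_round1=1)
rate_96 = collision_rate(lam=2.0, psi=0.8, n_round1=96)
rate_384 = collision_rate(lam=2.0, psi=0.8, n_round1=384)
```

Under these assumptions the sum has the closed form 1 - exp(-lambda / n_round1), so psi cancels out of the per-nucleus rate, and increasing the number of round1 barcodes lowers the collision rate at a fixed loading.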
a, Schematic outline of scifi-RNA-seq including detailed oligonucleotide sequences. The reverse transcription is performed inside permeabilized cells or nuclei on a 96-well or 384-well plate, introducing well-specific round1 barcodes into the whole transcriptome. Pre-indexed cells or nuclei are pooled and encapsulated into emulsion droplets using a standard microfluidic droplet generator (10x Genomics Chromium). The round2 barcodes are introduced by thermocycling ligation with a complementary bridge oligo and thermostable ligase. The droplet emulsion is then broken, and a second defined end is introduced into the library via template switching. cDNA is enriched and tagmented with a custom i7-only transposome. Finally, the library is PCR-enriched, with the option to introduce an additional sample index. The read structure for next-generation sequencing on the Illumina NovaSeq 6000 and NextSeq 500 platforms is shown. b, Nuclei recovery after pre-indexing of the whole transcriptome by reverse transcription. scifi-RNA-seq achieves high recovery rates for both cell lines and primary material. c, Nuclei with pre-indexed transcriptome, prior to microfluidic device loading, visualized under a microscope in a counting chamber. The selected image (representative of two replicate samples) shows nuclei derived from human primary T cells. d, Typical size distribution of enriched cDNA obtained with scifi-RNA-seq. e, Typical size distribution of final scifi-RNA-seq libraries ready for next-generation sequencing. f, Distribution of DNA bases along scifi-RNA-seq sequencing reads, showing the characteristic sequence patterns of the UMI, round1 barcode, sample barcode, round2 barcode, and transcript. g, Heatmap showing sequencing quality (Qscore) for each sequencing cycle.
Extended Data Fig. 4 scifi-RNA-seq yields high-quality data for whole cells, fresh nuclei and fixed nuclei.
a, Performance metrics for scifi-RNA-seq experiments using a mixture of human Jurkat cells and mouse 3T3 cells, starting from whole cells permeabilized by methanol, freshly isolated nuclei, and nuclei fixed with 1% or 4% formaldehyde (cryopreserved, re-hydrated, and permeabilized). The following plots are shown: (i) ranked barcodes plotted against reads, unique molecular identifiers (UMIs), and detected genes, distinguishing single-cell transcriptomes from background noise; (ii) reads plotted against UMIs; (iii) reads plotted against the number of detected genes; (iv) reads plotted against the fraction of unique reads; (v) species mixing plot showing the number of UMIs per cell aligning to the mouse genome (x-axis) versus the human genome (y-axis). To facilitate comparisons between the different types of input material, the axes of the performance plots use the same scale across conditions. b, In a species mixing experiment with pre-indexed nuclei from human (Jurkat) and mouse (3T3) cells run at the maximum loading concentration of the standard Chromium protocol (15,300 nuclei per channel), the microfluidic round2 barcode (left plot) is sufficient to resolve single cells. Nevertheless, the combination of round1 and round2 barcodes still improves the separation (right plot). c, Coverage along human and mouse transcripts from 200 bp upstream of the transcription start site (TSS) to 200 bp downstream of the transcription end site (TES), shown for whole cells permeabilized by methanol, freshly isolated nuclei, and nuclei fixed with 1% or 4% formaldehyde (cryopreserved, re-hydrated, and permeabilized). Freshly isolated nuclei show the strongest 3’ enrichment. d, Box plots summarizing sequence alignment metrics across the different types of input material: total reads sequenced, percent uniquely mapped reads, percent multi-mappers, percent alignments to exons plus introns, percent alignments to exons, and percent spliced reads. Freshly isolated nuclei showed the best performance for these alignment metrics. The box plots summarize a total of 2,299 whole cells; 2,000 fresh nuclei; 2,051 nuclei fixed with 1% formaldehyde and 1,896 nuclei fixed with 4% formaldehyde. Box plots depict the interquartile range with marked median and whiskers extending to 1.5 times the interquartile range.
a, ‘Knee plot’ showing the number of UMIs (y-axis) per barcode ranked by frequency (x-axis) for scifi-RNA-seq on the Chromium scATAC v1.0 chip versus the scATAC v1.1 Next GEM chip. The characteristic inflection points are indicated, which separate cells/nuclei (left, colored lines) from background noise (right, grey lines). b, Reads per cell plotted against UMIs per cell to assess the level of sequencing saturation for the two microfluidic chips. c, Reads per cell plotted against the unique read fraction per cell to assess PCR duplication and library complexity for the two microfluidic chips. d, Alignments to the human genome versus alignments to the mouse genome in the species mixing experiment to assess the frequency of cell doublets for the two microfluidic chips. e, Alignment metrics for the two microfluidic chips. f, ‘Knee plot’ for the comparison of two reverse transcriptase enzymes (Maxima H Minus versus Superscript IV) in the reverse transcription step of scifi-RNA-seq (the template switching was performed with Maxima H Minus reverse transcriptase in both cases). g, Reads per cell plotted against UMIs per cell to assess the level of sequencing saturation for the two reverse transcriptases. h, Reads per cell plotted against the unique read fraction per cell to assess PCR duplication and library complexity for the two reverse transcriptases. i, Alignment metrics for the two reverse transcriptases.
Extended Data Fig. 6 Comparison of scifi-RNA-seq, droplet-based scRNA-seq and multiround combinatorial indexing in terms of library complexity, read duplication and barcode sequencing efficiency.
a, ‘Knee plot’ for the comparison of scifi-RNA-seq (this study), droplet-based scRNA-seq using the Chromium system (this study), and multi-round combinatorial indexing (published data) on mouse 3T3 cells. b, Box plot showing UMI counts for each assay. Box plots in this figure summarize a total number of single cell profiles n = 2,994 (Chromium: Intact cells); 2,878 (Chromium: MeOH-fixed cells); 3,523 (Chromium: Nuclei); 4,305 (scifi-RNA-seq: MeOH-fixed cells); 4,945 (scifi-RNA-seq: Nuclei); 3,443 (SPLiT-seq GSM3017262); 526 (SPLiT-seq GSM3017263); 8,874 (sci-RNA-seq GSM2599699); 1,944 (sci-Plex GSM4150376). Box plots depict the interquartile range with marked median and whiskers extending to 1.5 times the interquartile range. c, Reads per cell plotted against UMIs per cell to assess the level of sequencing saturation for each assay. d, Box plot showing the UMIs per read ratio. e, Reads per cell plotted against the unique read fraction per cell to assess PCR duplication and library complexity for each assay. f, Box plot showing the unique read fraction for each assay. g, Barcoding combinations in the largest published experiment against the total number of sequencing cycles used in that experiment. The grey line shows the total number of 138 sequencing cycles (including index cycles) available in the NovaSeq 100-cycle kits. h, Sequencing cycles used for reading the composite cell barcode (excluding the UMI). Uninformative sequencing cycles from ligation overhangs, primer binding sites, and transposase mosaic ends are depicted in gray, and the fraction of uninformative sequencing cycles is shown as a percentage value.
Extended Data Fig. 7 Comparison of scifi-RNA-seq, droplet-based scRNA-seq and multi-round combinatorial indexing in terms of cell doublet rates.
a, Alignments to the mouse genome versus alignments to the human genome for each species mixing experiment, assessing the frequency of cell doublets for the different methods. Data are shown on a linear scale, normalized to UMIs per million to allow the use of common thresholds across experiments. Cells were classified as doublets if (i) there were fewer than twice as many UMIs aligning to one species’ genome as to the other species’ genome, or (ii) more than 75 UMIs per million were detected in both species. Doublet cells are highlighted in red. The grey line indicates x = y after accounting for species bias. The green lines indicate the threshold value of 75 UMIs per million. b, Same visualization as in panel a, but plotted on a logarithmic scale. c, Percentage of cell doublets for each method, corresponding to the red cells in panels a and b. d, Single-cell purity plotted for doublet cells only. Purity was calculated as the cell’s number of UMIs for the dominant species divided by its total number of UMIs.
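The doublet rule and the purity definition in this caption can be written out as a short Python sketch. It assumes the inputs are already normalized to UMIs per million and ignores the species bias correction mentioned for the grey line; the function and parameter names are hypothetical and are not taken from the published analysis code.

```python
def is_doublet(human_upm, mouse_upm, min_ratio=2.0, both_threshold=75.0):
    """Apply the two doublet criteria to one cell.

    human_upm, mouse_upm: UMIs per million aligning to each genome.
    (i)  the dominant species has fewer than `min_ratio` times the UMIs
         of the other species, or
    (ii) both species exceed `both_threshold` UMIs per million.
    """
    dominant = max(human_upm, mouse_upm)
    minor = min(human_upm, mouse_upm)
    ambiguous_ratio = dominant < min_ratio * minor
    both_detected = minor > both_threshold
    return ambiguous_ratio or both_detected


def purity(human_umis, mouse_umis):
    """UMIs of the dominant species divided by the cell's total UMIs."""
    total = human_umis + mouse_umis
    if total == 0:
        return float("nan")
    return max(human_umis, mouse_umis) / total


# Illustrative values only, not data from the study.
clean_cell = is_doublet(500.0, 10.0)       # large ratio, mouse signal below 75
mixed_ratio = is_doublet(60.0, 40.0)       # ratio below 2, so criterion (i) applies
mixed_both = is_doublet(1000.0, 100.0)     # ratio of 10, but both above 75
example_purity = purity(300, 100)
```

Criterion (ii) catches cells with a strongly dominant species that still carry a substantial signal from the other genome, which the ratio test alone would miss.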
Extended Data Fig. 8 Single-cell embeddings and transcriptome comparisons of scifi-RNA-seq and droplet-based scRNA-seq.
An equal mixture of four human cell lines (HEK293T, Jurkat, K562, NALM-6) was processed in parallel with scifi-RNA-seq and with the Chromium 3’ v3.1 Single Cell Gene Expression kit. a, Single-cell transcriptomes displayed in a two-dimensional UMAP projection, with cluster IDs identified by the Leiden algorithm mapped on top. Enrichment of cell line signatures obtained from the ARCHS4 database for the identified Leiden clusters. These results can be used to assign the respective cell line for each cluster, and to identify spurious clusters of doublet cells. b, Joint embeddings combining data across methods (scifi-RNA-seq, standard droplet-based scRNA-seq) and sample preparation methods (intact cells, nuclei, methanol-fixed cells), using dimensionality reduction by principal component analysis (PCA), uniform manifold approximation and projection (UMAP), diffusion maps, t-distributed stochastic neighbor embedding (t-SNE), and the ForceAtlas2 algorithm. Individual cells are colored by cell line (top panel) or sample preparation method (bottom panel). The grouping by cell line (rather than by assay or sample preparation method) was observed without batch effect correction. c, The separation of cells in the latent spaces was quantified using the silhouette score. d, Overlap in the top-100 differential genes between cell lines. e, Correlation matrices of log fold changes, p-values, and test statistics across assays and sample preparation methods.
Extended Data Fig. 9 Large-scale scifi-RNA-seq profiling for a mixture of four human cell lines and for primary human T cells with and without TCR stimulation.
a, Gene expression levels obtained with scifi-RNA-seq for a mixture of four human cell lines. The expression levels of 72 cell-line specific genes were mapped on top of the UMAP projection from Fig. 2j. b, UMAP projections for 62,558 single-cell transcriptomes of human primary T cells, with additional variables mapped on top of the projections: Donor ID, logarithm of UMIs per cell, logarithm of detected genes per cell, percent unique reads per cell, percent mitochondrial expression, and percent ribosomal expression. c, UMAP projection for the single-cell transcriptomes, with T cell receptor stimulation status mapped on top of the projection. d, Expression levels of four genes induced by TCR stimulation mapped on top of the UMAP projection. e, UMAP projection for the single-cell transcriptomes, with single cells colored according to the clusters assigned by graph-based clustering using the Leiden algorithm. f, Gene set enrichment analysis for the differentially expressed genes in each cluster.
Extended Data Fig. 10 Arrayed CRISPR screen for TCR activation with multiplexed scifi-RNA-seq readout.
a, TCR activation signature as defined in Fig. 3c, mapped on top of a schematic of cell signaling in TCR pathway activation. b, TCR activation score derived from the transcriptome data plotted against a proliferation score derived from the cell counts. Key regulators of the TCR pathway are highlighted.
Oligonucleotide sequences for scifi-RNA-seq.
Sequencing metrics for scifi-RNA-seq on the NovaSeq 6000 platform.
Overview of single-cell RNA-seq datasets in this study.
Detailed step-by-step description of the scifi-RNA-seq method.
Datlinger, P., Rendeiro, A.F., Boenke, T. et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat Methods 18, 635–642 (2021). https://doi.org/10.1038/s41592-021-01153-z