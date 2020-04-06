Abstract
The scale and capabilities of single-cell RNA-sequencing methods have expanded rapidly in recent years, enabling major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single-cell and/or single-nucleus profiling, including two low-throughput and five high-throughput methods, which we selected as representative based on their usage and on our expertise and resources to prepare libraries. We tested the methods on three types of samples: cell lines, peripheral blood mononuclear cells and brain tissue, generating 36 libraries in six separate experiments in a single center. To directly compare the methods and avoid processing differences introduced by the existing pipelines, we developed scumi, a flexible computational pipeline that can be used with any single-cell RNA-sequencing method. We evaluated the methods both for basic performance, such as the structure and alignment of reads, sensitivity and extent of multiplets, and for their ability to recover known biological information in the samples.
Data availability
RNA-seq data generated in this project are available from the Gene Expression Omnibus with accession number GSE132044 and the Single Cell Portal (https://portals.broadinstitute.org/single_cell). Source data for Figs. 2–6 are presented with the paper.
Code availability
The scumi Python package is freely available from the Bitbucket repository at https://bitbucket.org/jerry00/scumi-dev/src/master/ and as Supplementary code. The R scripts (used to assign cell types to clusters based on a set of marker genes, to select parameters for clustering analysis and to filter low-quality cells) are available from the same repository.
Acknowledgements
We especially thank M. Chatterjee, A. Ratner and S. Boswell of the Single Cell Core at Harvard Medical School for performing the inDrops experiments. We are grateful to A. Neumann, J. Lee, D. Dionne and N. Sharif for assistance with project coordination; A. Klein for helpful discussions and suggestions; R. Kirchner for advice on inDrops data analysis; D. Leib for advice on CEL-Seq2 data analysis; B. Li for advice on PBMC data analysis; K. Shekhar for precision analysis in cell line mixture data; M. Cuoco for sample transportation; Broad Flow Cytometry Facility for cell sorting; Broad Genomics Platform for sequencing; and L. Gaffney for assistance with figures. Work was supported by the Klarman Cell Observatory, the Manton Foundation and the BRAIN Initiative (grant no. 1U19 MH114821, A.R.). A.R. is an Investigator of the Howard Hughes Medical Institute. This publication is part of the Human Cell Atlas at www.humancellatlas.org/publications.
Ethics declarations
Competing interests
A.R. is a founder and equity holder in Celsius Therapeutics; an equity holder in Immunitas; and an SAB member of Syros Pharmaceuticals, Neogene Therapeutics and Thermo Fisher Scientific. A.K.S. is a founder of, and consultant for, Honeycomb Biotechnologies, Inc., which manufactures Seq-Well peripherals. A.K.S. and A.R. are also named inventors on patents filed by the Broad Institute related to either Drop-seq (A.R. and A.K.S.), DroNc-seq (A.R.) or Seq-Well (A.K.S.). The interests of A.K.S. and A.R. were reviewed and are subject to a management plan overseen by their institutions in accordance with their conflict of interest policies. The other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Description of scRNA-seq methods evaluated.
Extended Data Fig. 2 Flowchart detailing computational analysis.
a, scumi workflow. b, Removing low-quality cell barcodes. c, Profiling samples. d, Bulk data workflow.
Extended Data Fig. 3 Characterization of genome alignments for sequence reads.
a, Mixture. b, PBMCs. c, Cortex. For each pair of bar graphs, experiment 1 is on the left and experiment 2 is on the right. For Smart-seq2, there were no poly(T) reads owing to the full transcriptome coverage and to library construction with transposase-based Nextera reagents, which attach adapters to both ends of cDNA fragments. Reads were assigned in the following order: no poly(T), unmapped or multi-mapped, ambiguous (mapping to a single location that overlaps two or more genes), and then one of the remaining categories. Reads were assigned as antisense only for the cortex datasets (c). Percentages of reads may not sum to 100 because of rounding and because values are not shown for categories containing <2% of reads.
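The assignment order described in this caption amounts to a first-match-wins rule. The following is a minimal sketch of that rule in Python, not the scumi implementation: the function name, the dictionary fields (`has_poly_t`, `num_alignments`, `overlapping_genes`, `feature`) and the example feature labels are hypothetical and chosen only for illustration.

```python
def assign_read_category(read):
    """Return the first category that applies, checked in the caption's order.

    `read` is a hypothetical dict describing one sequenced read; the field
    names are assumptions for this sketch and are not taken from scumi.
    """
    # 1. Reads lacking a poly(T) stretch are set aside first.
    if not read["has_poly_t"]:
        return "no poly(T)"
    # 2. Reads with zero alignments (unmapped) or several (multi-mapped).
    if read["num_alignments"] != 1:
        return "unmapped or multi-mapped"
    # 3. A single location that overlaps two or more genes is ambiguous.
    if len(read["overlapping_genes"]) >= 2:
        return "ambiguous"
    # 4. Otherwise the read falls into one of the remaining categories,
    #    represented here by a precomputed (hypothetical) feature label.
    return read["feature"]
```

Because the checks run in a fixed order, a read that both lacks poly(T) and is multi-mapped is counted once, under "no poly(T)", so the categories never overlap.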
Extended Data Fig. 4 Impact of number of cells on sensitivity.
a, Human cells and mouse cells from Mixture experiments; multiplet cells are not shown. b, PBMC. c, Cortex. The number of cells (x-axis) with a given mean number of genes detected (y-axis), when cells are ordered from highest (left) to lowest (right) total number of genes. The rightmost point of each curve shows the average number of detected genes for the final selected number of cells in this study.
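One way to read these curves is as a running mean over cells ranked by the number of genes detected. The sketch below illustrates that reading under the assumption that the y-value at n cells is the mean over the top n cells; the function name and the input list are hypothetical, and this is not the authors' plotting code.

```python
from itertools import accumulate


def mean_genes_by_cell_count(genes_per_cell):
    """Mean number of detected genes over the top-n cells, for n = 1..N.

    `genes_per_cell` is a hypothetical list holding the number of genes
    detected in each cell. Cells are ranked from most to fewest genes.
    """
    ordered = sorted(genes_per_cell, reverse=True)
    running_totals = accumulate(ordered)  # cumulative sum over ranked cells
    return [total / n for n, total in enumerate(running_totals, start=1)]
```

For example, `mean_genes_by_cell_count([100, 300, 200])` returns `[300.0, 250.0, 200.0]`. The last value is the mean over all retained cells, which corresponds to the rightmost point of a curve.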
Extended Data Fig. 5 Impact of sequencing depth on gene and UMI detection per cell in the PBMC datasets.
a–d, The median number of genes (a, b, y-axis) and UMIs (c, d, y-axis) detected per cell at different sequencing depths (x-axis) for low-throughput (a, c) and high-throughput (b, d) methods from PBMC1 (left) and PBMC2 (right). Far right point of each curve: median number of detected genes per cell at full sequencing depth. e, Relation between median number of genes and UMIs per cell in PBMC1 (left) and PBMC2 (right). (n = 1 biologically independent sample for each curve in each plot).
Extended Data Fig. 6 Fraction of reads from each species in Mixture experiments.
Fraction of UMIs (or reads for Smart-seq2) aligned to either mouse (y-axis) or human (x-axis) in each cell from the Mixture1 (a) and Mixture2 (b) experiments (n = 1 biologically independent sample per panel). Each dot represents a cell. Dashed line and number: robust linear regression fitted line and its slope. Number of genes detected from the “wrong” species is higher in cells with more reads.
Extended Data Fig. 7 Technical precision plots for mixture experiments.
Distributions of the extra Poisson coefficients of variation (“Extra Poisson CV”, y-axis) from each method (x-axis). a, b, Human cells and c, d, mouse cells from Mixture1 (left) and Mixture2 (right) (n = 1 biologically independent sample per panel). Violin and box plot elements are defined as in Fig. 2.
Extended Data Fig. 8 Cell type analysis for each PBMC dataset.
t-SNEs of single-cell profiles (dots) from each method colored by cell type assignment from PBMC1 (a) and PBMC2 (b) (n = 1 biologically independent sample per panel).
Extended Data Fig. 9 Cell type analysis for Cortex2.
t-SNEs of single nucleus profiles (dots) from each method colored by cell type assignment from Cortex2 (n = 1 biologically independent sample).
Extended Data Fig. 10 Cell type analysis of the combined PBMC datasets.
a, t-SNE plot generated with Harmony clustering all PBMC cells in this study (n = 2 biologically independent samples). b, All libraries contain cells of every cell type, according to this joint annotation. This differs from the individual level clustering results, in which many libraries are missing particular cell types (n = 2 biologically independent samples). c, PBMC1 and d, PBMC2. For each annotated cell type and method in the jointly clustered dataset (y-axis), we calculated the percentage of cells from that cell type that come from each cell type in the individual level clustering results (x-axis). This is denoted by the color of the corresponding boxes (n = 1 biologically independent sample for (c) and (d)).
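The comparison in panels c and d can be thought of as a row-normalized cross-tabulation of the two sets of labels. The sketch below shows that calculation for the cells of a single method, assuming each cell carries one joint label and one individual-level label; the function name and inputs are hypothetical, and this is not the authors' analysis script.

```python
from collections import Counter, defaultdict


def composition_by_joint_type(joint_labels, individual_labels):
    """Percentage of each joint cell type drawn from each individual-level type.

    Both arguments are hypothetical, equal-length sequences with one label
    per cell: the joint annotation and the individual-level annotation.
    """
    counts = defaultdict(Counter)
    for joint, individual in zip(joint_labels, individual_labels):
        counts[joint][individual] += 1

    percentages = {}
    for joint, counter in counts.items():
        total = sum(counter.values())  # number of cells with this joint label
        percentages[joint] = {
            individual: 100.0 * n / total for individual, n in counter.items()
        }
    return percentages
```

Each row of the result sums to 100, so a row concentrated in one column indicates that the joint and individual-level annotations agree for that cell type.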
