
A test metric for assessing single-cell RNA-seq batch correction


Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but, as in all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. We also demonstrate the application of kBET to data from peripheral blood mononuclear cells (PBMCs) from healthy donors to distinguish cell-type-specific inter-individual variability from changes in relative proportions of cell populations. This has important implications for future data-integration efforts, central to projects such as the Human Cell Atlas.
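The idea behind a k-nearest-neighbor batch-effect test can be sketched in a few lines. This is a minimal, illustrative Python version written under stated assumptions, not the kBET R package: it uses brute-force Euclidean nearest neighbors (excluding the query cell), compares the batch composition of each tested neighborhood with the global batch frequencies using Pearson's χ² test, and reports the fraction of tested neighborhoods with p < α as the rejection rate. The helper names (`chi2_sf`, `knn`, `kbet_rejection_rate`) and their defaults are hypothetical.

```python
import math
import random
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc term for df = 1, plus one recurrence term per 2 extra degrees of freedom
    total = math.erfc(math.sqrt(h))
    for j in range((df - 1) // 2):
        total += math.exp((j + 0.5) * math.log(h) - h - math.lgamma(j + 1.5))
    return total


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (the cell itself excluded)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def kbet_rejection_rate(points, batches, k, n_tests, alpha=0.05, seed=0):
    """Fraction of randomly chosen neighborhoods whose batch mix differs from the global mix."""
    rng = random.Random(seed)
    n = len(points)
    labels = sorted(set(batches))
    # Null hypothesis: every neighborhood mirrors the global batch frequencies
    expected = {b: k * c / n for b, c in Counter(batches).items()}
    rejected = 0
    for i in rng.sample(range(n), n_tests):
        observed = Counter(batches[j] for j in knn(points, i, k))
        stat = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in labels)
        if chi2_sf(stat, len(labels) - 1) < alpha:
            rejected += 1
    return rejected / n_tests


# Usage: 100 cells on a line with two batches, either interleaved or shifted far apart
batches = [x % 2 for x in range(100)]
mixed = [(float(x),) for x in range(100)]
separated = [(x + 1000.0 * (x % 2),) for x in range(100)]
print(kbet_rejection_rate(mixed, batches, k=10, n_tests=10))      # 0.0
print(kbet_rejection_rate(separated, batches, k=10, n_tests=10))  # 1.0
```

Testing 10 neighborhoods for 100 cells mirrors the 10% of the sample size used in Supplementary Figure 1. In the interleaved case every neighborhood holds roughly equal numbers of both batches, so no test rejects; in the shifted case every neighborhood is single-batch, so every test rejects.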


Fig. 1: Batch types and the concept of kBET.
Fig. 2: kBET is more responsive than other batch tests on simulated data.
Fig. 3: ComBat provides the best correction on mESC inDrop technical replicates.
Fig. 4: kBET assesses data-integration quality and inter-individual variability.

Data availability

We applied the batch-effect estimates to several scRNA-seq datasets. In the inDrop publication, droplet-based sequencing was demonstrated on mESCs grown in LIF+ medium, including two additional technical replicates12. In our analysis, we used two replicates comprising 5,952 cells from two batches and 11,308 genes, retaining genes with more than 4 unique molecular identifier (UMI) counts in at least 2 cells. Data were downloaded as UMI-filtered read count matrices from accession GSE65525.
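The gene filter described above (keep a gene if at least 2 cells have more than 4 UMIs for it) can be written as a short sketch. The cells × genes list-of-lists layout and the function name `filter_genes` are illustrative assumptions; the thresholds are the ones stated in the text.

```python
def filter_genes(counts, min_cells=2, min_umi=4):
    """Return indices of genes (columns) with more than min_umi UMIs in at least min_cells cells.

    counts: list of cells, each a list of per-gene UMI counts (cells x genes).
    """
    n_genes = len(counts[0])
    keep = []
    for g in range(n_genes):
        # Number of cells in which this gene exceeds the UMI threshold
        n_expressing = sum(1 for cell in counts if cell[g] > min_umi)
        if n_expressing >= min_cells:
            keep.append(g)
    return keep


# Toy matrix: 3 cells x 4 genes
toy = [
    [5, 0, 9, 4],
    [6, 1, 0, 4],
    [0, 7, 8, 5],
]
print(filter_genes(toy))  # [0, 2]
```

Gene 3 is dropped even though it is detected in every cell, because only one cell exceeds 4 UMIs.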

Kolodziejczyk et al.14 explored heterogeneity in mESCs cultured in three different media (2i, a2i and LIF+) using full-length transcript sequencing (Smart-seq). The three conditions included 219, 123 and 207 cells in 4, 2 and 3 batches, respectively. These data were downloaded from ENA (project ID PRJEB6455) as FASTQ files and mapped to an Ensembl52 mouse transcriptome (GRCm38.p5.87, equivalent to UCSC mm10) with Salmon24. Cells were quality-controlled according to data derived from the Espresso database.

Further, scRNA-seq has been widely applied to study mouse embryonic development. To test the performance of batch correction for data integration, we collected scRNA-seq data on early mouse embryonic development from eight different studies16,17,18,19,20,21,22,23, comprising 56, 49, 124, 65, 15, 294, 17 and 15 cells, respectively. These data have the following accession IDs: E-GEOD-57249, E-GEOD-70605, E-MTAB-3321, GSE53386, E-MTAB-2958, E-GEOD-45719, E-GEOD-44183 and E-GEOD-66582. All studies used Smart-seq-based protocols for scRNA-seq. All FASTQ files were mapped to an Ensembl52 mouse transcriptome (version GRCm38.p5.87) with Salmon24 (version 0.8.2; k-mer size of 21 to accommodate different read lengths). Here we treated each study as a batch and did not account for flow-cell batches within studies. We continued our analysis without further gene filtering or quality control.
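Treating each study as one batch yields eight batches of very unequal size. Under the null hypothesis of a neighborhood-based batch test, a neighborhood's batch composition should match the global batch frequencies, so small studies are expected to contribute only about one cell to a typical neighborhood. The sketch below computes those expected counts from the study sizes listed above; the neighborhood size k = 50 and the function names are illustrative assumptions, not values from the paper.

```python
# Cells per study, in the order listed above; each study is treated as one batch
study_sizes = [56, 49, 124, 65, 15, 294, 17, 15]


def batch_labels(sizes):
    """One batch label (the study index) per cell."""
    return [b for b, size in enumerate(sizes) for _ in range(size)]


def expected_counts(sizes, k):
    """Expected cells per batch in a k-cell neighborhood if batches were perfectly mixed."""
    total = sum(sizes)
    return [k * size / total for size in sizes]


labels = batch_labels(study_sizes)
print(len(labels))  # 635
print([round(e, 1) for e in expected_counts(study_sizes, k=50)])
# [4.4, 3.9, 9.8, 5.1, 1.2, 23.1, 1.3, 1.2]
```

A common rule of thumb treats expected counts below about 5 as too small for the χ² approximation, which is one reason an exact test can be preferable for small or unbalanced batches.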

Kang et al.26 used natural genetic variation among PBMCs from eight individuals to assign cells to donors in multiplexed droplet-based sequencing (10x Genomics), as a replacement for sample barcoding. From that study, we used three experimental runs: two with 3,514 and 4,106 cells, each from four healthy donors, and one with 5,832 cells from all eight donors. Human PBMC data26 can be provided by the authors upon request; count matrices are available under accession number GSE96583.


1. Tung, P.-Y. et al. Batch effects and the effective design of single-cell gene expression studies. Sci. Rep. 7, 39921 (2017).
2. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
3. Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).
4. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
5. Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).
6. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
7. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
8. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
9. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
10. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
11. Cressie, N. & Read, T. R. C. Pearson’s χ² and the loglikelihood ratio statistic G²: a comparative review. Int. Stat. Rev. 57, 19–43 (1989).
12. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
13. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
14. Kolodziejczyk, A. A. et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17, 471–485 (2015).
15. Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85–91 (2017).
16. Biase, F. H., Cao, X. & Zhong, S. Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing. Genome Res. 24, 1787–1796 (2014).
17. Liu, W. et al. Identification of key factors conquering developmental arrest of somatic cell cloned embryos by combining embryo biopsy and single-cell sequencing. Cell Discov. 2, 16010 (2016).
18. Goolam, M. et al. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos. Cell 165, 61–74 (2016).
19. Fan, X. et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16, 148 (2015).
20. Boroviak, T. et al. Lineage-specific profiling delineates the emergence and progression of naive pluripotency in mammalian embryogenesis. Dev. Cell 35, 366–382 (2015).
21. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
22. Xue, Z. et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013).
23. Wu, J. et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature 534, 652–657 (2016).
24. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
25. Teng, M. et al. A benchmark for RNA-seq quantification pipelines. Genome Biol. 17, 74 (2016).
26. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
27. Liu, Q. et al. Quantitative assessment of cell population diversity in single-cell landscapes. Preprint at bioRxiv (2018).
28. Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
29. Bacher, R. et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat. Methods 14, 584–586 (2017).
30. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
31. Buettner, F., Pratanwanich, N., McCarthy, D. J., Marioni, J. C. & Stegle, O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 18, 212 (2017).
32. Li, W. V. & Li, J. J. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat. Commun. 9, 997 (2018).
33. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single cell RNA-seq denoising using a deep count autoencoder. Preprint at bioRxiv (2018).
34. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).
35. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).
36. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
37. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods: towards more accurate and robust tools. Preprint at bioRxiv (2018).
38. Bhaduri, A., Nowakowski, T. J., Pollen, A. A. & Kriegstein, A. R. Saturating single-cell datasets. Preprint at bioRxiv (2017).
39. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
40. Tabula Muris Consortium. Single-cell transcriptomic characterization of 20 organs and tissues from individual mice creates a Tabula Muris. Preprint at bioRxiv (2018).
41. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
42. Baik, J. & Silverstein, J. W. Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97, 1382–1408 (2006).
43. Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
44. Andrews, T. S. & Hemberg, M. Dropout-based feature selection for scRNASeq. Preprint at bioRxiv (2018).
45. Vallejos, C. A., Risso, D., Scialdone, A., Dudoit, S. & Marioni, J. C. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat. Methods 14, 565–571 (2017).
46. Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
47. Paulson, J. N. et al. Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. BMC Bioinformatics 18, 437 (2017).
48. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
49. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
50. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
51. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
52. Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).



We thank A. Böttcher for motivating this study, and T. Illicic for carrying out pilot analyses. We thank in particular M. Subramaniam and J. Ye (UCSF) for the PBMC data. We are grateful to the members of the Teichmann and Theis labs for valuable discussions and comments on the manuscript. M.B. is supported by a DFG Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM). Z.M. is supported by a Single Cell Gene Expression Atlas grant from the Wellcome Trust (no. 108437/Z/15/Z). F.A.W. acknowledges support from the Helmholtz Postdoc Programme, Initiative and Networking Fund of the Helmholtz Association. F.J.T. acknowledges financial support from the German Science Foundation (SFB 1243 and Graduate School QBM) and from the Bavarian government (BioSysNet). This collaboration was supported by a Helmholtz International Fellow Award to S.A.T.

Author information




M.B. developed, tested and validated the method; prepared and analyzed the data; and wrote the paper. Z.M. prepared and analyzed the data and wrote the paper. F.A.W. assisted with method development and manuscript writing. S.A.T. and F.J.T. oversaw the research, designed the method validation and wrote the paper.

Corresponding authors

Correspondence to Sarah A. Teichmann or Fabian J. Theis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Assessing neighborhood size effect with different flavors of kBET.

Neighborhood size effect for two simulated datasets (1,000 genes, 500 cells, 2 batches of equal size). The dashed vertical line marks the optimal neighborhood size for batch-effect detection, that is, where the rejection rate is maximal. Shaded areas represent the 95th percentile of 100 repeated kBET runs. In each run, the number of tested neighborhoods is 10% of the sample size. (a) Mean expression levels differ between the batches for 1% of genes. The observed rejection rate is low overall and decreases with increasing neighborhood size. (b) Mean expression levels differ between the batches for 20% of genes. The observed rejection rate is almost 100% and decreases for neighborhood sizes larger than 75% of the sample size. All flavors of kBET return almost identical results. The exact test has slightly lower rejection rates than Pearson’s χ² test for small neighborhoods (<10% of the sample size).
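To illustrate why an exact test can reject less often than Pearson's χ² test in small neighborhoods, the sketch below compares both p-values for a single neighborhood drawn from two equally sized batches. For two batches we assume the exact test reduces to a two-sided binomial test that sums the probabilities of all outcomes no more likely than the observed one; the exact flavor in the kBET package may be implemented differently, and the function names are hypothetical.

```python
import math


def pearson_p(n_batch1, k, p1):
    """Pearson chi-square p-value (df = 1) for n_batch1 of k neighbors coming from batch 1."""
    exp1, exp2 = k * p1, k * (1 - p1)
    stat = (n_batch1 - exp1) ** 2 / exp1 + ((k - n_batch1) - exp2) ** 2 / exp2
    # For df = 1 the chi-square upper tail equals erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))


def exact_binom_p(n_batch1, k, p1):
    """Two-sided exact binomial p-value: total probability of outcomes no more likely than observed."""
    probs = [math.comb(k, x) * p1 ** x * (1 - p1) ** (k - x) for x in range(k + 1)]
    observed = probs[n_batch1]
    # Small relative tolerance so outcomes tied with the observed one are counted
    return sum(p for p in probs if p <= observed * (1 + 1e-7))


# A 10-cell neighborhood with 9 cells from batch 1, two equally sized batches
print(round(pearson_p(9, 10, 0.5), 4))      # 0.0114
print(round(exact_binom_p(9, 10, 0.5), 4))  # 0.0215
```

With only 10 neighbors the χ² approximation is coarse, and here the exact p-value is roughly twice as large, which is consistent with the exact test rejecting slightly less often in small neighborhoods.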

Supplementary Figure 2 Simulation of different dropout effects in single-cell RNA-seq data.

Dropout is controlled by the shape parameter k, where batch 1 has k1 = –1 and k2 in batch 2 ranges from –0.9 to –3. Batches are always equally sized; sample sizes refer to the total number of samples in the dataset. (a) kBET rejection rates, variance explained and silhouette coefficient for each simulation (from top to bottom). (b) PCA plots for a large difference (left, Δk = 2) and a small difference (right, Δk = 0.04). (c,d) Mean relation (top), dropout relation (center) and cellular detection rate (CDR) effect (bottom) for a large difference (c) and a small difference (d). Blue lines indicate the linear fit with parameters depicted in the top left corner. R² values denote the variance explained by the fit.
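The dropout shape parameter k can be read through a logistic dropout curve. Assuming a Splatter-style parameterization (ref. 51), in which a gene with mean expression μ drops out with probability 1/(1 + exp(−k(ln μ − x0))) and x0 is the curve's midpoint, the sketch below compares k1 = −1 with k2 = −3. The midpoint x0 = 0 and the function name are illustrative assumptions.

```python
import math


def dropout_prob(mean_expr, shape, midpoint=0.0):
    """Logistic dropout probability for a gene with mean expression mean_expr.

    With a negative shape, highly expressed genes drop out less often; a more
    negative shape makes the transition around the midpoint steeper.
    """
    x = math.log(mean_expr)
    return 1.0 / (1.0 + math.exp(-shape * (x - midpoint)))


for mu in (0.1, 1.0, 10.0):
    p_batch1 = dropout_prob(mu, shape=-1.0)  # k1 = -1
    p_batch2 = dropout_prob(mu, shape=-3.0)  # k2 at the steep end of the simulated range
    print(f"mean={mu:>5}: batch 1 {p_batch1:.3f}, batch 2 {p_batch2:.3f}")
```

Both batches share a dropout probability of 0.5 at the midpoint, but away from it the steeper curve of batch 2 makes lowly expressed genes drop out more and highly expressed genes drop out less, which is the kind of batch difference this simulation introduces.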

Supplementary Figure 3 Simulation of equally sized batches with one having additional noise.

The additional noise is a factor by which the final gene expression means are multiplied; it is drawn from a log-normal distribution whose parameters are the batch factor and the batch scale. We simulated 1,000 samples and drew subsets from this dataset. Batches were always equally sized. (a,b) Batch-effect analysis for several batch factors (a) and batch scales (b) using kBET, PC regression (variance explained by the batch effect) and silhouette coefficients (from top to bottom). kBET rejection rates were computed for both the original data space and the default 50-dimensional PC space (top two plots). (c,d) Mean relation (top), dropout relation (center) and PCA plot for batch factor = 0.1 (c) and batch scale = 0.5 (d). Blue lines indicate the linear fit with parameters depicted in the top left corner. R² values denote the variance explained by the fit and indicate a high correlation of mean and dropout between the two batches.
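The additional noise in this simulation is a multiplicative, log-normally distributed factor on the gene expression means of one batch. A minimal sketch, assuming that the batch factor and batch scale act as the mean and standard deviation of the underlying normal distribution (the function name and seed handling are hypothetical):

```python
import math
import random


def add_batch_noise(gene_means, batch_factor, batch_scale, seed=0):
    """Multiply each gene mean by a factor drawn from LN(batch_factor, batch_scale)."""
    rng = random.Random(seed)
    # random.lognormvariate(mu, sigma) returns exp of a normal draw with mean mu and s.d. sigma
    return [m * rng.lognormvariate(batch_factor, batch_scale) for m in gene_means]


batch1_means = [0.5, 2.0, 10.0]
# With batch_scale = 0 every factor equals exp(0.1): a uniform shift of about 10.5%
shifted = add_batch_noise(batch1_means, batch_factor=0.1, batch_scale=0.0)
# With batch_scale > 0 each gene receives its own random factor
noisy = add_batch_noise(batch1_means, batch_factor=0.0, batch_scale=0.5, seed=1)
print([round(m, 3) for m in shifted])
print([round(m, 3) for m in noisy])
```

Separating the two knobs this way reflects panels (a) and (b): the batch factor shifts all genes in the noisy batch in the same direction on average, whereas the batch scale spreads the gene-wise factors.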

Supplementary Figure 4 Highly variable genes and kBET results after batch regression (Klein et al.).

(a) Number of retained highly variable genes before and after batch correction. Reference: intersection of the highly variable genes per batch with log(counts + 1) normalization. (b) Total number of highly variable genes after batch correction. (c) False positive rates on highly variable genes for all combinations of normalization and batch-correction methods. (d) Comparison of silhouette coefficient and kBET mean ‘acceptance rate’ (1 – rejection rate) from 100 kBET runs. (e) Comparison of PC regression and kBET mean ‘acceptance rate’ (1 – rejection rate) from 100 kBET runs.
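The silhouette coefficient (ref. 43) can be computed with batch labels in place of cluster labels: values near 1 mean cells lie much closer to their own batch than to any other (a strong batch effect), and values near 0 or below indicate well-mixed batches. A minimal pure-Python sketch with Euclidean distances; the function name and the convention s = 0 for single-cell batches or zero distances are assumptions.

```python
import math
from statistics import mean


def batch_silhouette(points, batches):
    """Mean silhouette width of all cells, using batch labels as the clustering (needs >= 2 batches)."""
    labels = set(batches)
    scores = []
    for i, p in enumerate(points):
        # Distances from cell i to every other cell, grouped by batch
        dists = {b: [] for b in labels}
        for j, q in enumerate(points):
            if j != i:
                dists[batches[j]].append(math.dist(p, q))
        own = dists[batches[i]]
        if not own:  # a batch with a single cell gets s = 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the cell's own batch
        b = min(mean(d) for lab, d in dists.items() if lab != batches[i] and d)  # nearest other batch
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


labels = ["A", "A", "B", "B"]
separated = [(0.0,), (1.0,), (100.0,), (101.0,)]
mixed = [(0.0,), (2.0,), (1.0,), (3.0,)]
print(round(batch_silhouette(separated, labels), 3))  # close to 1: strong batch effect
print(batch_silhouette(mixed, labels))                # -0.25: batches interleaved
```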

Supplementary Figure 5 Deeply sequenced SMART-seq2/C1 mESC data have similar characteristics for batch correction (Kolodziejczyk et al.).

(a) Illustration of two full-length read datasets with replicates in 2i, LIF and a2i culture (219, 207 and 123 cells, respectively). (b) PCA plots for log(CPM + 1) ComBat-corrected data. (c) Percentage of retained highly variable genes versus kBET acceptance rate (1 – rejection rate) for all combinations of normalization and batch-correction approaches. The best-performing normalization-regression strategies, such as ComBat on log(CPM + 1) data, cluster in the top right corner. Isolated cells do not have mutual nearest neighbors and appear in some correction models. Seurat’s CCA alignment corrects batch effects only in a latent space, as in manifold learning; we therefore could not compute highly variable genes for it and show only kBET values.
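Panel (c) places each normalization and correction strategy on two axes. A minimal sketch of both scores, assuming (following Supplementary Figure 4a) that "retained" counts highly variable genes found after correction that also belong to the reference set, the intersection of per-batch highly variable genes, and that the acceptance rate is 1 minus the mean kBET rejection rate over repeated runs. Gene names, rejection rates and function names are invented for illustration.

```python
from statistics import mean


def reference_hvgs(per_batch_hvgs):
    """Reference set: genes that are highly variable in every batch."""
    return set.intersection(*(set(g) for g in per_batch_hvgs))


def percent_retained(hvgs_after, reference):
    """Percentage of reference HVGs that are still highly variable after correction."""
    return 100.0 * len(reference & set(hvgs_after)) / len(reference)


def acceptance_rate(rejection_rates):
    """kBET acceptance rate: 1 - mean rejection rate over repeated runs."""
    return 1.0 - mean(rejection_rates)


ref = reference_hvgs([["g1", "g2", "g3", "g4"], ["g2", "g3", "g4", "g5"]])
print(sorted(ref))                               # ['g2', 'g3', 'g4']
print(percent_retained(["g2", "g4", "g7"], ref))  # two of three reference genes kept
print(acceptance_rate([0.125, 0.25, 0.375]))      # 0.75
```

A strategy lands in the top right corner only if it both keeps most reference genes (biology preserved) and has a high acceptance rate (batches mixed).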

Supplementary Figure 6 Sequencing depth in mouse early development data varies by study rather than cell type.

Sequencing depth (library size) per developmental stage (shape) in eight different studies (color-coded) of mouse embryonic development.

Supplementary Figure 7 kBET detects inter-individual variability in PBMC data (Kang et al.).

PBMC data from eight unrelated individuals processed in three experiments (batches) on the Chromium 10x Genomics device, with the donor identity of each cell assigned with demuxlet. Note that because cells from multiple donors were pooled, between-donor processing batch effects are effectively excluded. We applied kBET as a sensitive measure of inter-individual variability. (a) t-SNE plot of all data. Cell types are annotated as in the original publication. (b) Cell-type frequencies per individual and batch. Cells were prepared once and processed in two separate runs. Inter-individual variability is stronger than preparation bias. (c) kBET acceptance rates (1 – rejection rate) for several subsets of the complete dataset. Subsample sizes ranged from 10% to 100% of the full sample size. Subsampling was repeated three times, and kBET rejection rates were averaged across these replicates to reduce bias from subsampling. Rejection rates decrease with decreasing sample size, because each tested neighborhood then carries less statistical certainty, so the test more often fails to reject the null hypothesis.
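The loss of power in smaller samples follows from the form of Pearson's statistic: for a fixed deviation in batch proportions, χ² grows linearly with the number of cells in the neighborhood. A sketch for two equally sized batches, assuming a neighborhood that is 60% batch 1 and that neighborhoods shrink as the data are subsampled; the proportion, the sizes and the function name are illustrative.

```python
import math


def neighborhood_p_value(frac_batch1, k, expected_frac=0.5):
    """Pearson chi-square p-value (df = 1) for a k-cell neighborhood of two batches.

    With observed fraction f and expected fraction p, the statistic simplifies to
    k * ((f - p)**2 / p + (f - p)**2 / (1 - p)), i.e. it grows linearly with k.
    """
    dev2 = (frac_batch1 - expected_frac) ** 2
    stat = k * (dev2 / expected_frac + dev2 / (1 - expected_frac))
    return math.erfc(math.sqrt(stat / 2))  # chi-square (df = 1) upper tail


# Same 60:40 batch imbalance, progressively smaller neighborhoods
for k in (100, 50, 20):
    print(k, f"{neighborhood_p_value(0.6, k):.4f}")
# 100 0.0455  (rejected at alpha = 0.05)
# 50 0.1573
# 20 0.3711
```

The same imbalance that is detected in a 100-cell neighborhood goes undetected at 50 or 20 cells, matching the lower rejection rates seen for small subsamples.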

Supplementary Figure 8 Normalization affects the CDR–library size relation in SMART-seq data.

(a-d) Comparison of cellular detection rate (CDR) effect and library size for inDrop UMI data (Klein et al.) (a,b) and C1 SMART-seq data (Kolodziejczyk et al.) (c,d). (a,c) CDR for count data. (b,d) CDR for CPM-normalized data, where a gene was counted as detected if CPM > 1. The blue line shows a linear fit of CDR (y) to library size (x) and batch variable (z), and R² denotes the variance explained by the model. Gray shaded areas indicate s.e. of the fit. In UMI data, library size largely explains the CDR effect, independent of normalization (a,b). In SMART-seq data, library size contributes significantly to the CDR effect in count data (c), but this contribution is lost in CPM-normalized data (d).
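The CDR in this figure is the fraction of genes detected per cell. A minimal sketch of both definitions, assuming a gene counts as detected when its raw count is > 0 for count data and when its CPM is > 1 for CPM-normalized data; the toy matrix and function names are invented for illustration.

```python
def cpm(cell_counts):
    """Counts per million for one cell."""
    lib_size = sum(cell_counts)
    return [1e6 * c / lib_size for c in cell_counts]


def cdr(values, threshold=0.0):
    """Cellular detection rate: fraction of genes whose value exceeds the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


# Toy cells x genes counts (4 genes); the second cell is sequenced far more deeply
cells = [
    [0, 1, 3, 0],
    [1, 500, 2_000_000, 0],
]
for counts in cells:
    # library size, CDR on raw counts (> 0), CDR on CPM (> 1)
    print(sum(counts), cdr(counts), cdr(cpm(counts), threshold=1.0))
# 4 0.5 0.5
# 2000501 0.75 0.5
```

In this toy example the deeply sequenced cell loses its single-count gene once detection requires CPM > 1, which illustrates how a CPM threshold can weaken the link between library size and CDR.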

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 and Supplementary Notes 1–7

Reporting Summary

Supplementary Software

k-nearest-neighbor batch-effect test (kBET) is available as an R package

Supplementary Table 1

Overview of batch-regression and normalization approaches

Supplementary Table 2

Top 20 best-performing batch-correction strategies


Cite this article

Büttner, M., Miao, Z., Wolf, F.A. et al. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods 16, 43–49 (2019).
