Date: 2018-04-02
Abstract
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
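The idea of a mutual nearest neighbor pair can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_nearest` and `find_mnn_pairs`, the use of plain Euclidean distance, and the toy two-gene data are assumptions made purely for illustration. A cell in one batch and a cell in the other form an MNN pair when each lies among the other's k nearest neighbors in the opposite batch.

```python
import math

def k_nearest(query, reference, k):
    """Indices of the k cells in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])

def find_mnn_pairs(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of
    batch_b are each among the other's k nearest neighbors."""
    nn_a = [k_nearest(cell, batch_b, k) for cell in batch_a]  # A -> B
    nn_b = [k_nearest(cell, batch_a, k) for cell in batch_b]  # B -> A
    return sorted((i, j)
                  for i, near in enumerate(nn_a)
                  for j in near
                  if i in nn_b[j])

# Hypothetical toy data: batch A holds two populations, batch B holds only
# one of them, displaced by a batch effect of roughly (1, 1).
batch_a = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
batch_b = [(1.0, 1.0), (1.1, 1.0)]

pairs = find_mnn_pairs(batch_a, batch_b, k=1)
print(pairs)
```

In this toy example the cell at (5.0, 5.0), which belongs to a population absent from batch B, does not take part in any pair. This reflects the requirement stated above that only a subset of the population needs to be shared between batches.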
Acknowledgements
We are grateful to F.K. Hamey, J.P. Munro, J. Griffiths and M. Büttner for helpful discussions. L.H. was supported by Wellcome Trust Grant 108437/Z/15 to J.C.M. A.T.L.L. was supported by core funding from CRUK (award number 17197 to J.C.M.). M.D.M. was supported by Wellcome Trust Grant 105045/Z/14/Z to J.C.M. J.C.M. was supported by core funding from EMBL and from CRUK (award number 17197).
Integrated supplementary information
Supplementary figures
1. MNN corrects nonconstant batch effects.
2. Simulation of batch effect in two batches with identical cell-type composition.
3. Analysis of the hematopoietic data by using all 3,904 highly variable genes.
4. Analysis of the hematopoietic data by using 1,500 genes randomly subsampled from the highly variable gene set.
5. Analysis of the pancreas data by using all 2,507 highly variable genes.
6. Analysis of pancreas data on 1,500 genes randomly subsampled from the highly variable gene set.
7. Locally varying batch correction versus global (i.e., constant vector) batch correction.
Supplementary information
PDF files
1. Supplementary Text and Figures
   Supplementary Figures 1–7, Supplementary Notes 1–5 and Supplementary Table 1
2. Reporting Summary
