  • Protocol

Scaling up reproducible research for single-cell transcriptomics using MetaNeighbor


Single-cell RNA-sequencing data have significantly advanced the characterization of cell-type diversity and composition. However, cell-type definitions vary across datasets and analysis pipelines, raising concerns about cell-type validity and generalizability. With MetaNeighbor, we proposed an efficient and robust quantification of cell-type replicability that preserves dataset independence and is highly scalable compared to dataset integration. In this protocol, we show how MetaNeighbor can be used to characterize cell-type replicability by following a simple three-step procedure: gene filtering, neighbor voting and visualization. We show how these steps can be tailored to quantify cell-type replicability, determine gene sets that contribute to cell-type identity and pretrain a model on a reference taxonomy to rapidly assess newly generated data. The protocol is based on an open-source R package available from Bioconductor and GitHub, requires basic familiarity with RStudio or the R command line and can typically be run in <5 min for millions of cells.
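The first two steps of the procedure, gene filtering and neighbor voting, can be illustrated with a small NumPy sketch. This is not the MetaNeighbor package API: every function name below is hypothetical, the gene filter is a plain variance ranking, and the vote is a simplified similarity-weighted average. It only aims to show how an AUROC can summarize whether cells of a given type in one dataset "vote" for the matching type in another dataset, using simulated data with two cell types and a per-gene batch offset.

```python
import numpy as np

def top_variable_genes(expr, n_top):
    """Indices of the n_top highest-variance genes (expr is cells x genes)."""
    return np.argsort(expr.var(axis=0))[::-1][:n_top]

def spearman_cross(a, b):
    """Spearman correlation between every row of a and every row of b."""
    def unit_ranks(m):
        # Ordinal ranks per cell (ties broken arbitrarily), centered and scaled.
        ranks = m.argsort(axis=1).argsort(axis=1).astype(float)
        ranks -= ranks.mean(axis=1, keepdims=True)
        return ranks / np.linalg.norm(ranks, axis=1, keepdims=True)
    return unit_ranks(a) @ unit_ranks(b).T

def auroc(scores, is_positive):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    pos, neg = scores[is_positive], scores[~is_positive]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def neighbor_voting(train, train_labels, test, test_labels):
    """AUROC for each (reference type, target type) pair across two datasets."""
    # Shift correlations to [0, 1] so that all vote weights are non-negative
    # (a simplification chosen for this sketch).
    sim = (spearman_cross(test, train) + 1.0) / 2.0
    result = {}
    for ref in np.unique(train_labels).tolist():
        # Fraction of each test cell's total similarity that goes to `ref` cells.
        votes = sim[:, train_labels == ref].sum(axis=1) / sim.sum(axis=1)
        for target in np.unique(test_labels).tolist():
            result[(ref, target)] = auroc(votes, test_labels == target)
    return result

def simulate(rng, n_per_type, n_genes=50):
    """Toy dataset: two cell types with 10 marker genes each plus a batch offset."""
    labels = np.array(["alpha"] * n_per_type + ["beta"] * n_per_type)
    expr = rng.normal(size=(2 * n_per_type, n_genes))
    expr[labels == "alpha", :10] += 3.0
    expr[labels == "beta", 10:20] += 3.0
    expr += rng.normal(scale=0.5, size=n_genes)  # per-gene batch effect
    return expr, labels

rng = np.random.default_rng(0)
train, train_labels = simulate(rng, 30)
test, test_labels = simulate(rng, 30)

# Step 1: keep genes that are highly variable in both datasets.
hvg = np.intersect1d(top_variable_genes(train, 20), top_variable_genes(test, 20))

# Step 2: neighbor voting from the training dataset to the test dataset.
aurocs = neighbor_voting(train[:, hvg], train_labels, test[:, hvg], test_labels)
for (ref, target), value in sorted(aurocs.items()):
    print(f"{ref} -> {target}: AUROC = {value:.2f}")
```

In this toy setting, matching cell types should score close to 1 and mismatched types close to 0; the third step of the procedure would display such a matrix as a heatmap. Because each dataset is only ranked internally, the per-gene offset between the two simulated batches does not need to be corrected first.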

Fig. 1: MetaNeighbor quantifies and characterizes cell-type replicability.
Fig. 2: Cell types from four pancreas datasets cluster according to their biological similarity.
Fig. 3: Restricting the four pancreas datasets to endocrine subtypes allows for a more stringent replicability assessment.
Fig. 4: 1-vs-best AUROCs automatically identify each cell type’s closest outgroup.
Fig. 5: Replicating cell types can be extracted as meta-clusters.
Fig. 6: Assessment of cell-type annotations from the mouse primary visual cortex against reference neuron taxonomy from the primary motor cortex (medium resolution).
Fig. 7: Assessment of inhibitory cell types from the mouse primary visual cortex against reference inhibitory cell types (high resolution).
Fig. 8: 1-vs-best AUROCs enable rapid identification of 1:1 hits and 1:n hits.
Fig. 9: A small fraction of functional gene sets contributes highly to cell-type replicability.
Fig. 10: Top-scoring gene sets can be broken down into characteristic genes for each cell type.
Fig. 11: Selection of a bad highly variable gene set leads to suboptimal performance and obscures biological signal.
Fig. 12: Absence of biological overlap between datasets leads to almost random performance and lack of hierarchical cell-type structure.
Fig. 13: Disrupting formatting of cell type names in pre-trained models leads to random performance.
Fig. 14: MetaNeighbor results are robust to batch effects.
Fig. 15: MetaNeighbor finds replicable cell types in a multimodal dataset of the mouse primary motor cortex.
Fig. 16: MetaNeighbor AUROCs offer a generalizable and batch-effect-free quantification of cell-type similarity.
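
Several of the figure titles above refer to meta-clusters of replicating cell types. As a rough illustration only, the sketch below assumes that a cross-dataset AUROC matrix is already available and that replicating pairs are identified as reciprocal best hits above a threshold. The function name, the `dataset|cell_type` naming scheme, the threshold of 0.7 and all AUROC values are assumptions made for this example, not values taken from the protocol.

```python
import numpy as np

def reciprocal_best_hits(auroc, names, threshold=0.7):
    """Pairs of cell types that pick each other as best cross-dataset match.

    auroc[i, j] is the AUROC obtained with cell type i as reference and
    cell type j as target; names follow a 'dataset|cell_type' pattern
    (an illustrative convention for this sketch).
    """
    scores = np.array(auroc, dtype=float)
    datasets = np.array([name.split("|")[0] for name in names])
    # Ignore comparisons within the same dataset (including self-comparisons).
    scores[datasets[:, None] == datasets[None, :]] = -np.inf
    best = scores.argmax(axis=1)
    hits = []
    for i, j in enumerate(best):
        reciprocal = best[j] == i
        strong = min(scores[i, j], scores[j, i]) >= threshold
        if i < j and reciprocal and strong:
            hits.append((names[i], names[j]))
    return hits

names = ["d1|alpha", "d1|beta", "d2|alpha", "d2|beta", "d2|gamma"]
toy_auroc = [
    [1.00, 0.10, 0.95, 0.20, 0.55],
    [0.10, 1.00, 0.15, 0.90, 0.60],
    [0.97, 0.20, 1.00, 0.10, 0.30],
    [0.20, 0.92, 0.10, 1.00, 0.40],
    [0.60, 0.65, 0.30, 0.40, 1.00],
]
hits = reciprocal_best_hits(toy_auroc, names)
print(hits)
# [('d1|alpha', 'd2|alpha'), ('d1|beta', 'd2|beta')]
```

Here `d2|gamma` has no reciprocal partner, so it is left out; in a real analysis such a cell type would be a candidate dataset-specific population or a finer subdivision of another type.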

Data availability

The datasets analyzed in the protocol are all previously published and publicly available. Human pancreas datasets were from Baron et al. (ref. 33; Gene Expression Omnibus (GEO) accession code GSE84133), Lawlor et al. (ref. 34; GEO accession code GSE86473), Muraro et al. (ref. 35; GEO accession code GSE85241) and Segerstolpe et al. (ref. 36; ArrayExpress accession code E-MTAB-5061). These datasets are accessed through the Bioconductor scRNAseq library in the protocol. The mouse primary visual cortex dataset was from Tasic et al. (ref. 32; GEO accession code GSE71585), accessed through the Bioconductor scRNAseq library. The BICCN dataset for the mouse primary motor cortex from Yao et al. (ref. 4) is available on the Neuroscience Multi-Omic archive. The subset of the BICCN data necessary to run the protocol is also available on FigShare (R and Python versions).

Code availability

The code for the procedures (including all figures) is freely available on GitHub (ref. 30) in multiple formats (Rmd, PDF and Jupyter notebook for R and Python). The scripts used to generate the protocol data are available in the same repository. The stable R version of MetaNeighbor is available through Bioconductor (the protocol was generated by using version 3.12), and the development versions are available on GitHub (R and Python versions).


References

  1. Hay, S. B., Ferchen, K., Chetal, K., Grimes, H. L. & Salomonis, N. The Human Cell Atlas bone marrow single-cell interactive web portal. Exp. Hematol. 68, 51–61 (2018).

  2. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  3. Almanzar, N. et al. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).

  4. Yao, Z. et al. An integrated transcriptomic and epigenomic atlas of mouse primary motor cortex cell types. Nature (in the press).

  5. Yao, Z. et al. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell (in the press).

  6. Bakken, T. E. et al. Evolution of cellular diversity in primary motor cortex of human, marmoset monkey, and mouse. Nature (in the press).

  7. Duò, A., Robinson, M. D. & Soneson, C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res. 7, 1141 (2018).

  8. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  9. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  10. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).

  11. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  12. Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).

  13. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).

  14. Luo, C. et al. Single nucleus multi-omics links human cortical cell regulatory genome diversity to disease risk variants. Preprint at bioRxiv (2019).

  15. Crow, M., Paul, A., Ballouz, S., Huang, Z. J. & Gillis, J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat. Commun. 9, 884 (2018).

  16. Paul, A. et al. Transcriptional architecture of synaptic communication delineates GABAergic neuron identity. Cell 171, 522–539.e20 (2017).

  17. Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).

  18. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  19. Forcato, M., Romano, O. & Bicciato, S. Computational methods for the integrative analysis of single-cell data. Brief. Bioinform. 22, 20–29 (2020).

  20. Hie, B. et al. Computational methods for single-cell RNA sequencing. Annu. Rev. Biomed. Data Sci. 3, 339–364 (2020).

  21. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

  22. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Preprint at bioRxiv (2020).

  23. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  24. Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).

  25. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

  26. Kapp, A. V. & Tibshirani, R. Are clusters found in one dataset present in another dataset? Biostatistics 8, 9–31 (2007).

  27. Dudoit, S., Fridlyand, J. & Speed, T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc. 97, 77–87 (2002).

  28. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).

  29. Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).

  30. gillislab/MetaNeighbor-Protocol. (2020).

  31. Protocol data (R version). (2020).

  32. Tasic, B. et al. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. 19, 335–346 (2016).

  33. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360.e4 (2016).

  34. Lawlor, N. et al. Single-cell transcriptomes identify human islet cell signatures and reveal cell-type-specific expression changes in type 2 diabetes. Genome Res. 27, 208–222 (2017).

  35. Muraro, M. J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 3, 385–394.e3 (2016).

  36. Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).


Acknowledgements

J.G. was supported by NIH grants R01MH113005 and R01LM012736. S.F. was supported by NIH grant U19MH114821. B.D.H. was supported by the CSHL Crick-Clay Fellowship. M.C. was supported by NIH grant K99MH120050.

Author information

Authors and Affiliations



Contributions

S.F., M.C., B.D.H. and J.G. designed the experiments, performed the data analysis and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jesse Gillis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Protocols thanks Praneet Chaturvedi, Guoji Guo, Ahmed Mahfouz, Nathan Salomonis and Daniel Schnell for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Crow, M. et al. Nat. Commun. 9, 884 (2018)

Paul, A. et al. Cell 171, 522–539.e20 (2017)

Yao, Z. et al. Preprint at bioRxiv (2020)

Bakken, T. E. et al. Preprint at bioRxiv (2020)

Key data used in this protocol

Yao, Z. et al. Preprint at bioRxiv (2020)

Baron, M. et al. Cell Syst. 3, 346–360.e4 (2016)

Lawlor, N. et al. Genome Res. 27, 208–222 (2017)

Muraro, M. J. et al. Cell Syst. 3, 385–394.e3 (2016)

Segerstolpe, Å. et al. Cell Metab. 24, 593–607 (2016)

Tasic, B. et al. Nat. Neurosci. 19, 335–346 (2016)

About this article

Cite this article

Fischer, S., Crow, M., Harris, B.D. et al. Scaling up reproducible research for single-cell transcriptomics using MetaNeighbor. Nat Protoc 16, 4031–4067 (2021).
