  • Brief Communication

Uncovering cell identity through differential stability with Cepo

A preprint version of the article is available at bioRxiv.

Abstract

Single-cell RNA-sequencing (scRNA-seq) allows different cells in the same microenvironment to be observed at multi-tiered complexity. To gain insight into cell identity from scRNA-seq data, we present Cepo, which generates cell-type-specific statistics of differentially stable genes to define cell identity. When applied to multiple datasets, Cepo outperforms current methods in assigning cell identity and enhances several cell identification applications, such as cell-type characterisation, spatial mapping of single cells and lineage inference of single cells.
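The abstract does not spell out how a differential stability statistic is computed. Purely as an illustration, the sketch below assumes that a gene is "stable" within a cell type when it has few zero counts and a low coefficient of variation, and that its differential stability for a cell type is its stability there minus its mean stability in the other cell types. The function names, the rank-based scoring and the toy data are all hypothetical and are not taken from the Cepo package.

```python
from statistics import mean, pstdev


def rank_scores(values):
    """Scale ranks to [0, 1]; the smallest value scores 1, ties share a rank."""
    genes = list(values)
    n = len(genes)
    scores = {}
    for g in genes:
        larger = sum(values[h] > values[g] for h in genes)
        tied = sum(values[h] == values[g] for h in genes) - 1
        scores[g] = (larger + tied / 2) / (n - 1) if n > 1 else 1.0
    return scores


def stability(expr, cells):
    """Assumed stability score: low zero proportion and low CV both rank high."""
    zero_prop, cv = {}, {}
    for gene, row in expr.items():
        vals = [row[i] for i in cells]
        m = mean(vals)
        zero_prop[gene] = sum(v == 0 for v in vals) / len(vals)
        cv[gene] = pstdev(vals) / m if m > 0 else float("inf")
    rz, rc = rank_scores(zero_prop), rank_scores(cv)
    return {g: (rz[g] + rc[g]) / 2 for g in expr}


def differential_stability(expr, labels):
    """Stability in one cell type minus mean stability in all other types."""
    types = sorted(set(labels))
    stab = {
        t: stability(expr, [i for i, lab in enumerate(labels) if lab == t])
        for t in types
    }
    result = {}
    for t in types:
        others = [u for u in types if u != t]
        result[t] = {
            g: stab[t][g] - mean([stab[u][g] for u in others]) for g in expr
        }
    return result


# Toy data (hypothetical): six cells, two cell types, three genes.
labels = ["X", "X", "X", "Y", "Y", "Y"]
expr = {
    "geneA": [5, 5, 5, 0, 9, 0],  # stable in X, sparse and noisy in Y
    "geneB": [0, 8, 0, 4, 4, 4],  # stable in Y, sparse and noisy in X
    "geneC": [3, 4, 5, 3, 4, 5],  # behaves the same in both types
}
ds = differential_stability(expr, labels)
top_x = max(ds["X"], key=ds["X"].get)
top_y = max(ds["Y"], key=ds["Y"].get)
```

In this toy example the gene that is uniformly expressed in one cell type and erratic in the other ranks highest for that cell type, while a gene that behaves identically in both scores zero. A real analysis would need normalised expression values and many more cells per type.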

Fig. 1: Uncovering differentially stable genes in synthetic and experimental single-cell RNA-sequencing datasets.
Fig. 2: Retrieving CIGs to enhance interpretation of diverse single-cell applications associated with cell identity.

Data availability

All the datasets used in this study are publicly available. The Molecular Signatures Database gene sets were downloaded from http://www.gsea-msigdb.org/gsea/msigdb/. The Tabula Muris data collection was downloaded from https://tabula-muris.ds.czbiohub.org/. The CellBench data collection was downloaded from https://github.com/LuyiTian/sc_mixology/. The Embryogenesis atlas data, which profile 48 h of mouse embryonic development, were downloaded from https://github.com/MarioniLab/EmbryoTimecourse2018. The parsed Gastrulation data, sequenced using scNMT-seq, were downloaded from the link provided in https://github.com/rargelaguet/scnmt_gastrulation. The processed Gastrulation data were downloaded from http://www.human-gastrula.net. The hematopoietic stem cell differentiation data were downloaded from https://cytotrace.stanford.edu/. The Fetal tissue atlas data were downloaded from NCBI Gene Expression Omnibus under accession number GSE156793. The spatial embryo data were downloaded from NCBI Gene Expression Omnibus under accession number GSE120963.

Code availability

The Cepo R package, the source code to generate the figures, and a detailed vignette are available from https://github.com/PYangLab/Cepo (ref. 50). The vignette covers various applications, including the use of Cepo together with scRNA-seq data normalisation, batch correction and integration pipelines.

References

  1. Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 34, 1145–1160 (2016).

  2. Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. eLife https://doi.org/10.7554/eLife.43803 (2019).

  3. Morris, S. A. The evolving concept of cell identity in the single cell era. Development 146, dev169748 (2019).

  4. Wang, T., Li, B., Nelson, C. E. & Nabavi, S. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinformatics 20, 40 (2019).

  5. Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).

  6. Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).

  7. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).

  8. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).

  9. Korthauer, K. D. et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol. 17, 222 (2016).

  10. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).

  11. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl. Acids Res. 43, e47 (2015).

  12. Tian, L. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).

  13. Segal, E., Friedman, N., Koller, D. & Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090–1098 (2004).

  14. Cao, J. et al. A human cell atlas of fetal gene expression. Science 370, aba7721 (2020).

  15. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).

  16. Argelaguet, R. et al. Multi-omics profiling of mouse gastrulation at single-cell resolution. Nature 576, 487–491 (2019).

  17. Tyser, R. C. V. et al. Single-cell transcriptomic characterization of a gastrulating human embryo. Nature https://doi.org/10.1038/s41586-021-04158-y (2021).

  18. Peng, G. et al. Molecular architecture of lineage allocation and tissue organization in early mouse embryo. Nature 572, 528–532 (2019).

  19. Akashi, K., Traver, D., Miyamoto, T. & Weissman, I. L. A clonogenic common myeloid progenitor that gives rise to all myeloid lineages. Nature 404, 193–197 (2000).

  20. Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, aaw3381 (2020).

  21. Olsson, A. et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature 537, 698–702 (2016).

  22. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  23. Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).

  24. Clark, S. J. et al. ScNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9, 781 (2018).

  25. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1100 (2013).

  26. Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, aam8940 (2017).

  27. Peng, G. et al. Spatial transcriptome for the molecular annotation of lineage fates and cell identity in mid-gastrula mouse embryo. Dev. Cell 36, 681–697 (2016).

  28. McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).

  29. Lin, Y. et al. Evaluating stably expressed genes in single cells. GigaScience 8, giz106 (2019).

  30. Massey, F. J. The Kolmogorov–Smirnov test for goodness of fit. J. Am. Stat. Assoc. 46, 68–78 (1951).

  31. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  32. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  33. Kuhn, M. & Vaughan, D. Yardstick: Tidy Characterizations of Model Performance (Yardstick, 2020).

  34. Pagès, H. HDF5Array: HDF5 Backend for DelayedArray Objects. R package version 1.22.1, https://bioconductor.org/packages/HDF5Array (2020).

  35. Su, S. et al. CellBench: R/Bioconductor software for comparing single-cell RNA-seq analysis methods. Bioinformatics 36, 2288–2290 (2020).

  36. Van der Laan, M. J. & Pollard, K. S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. J. Stat. Plann. Inference 117, 275–303 (2003).

  37. Kim, T. et al. Impact of similarity metrics on single-cell RNA-seq data clustering. Brief. Bioinform. 20, 2316–2326 (2019).

  38. Lun, A. T. L., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research https://doi.org/10.12688/f1000research.9501.2 (2016).

  39. Kolde, R. pheatmap: Pretty Heatmaps. R Package Version 1.0.12 R Package Version 1.0.8 (2015).

  40. Gómez-Rubio, V. ggplot2—elegant graphics for data analysis (2nd edition). J. Stat. Softw. https://doi.org/10.18637/jss.v077.b02 (2017).

  41. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

  42. Street, K. et al. Slingshot: Cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).

  43. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).

  44. duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H. & Tsuda, K. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 17, 363 (2016).

  45. Taddy, M. A. On estimation and selection for topic models. In Proc. 15th International Conference on Artificial Intelligence and Statistics (AISTATS) (AISTATS, 2012).

  46. Sergushichev, A. A. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. Preprint at https://www.biorxiv.org/content/10.1101/060012v1 (2016).

  47. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).

  48. Yu, G., Wang, L., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS J. Integr. Biol. 16, 284–287 (2012).

  49. Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & de Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, 5650 (2020).

  50. Kim, H., Yang, P. & Wang, K. PYangLab/Cepo: Release of Cepo (Zenodo, 2021); https://doi.org/10.5281/ZENODO.5652243

Acknowledgements

We thank all of our colleagues—particularly at the School of Mathematics and Statistics, The University of Sydney and Sydney Precision Bioinformatics Alliance—for their support and intellectual engagement. This work is supported by an Australian Research Council (ARC)/Discovery Early Career Researcher Award (DE170100759) and a National Health and Medical Research Council (NHMRC) Investigator Grant (1173469) to P.Y., an Australian Research Council Discovery Project grant (DP170100654) to P.Y. and J.Y.H.Y., and an Australian Research Council (ARC) Postgraduate Research Scholarship and Children’s Medical Research Institute Postgraduate Scholarship to H.J.K.

Author information

Contributions

P.Y. and H.J.K. conceived the study with input from J.Y.H.Y. and D.M.L. H.J.K. and K.W. developed the method and software with input from P.Y. H.J.K., P.Y. and K.W. performed data analyses with input from C.C. and Y.L. H.J.K., P.Y., K.W. and J.Y.H.Y. interpreted the results with input from P.P.L.T. H.J.K., P.Y. and K.W. wrote the manuscript with input from J.Y.H.Y. All of the authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Pengyi Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Handling editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Source data

Source Data Fig. 1

Statistical Source Data for Fig. 1

Source Data Fig. 2

Statistical Source Data for Fig. 2

About this article

Cite this article

Kim, H.J., Wang, K., Chen, C. et al. Uncovering cell identity through differential stability with Cepo. Nat Comput Sci 1, 784–790 (2021). https://doi.org/10.1038/s43588-021-00172-2

  • DOI: https://doi.org/10.1038/s43588-021-00172-2
