Abstract
Identifying and visualizing transcriptionally similar cells is instrumental for accurate exploration of the cellular diversity revealed by single-cell transcriptomics. However, widely used clustering and visualization algorithms produce a fixed number of cell clusters. A fixed clustering ‘resolution’ hampers our ability to identify and visualize echelons of cell states. We developed TooManyCells, a suite of graph-based algorithms for efficient and unbiased identification and visualization of cell clades. TooManyCells introduces a visualization model built on a concept intentionally orthogonal to dimensionality-reduction methods. TooManyCells also includes an efficient, matrix-free, divisive hierarchical spectral clustering algorithm that differs from prevalent single-resolution clustering methods. TooManyCells enables multiresolution and multifaceted exploration of single-cell clades. An advantage of this paradigm is the immediate detection of both rare and common populations; in this respect, TooManyCells outperforms popular clustering and visualization algorithms, as demonstrated using existing single-cell transcriptomic data sets and new data modeling drug-resistance acquisition in leukemic T cells.
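To make the idea of divisive hierarchical spectral clustering concrete, the following is a minimal sketch under stated assumptions; it is not the TooManyCells implementation. It assumes cosine similarity between cells as edge weights, a Shi–Malik normalized-cut bipartition taken from the sign of the relaxed eigenvector, and a Newman–Girvan modularity rule (keep a split only if Q > 0) as the stopping criterion. Unlike the matrix-free method described in the abstract, it builds a dense n × n similarity matrix, so it only suits small inputs. All function names (`cosine_graph`, `modularity`, `normalized_cut_bipartition`, `divisive_spectral`, `leaves`) are hypothetical.

```python
import numpy as np


def cosine_graph(x: np.ndarray) -> np.ndarray:
    """Dense cell-by-cell cosine-similarity matrix (NOT matrix-free)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero cells as zero vectors
    unit = x / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # no self-loops
    return sim


def modularity(sim: np.ndarray, labels: np.ndarray) -> float:
    """Newman-Girvan modularity Q of a labeling of a weighted graph."""
    two_m = sim.sum()
    if two_m <= 0:
        return 0.0
    k = sim.sum(axis=1)  # weighted degrees
    same = labels[:, None] == labels[None, :]
    return float(((sim - np.outer(k, k) / two_m) * same).sum() / two_m)


def normalized_cut_bipartition(sim: np.ndarray) -> np.ndarray:
    """Split nodes by the sign of the Shi-Malik relaxed normalized-cut vector."""
    d = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d > 0, d, 1.0))
    norm_sim = d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_sim)  # eigenvalues in ascending order
    # The second-largest eigenvector of D^-1/2 W D^-1/2 is the second-smallest
    # eigenvector of the normalized Laplacian; map it back with D^-1/2.
    y = d_inv_sqrt * vecs[:, -2]
    return y >= 0


def divisive_spectral(x: np.ndarray, min_size: int = 2):
    """Recursively bipartition cells, keeping a split only if its Q > 0.

    Returns a nested tuple tree whose leaves are lists of cell indices.
    """
    sim_full = cosine_graph(x)

    def recurse(indices: np.ndarray):
        if len(indices) < 2 * min_size:
            return indices.tolist()
        sim = sim_full[np.ix_(indices, indices)]
        side = normalized_cut_bipartition(sim)
        if side.all() or not side.any():  # degenerate split
            return indices.tolist()
        if modularity(sim, side.astype(int)) <= 0:  # split not supported
            return indices.tolist()
        return (recurse(indices[side]), recurse(indices[~side]))

    return recurse(np.arange(x.shape[0]))


def leaves(tree):
    """Flatten the tree into a list of leaf clusters."""
    if isinstance(tree, list):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


# Toy usage: four cells with one expression profile, four with another.
x = np.array([[10, 10, 10, 1, 1, 1]] * 4 + [[1, 1, 1, 10, 10, 10]] * 4)
print(sorted(leaves(divisive_spectral(x))))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

On this toy input the root split separates the two profiles, and neither child is split further because every bipartition of a uniformly weighted clique has negative modularity, which is what lets the tree stop at a data-driven depth rather than a preset number of clusters.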
Data availability
The accession number for the new data sets reported in this paper is Gene Expression Omnibus: GSE138892. Microfluidic single-cell RNA-seq count data from 11 organs of three female and four male 3-month-old C57BL/6 NIA mice were obtained from https://figshare.com/articles/_/5715025, removing P8 libraries due to outlier cell counts (ref. 22). FACS-purified CD14+ monocytes, CD19+ B cells and CD4+ T cells were obtained from https://support.10xgenomics.com/single-cell-gene-expression/datasets (ref. 23). Data for seven cancer cell lines were obtained from GSE81861 (ref. 17). FACS-purified B lymphocyte/natural killer, megakaryocyte-erythroid and granulocyte-monocyte progenitors were obtained from GSE117498 (ref. 25).
Code availability
TooManyCells is available at https://github.com/faryabib/too-many-cells or as a Docker image https://cloud.docker.com/repository/docker/gregoryschwartz/too-many-cells/. An R wrapper for TooManyCells is available at https://cran.r-project.org/web/packages/TooManyCellsR. BirchBeer is available at https://github.com/faryabib/birch-beer or as a Docker image https://cloud.docker.com/repository/docker/gregoryschwartz/birch-beer. Code necessary to reproduce the presented analyses is available at https://github.com/faryabib/NatMethods_TooManyCells_analysis.
References
Lafzi, A., Moutinho, C. & Picelli, S. Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies. Nat. Protoc. 13, 2742 (2018).
Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).
Packer, J. & Trapnell, C. Single-cell multi-omics: an engine for new quantitative models of gene regulation. Trends Genet. 34, 653–665 (2018).
Liu, S. & Trapnell, C. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Res 5, F1000 (2016).
Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
Azizi, E., Prabhakaran, S., Carr, A. & Pe’er, D. Bayesian inference for single-cell clustering and imputing. Genomics Comput. Biol. 3, 46 (2017).
Ho, Y.-J. et al. Single-cell RNA-seq analysis identifies markers of resistance to targeted BRAF inhibitors in melanoma cell populations. Genome Res. 28, 1353–1363 (2018).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).
Nutt, S. L., Hodgkin, P. D., Tarlinton, D. M. & Corcoran, L. M. The generation of antibody-secreting plasma cells. Nat. Rev. Immunol. 15, 160–171 (2015).
Zeisel, A. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
Lin, P., Troup, M. & Ho, J. W. K. CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol. 18, 59 (2017).
Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).
Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. Gigascience 7, 7–9 (2018).
Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004).
Lancichinetti, A. & Fortunato, S. Limits of modularity maximization in community detection. Phys. Rev. E 84, 066122 (2011).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
The Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Herman, J. S., Sagar & Grün, D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat. Methods 15, 379–386 (2018).
Pellin, D. et al. Comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat. Commun. 10, 1–15 (2019).
Dahlin, J. S. et al. A single-cell hematopoietic landscape resolves 8 lineage trajectories and defects in Kit mutant mice. Blood 131, e1–e11 (2018).
Borges da Silva, H. et al. Splenic macrophage subsets and their function during blood-borne infections. Front. Immunol. 6, 480 (2015).
Den Haan, J. M. M. & Kraal, G. Innate immune functions of macrophage subpopulations in the spleen. J. Innate Immun. 4, 437–445 (2012).
Hey, Y. Y. & O’Neill, H. C. Murine spleen contains a diversity of myeloid and dendritic cells distinct in antigen presenting function. J. Cell. Mol. Med. 16, 2611–2619 (2012).
Jojic, V. et al. Identification of transcriptional regulators in the mouse immune system. Nat. Immunol. 14, 633–643 (2013).
Winter, S. S. et al. Improved survival for children and young adults with T-lineage acute lymphoblastic leukemia: results from the Children’s Oncology Group AALL0434 methotrexate randomization. J. Clin. Oncol. 36, 2926–2934 (2018).
Marks, D. I. et al. T-cell acute lymphoblastic leukemia in adults: clinical features, immunophenotype, cytogenetics, and outcome from the large randomized prospective trial (UKALL XII/ECOG 2993). Blood 114, 5136–5145 (2009).
Aster, J. C., Pear, W. S. & Blacklow, S. C. The varied roles of Notch in cancer. Annu. Rev. Pathol. Mech. Dis. 12, 245–275 (2017).
Knoechel, B. et al. An epigenetic mechanism of resistance to targeted therapy in T cell acute lymphoblastic leukemia. Nat. Genet. 46, 364–370 (2014).
Dluzen, D., Li, G., Tacelosky, D., Moreau, M. & Liu, D. X. BCL-2 is a downstream target of ATF5 that mediates the prosurvival function of ATF5 in a cell type-dependent manner. J. Biol. Chem. 286, 7705–7713 (2011).
Yamazaki, T. et al. Regulation of the human CHOP gene promoter by the stress response transcription factor ATF5 via the AARE1 site in human hepatoma HepG2 cells. Life Sci. 87, 294–301 (2010).
Liu, D. X., Qian, D., Wang, B., Yang, J.-M. & Lu, Z. p300-dependent ATF5 acetylation is essential for Egr-1 gene activation and cell proliferation and survival. Mol. Cell. Biol. 31, 3906–3916 (2011).
Angelastro, J. M. Targeting ATF5 in cancer. Trends Cancer 3, 471–474 (2017).
Karpel-Massler, G. et al. A synthetic cell-penetrating dominant-negative ATF5 peptide exerts anticancer activity against a broad spectrum of treatment-resistant cancers. Clin. Cancer Res. 22, 4698–4711 (2016).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. The Molecular Signatures Database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Shu, L., Chen, A., Xiong, M. & Meng, W. Efficient spectral neighborhood blocking for entity resolution. In 2011 IEEE 27th International Conference on Data Engineering 1067–1078 (IEEE, 2011).
Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 18 (2000).
Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 11–21 (1972).
Manning, C. D., Raghavan, P. & Schütze, H. Introduction to Information Retrieval (Cambridge University Press, 2008).
Salton, G., Wong, A. & Yang, C. S. A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Hill, M. O. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427 (1973).
Schwartz, G. W. & Hershberg, U. Conserved variation: identifying patterns of stability and variability in BCR and TCR V genes with different diversity and richness metrics. Phys. Biol. 10, 035005 (2013).
Schwartz, G. W. & Hershberg, U. Germline amino acid diversity in B cell receptors is a good predictor of somatic selection pressures. Front. Immunol. 4, 357 (2013).
Meng, W. et al. An atlas of B-cell clonal distribution in the human body. Nat. Biotechnol. 35, 879–884 (2017).
Heck, K. L., van Belle, G. & Simberloff, D. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology 56, 1459 (1975).
Tian, L. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).
Ronen, J. & Akalin, A. netSmooth: network-smoothing based imputation for single cell RNA-seq. F1000Res 7, 8 (2018).
Dai, H., Li, L., Zeng, T. & Chen, L. Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res. 47, e62 (2019).
Tan, P.-N., Steinbach, M., Karpatne, A. & Kumar, V. Introduction to Data Mining 2nd edn (Pearson, 2019).
Kvålseth, T. O. On normalized mutual information: measure derivations and properties. Entropy 19, 631 (2017).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
Schwartz, G. W., Shokoufandeh, A., Ontañón, S. & Hershberg, U. Using a novel clumpiness measure to unite data with metadata: finding common sequence patterns in immune receptor germline V genes. Pattern Recognit. Lett. 74, 24–29 (2016).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).
Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).
Acknowledgements
This work was supported by T32-CA-009140 (to G.W.S.), LLS-5456-17 (to J.P.), R01-CA-215518 (to W.S.P.), R01-HL-145754, the Penn Epigenetics pilot award and the Sloan Foundation Grant (to G.V.), Therapeutics Translational Medicine and Therapeutics program for Transdisciplinary Awards Program in Translational Medicine and Therapeutics, Concern Foundation’s The Conquer Cancer Now Award, Susan G. Komen CCR185472448 and R01-CA-230800 (to R.B.F.).
Author information
Contributions
Conceptualization: R.B.F., G.W.S.; Methodology: G.W.S., R.B.F.; Software: G.W.S.; Investigation: G.W.S., R.B.F., J.P., Y.Z.; Formal Analysis: G.W.S., R.B.F., J.P., M.F., S.M.S., L.X., Y.Z.; Resources and Reagents: R.B.F., G.V.; Writing, Review and Editing: G.W.S., R.B.F., W.S.P., J.P., Y.Z.; Writing, Original Draft: G.W.S., R.B.F.; Supervision: R.B.F.; Funding Acquisition: R.B.F.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Peer review information Nicole Rusk and Lin Tang were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figures 1–33 and Notes 1–6
Supplementary Table 1
Differential expression analysis between the ascending (n = 1,780 cells) and sustained (n = 2,299 cells) subtrees in Supplementary Fig. 33e. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.
Supplementary Table 2
Differential expression analysis between untreated (n = 2,338 cells) and short-term (n = 2,616 cells) populations in Supplementary Fig. 33i. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.
Supplementary Table 3
Differential expression analysis between parental (n = 4,954 cells) and sustained (n = 2,417 cells) populations in Supplementary Fig. 33j. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.
Supplementary Table 4
Differential expression analysis between other parental (n = 4,926 cells) and resistant-like (n = 28 cells) populations in Supplementary Fig. 33k. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.
Supplementary Table 5
Differential expression analysis between sustained (n = 2,417 cells) and resistant-like (n = 28 cells) populations in Supplementary Fig. 33l. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.
Supplementary Table 6
Sequences of MYC and GAPDH RNA FISH probes.
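The supplementary differential expression tables report multiple-hypothesis correction with the Benjamini–Hochberg method. As a minimal illustration of that adjustment (a generic sketch, not the edgeR-based code used for these analyses), the step-up procedure sorts the n p-values, scales the i-th smallest by n/i, enforces monotonicity with a cumulative minimum taken from the largest p-value downwards, and caps the result at 1. The function name `bh_adjust` is hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)  # indices of p-values in ascending order
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Cumulative minimum from the largest p-value down keeps the
    # adjusted values monotone in the original p-values.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)  # restore input order, cap at 1
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```

Here the raw p-value 0.03 would scale to 0.06, but the cumulative minimum lowers it to 0.0533 so that it does not exceed the adjusted value of the larger raw p-value 0.04.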
About this article
Cite this article
Schwartz, G.W., Zhou, Y., Petrovic, J. et al. TooManyCells identifies and visualizes relationships of single-cell clades. Nat Methods 17, 405–413 (2020). https://doi.org/10.1038/s41592-020-0748-5
DOI: https://doi.org/10.1038/s41592-020-0748-5