Advances in single-cell genomics now enable large-scale comparisons of cell states across two or more experimental conditions. Numerous statistical tools are available to identify individual genes, proteins or chromatin regions that differ between conditions, but many experiments require inferences at the level of cell types, as opposed to individual analytes. We developed Augur to prioritize the cell types within a complex tissue that are most responsive to an experimental perturbation. In this protocol, we outline the application of Augur to single-cell RNA-seq data, proceeding from a genes-by-cells count matrix to a list of cell types ranked on the basis of their separability following a perturbation. We provide detailed instructions to enable investigators with limited experience in computational biology to perform cell-type prioritization within their own datasets and visualize the results. Moreover, we demonstrate the application of Augur in several more specialized workflows, including the use of RNA velocity for acute perturbations, experimental designs with multiple conditions, differential prioritization between two comparisons, and single-cell transcriptome imaging data. For a dataset containing on the order of 20,000 genes and 20 cell types, this protocol typically takes 1–4 h to complete.
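The idea of ranking cell types by how separable their cells are between conditions can be illustrated with a small standard-library sketch. This is not Augur's implementation: the nearest-centroid scorer, the three-fold split, the simulated data and all function names below are assumptions chosen only to keep the example self-contained. Note also that the toy data are stored as cells-by-genes lists, not as a genes-by-cells count matrix.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Probability that a perturbed cell outscores a control cell (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Per-gene mean expression across a list of cells."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validated_auc(cells, labels, folds=3, seed=0):
    """Mean AUC over stratified folds, using a nearest-centroid stand-in classifier."""
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for i, y in enumerate(labels):
        by_label[y].append(i)
    for group in by_label.values():
        rng.shuffle(group)
    fold_aucs = []
    for f in range(folds):
        test = by_label[0][f::folds] + by_label[1][f::folds]
        held_out = set(test)
        train = [i for i in range(len(cells)) if i not in held_out]
        c0 = centroid([cells[i] for i in train if labels[i] == 0])
        c1 = centroid([cells[i] for i in train if labels[i] == 1])
        # Higher score = closer to the perturbed centroid than to the control centroid.
        scores = [sq_dist(cells[i], c0) - sq_dist(cells[i], c1) for i in test]
        fold_aucs.append(auc(scores, [labels[i] for i in test]))
    return mean(fold_aucs)


def simulate(shift, n_per_condition=30, n_genes=5, seed=0):
    """Hypothetical cell type: perturbed cells (label 1) are shifted by `shift` in every gene."""
    rng = random.Random(seed)
    cells, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_condition):
            cells.append([rng.gauss(shift * label, 1.0) for _ in range(n_genes)])
            labels.append(label)
    return cells, labels


toy_tissue = {
    "responsive": simulate(shift=3.0, seed=1),
    "unresponsive": simulate(shift=0.0, seed=2),
}
aucs = {name: cross_validated_auc(c, y) for name, (c, y) in toy_tissue.items()}
ranking = sorted(aucs, key=aucs.get, reverse=True)  # most separable cell type first
print(ranking, aucs)
```

In this toy setting the strongly shifted cell type should receive an AUC close to 1, while the unperturbed cell type should stay near 0.5, so the ranking places the responsive type first.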
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Tabula Muris Consortium. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Han, X. et al. Mapping the mouse cell atlas by Microwell-Seq. Cell 173, 1307 (2018).
Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).
Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, (2018).
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
Vieira Braga, F. A. et al. A cellular census of human lungs identifies novel cell states in health and in asthma. Nat. Med. 25, 1153–1163 (2019).
Mathys, H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).
Grubman, A. et al. A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation. Nat. Neurosci. 22, 2087–2097 (2019).
Smillie, C. S. et al. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell 178, 714–730.e22 (2019).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Wagner, D. E. et al. Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987 (2018).
Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).
Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database https://doi.org/10.1093/database/baaa073 (2020).
Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).
Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11, 6077 (2020).
Zimmerman, K. D., Espeland, M. A. & Langefeld, C. D. A practical solution to pseudoreplication bias in single-cell studies. Nat. Commun. 12, 738 (2021).
Rossi, M. A. et al. Obesity remodels activity and transcriptional state of a lateral hypothalamic brake on feeding. Science 364, 1271–1274 (2019).
Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat. Neurosci. 21, 120–129 (2018).
Hashikawa, Y. et al. Transcriptional and spatial resolution of cell types in the mammalian habenula. Neuron 106, 743–758.e5 (2020).
Sathyamurthy, A. et al. Massively parallel single nucleus transcriptional profiling defines spinal cord neurons and their activity during behavior. Cell Rep 22, 2216–2225 (2018).
Hrvatin, S. et al. Neurons that regulate mouse torpor. Nature 583, 115–121 (2020).
Schirmer, L. et al. Neuronal vulnerability and multilineage diversity in multiple sclerosis. Nature 573, 75–82 (2019).
Avey, D. et al. Single-cell RNA-seq uncovers a robust transcriptional response to morphine by glia. Cell Rep 24, 3619–3629.e4 (2018).
Kotliarov, Y. et al. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. Nat. Med. 26, 618–629 (2020).
Skinnider, M. A. et al. Cell type prioritization in single-cell data. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0605-1 (2020).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Wagner, F. B. et al. Targeted neurotechnology restores walking in humans with spinal cord injury. Nature 563, 65–71 (2018).
Formento, E. et al. Electrical spinal cord stimulation must preserve proprioception to enable locomotion in humans with spinal cord injury. Nat. Neurosci. 21, 1728–1741 (2018).
Hagai, T. et al. Gene expression variability across cells and species shapes innate immunity. Nature 563, 197–202 (2018).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).
Lin, L. I. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989).
Bentsen, M. A. et al. Transcriptomic analysis links diverse hypothalamic cell types to fibroblast growth factor 1-induced sustained diabetes remission. Nat. Commun. 11, 4458 (2020).
Kim, D.-W. et al. Multimodal analysis of cell types in a hypothalamic node controlling social behavior. Cell 179, 713–728.e17 (2019).
Wu, Y. E., Pan, L., Zuo, Y., Li, X. & Hong, W. Detecting activated cell populations using single-cell RNA-seq. Neuron 96, 313–329.e6 (2017).
Skinnider, M. A., Squair, J. W. & Foster, L. J. Evaluating measures of association for single-cell transcriptomics. Nat. Methods 16, 381–386 (2019).
Clevers, H. et al. What is your conceptual definition of “cell type” in the context of a mature organism? Cell Syst. 4, 255–259 (2017).
Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res 25, 1491–1498 (2015).
Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. Gigascience 7, giy083 (2018).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020).
Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).
Zhang, A. W. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16, 1007–1015 (2019).
Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Petukhov, V. et al. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments. Genome Biol. 19, 78 (2018).
Melsted, P. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-00870-2 (2021).
Srivastava, A., Malik, L., Smith, T., Sudbery, I. & Patro, R. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biol. 20, 65 (2019).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. 20, 241 (2019).
Ilicic, T. et al. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 17, 29 (2016).
Lun, A. T. L. et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 20, 63 (2019).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).
McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 8, 329–337.e4 (2019).
Bhattacherjee, A. et al. Cell type-specific transcriptional programs in mouse prefrontal cortex during adolescence and addiction. Nat. Commun. 10, 4169 (2019).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).
Polański, K. et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36, 964–965 (2020).
Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).
Lewitus, G. M. et al. Microglial TNF-α suppresses cocaine-induced plasticity and behavioral sensitization. Neuron 90, 483–491 (2016).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
McDavid, A. et al. Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29, 461–467 (2013).
Ntranos, V., Yi, L., Melsted, P. & Pachter, L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat. Methods 16, 163–166 (2019).
Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019).
Phipson, B. & Smyth, G. K. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9, 39 (2010).
Schwartz, G. W. et al. TooManyCells identifies and visualizes relationships of single-cell clades. Nat. Methods 17, 405–413 (2020).
M.A.S. acknowledges support from Wings for Life, the Canadian Institutes of Health Research (CIHR) (Vanier Canada Graduate Scholarship, Michael Smith Foreign Study Supplement), an Izaak Walton Killam Memorial Pre-Doctoral Fellowship, a UBC Four Year Fellowship, a Vancouver Coastal Health–CIHR–UBC MD/PhD Studentship, a Brain Canada Hubert van Tol fellowship and a BCRegMed Collaborative Research Travel Grant. J.W.S. is supported by a CIHR Banting postdoctoral fellowship and a Marie Skłodowska-Curie individual fellowship (No. 842578). Work in L.J.F.’s group is supported by Genome Canada/Genome BC (Project 264PRO). The present work was supported by a Consolidator Grant from the European Research Council (ERC-2015-CoG HOW2WALKAGAIN 682999) and the Swiss National Science Foundation (subside 310030_192558).
G.C. is a founder and shareholder of ONWARD Medical, a company with no direct relationships with the present work.
Peer review information: Nature Protocols thanks Lyla Atta, Jean Fan, Brendan Miller, Joshua Welch and the other, anonymous reviewer(s) for their contribution to the peer review of this work.
Key reference using this protocol
Skinnider, M. et al. Nat. Biotechnol. 39, 30–34 (2021): https://doi.org/10.1038/s41587-020-0605-1
Key data used in this protocol
Kang, H. et al. Nat. Biotechnol. 36, 89–94 (2018): https://doi.org/10.1038/nbt.4042
Bhattacherjee, A. et al. Nat. Commun. 10, 4169 (2019): https://doi.org/10.1038/s41467-019-12054-3
Moffitt, J. R. et al. Science 362, eaau5324 (2018): https://doi.org/10.1126/science.aau5324
Differential cell-type prioritization in the MERFISH dataset of Moffitt et al., 2018 (ref. 35), with variable numbers of permutations, subsamples per permutation, and total subsamples. a, Differential prioritization in the full dataset, with a background of 1,000 independent permutations. The top five cell types with a permutation P-value < 0.05 are shown throughout. b–e, Impact of reducing the number of permutations on differential prioritization. Differential prioritization yields stable results with the number of permutations decreased to 100, but becomes noisier below this threshold. Moreover, with 100 permutations, over 134 core-hours are required. b, Differential prioritization with 30 permutations (left) or 100 permutations (right). c, Correlations between differential prioritization –log10 P-values for each cell type in the reduced datasets with 30 permutations (left) or 100 permutations (right), compared with the full dataset of 1,000 permutations shown in a. d, Correlation of –log10 P-values to the full dataset for between 2 and 999 permutations. e, Total runtime required to perform between 1 and 1,000 permutations. f,g, Impact of reducing the number of subsamples on differential prioritization. A full 50 subsamples are required in each permuted dataset for accurate differential prioritization. f, Differential prioritization with one, five or ten subsamples per permutation. g, Correlations between differential prioritization –log10 P-values for each cell type in the reduced datasets with one, five or ten subsamples per permutation, compared with the complete dataset shown in a. h, Correlation coefficients to the full dataset with one, five or ten subsamples per permutation. Error bars show 95% confidence interval. i, Distribution of mean AUCs with 1, 5, 10 or 50 subsamples per permutation. The variance of the null distribution is inflated with <50 subsamples per permutation, which precludes differential prioritization.
j, Distribution of mean AUCs in the complete dataset of 1,000 permutations (‘default’), or an equivalent number of mean AUCs sampled with replacement from a background of 100, 500 or 1,000 total subsamples, with 50 subsamples per permutation. The null distribution using sampling with replacement is indistinguishable from the null distribution in the complete dataset. k–n, Sampling with replacement enables accurate differential prioritization at dramatically reduced computational cost, providing an optimized workflow for differential prioritization. k, Differential prioritization after sampling with replacement from a background of 100, 500 or 1,000 total subsamples. The original results from the complete dataset are approximated with 500 or more subsamples. l, Correlation of –log10 P-values to the full dataset for 100, 500 or 1,000 total permutations. m, Correlation coefficients to the full dataset with between 50 and 1,000 mean AUCs drawn from a background of 100, 500 or 1,000 subsamples. n, Total runtime required to perform the full permutation analysis versus 100, 500 or 1,000 total permutations using augur_mode = "permute".
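The sampling-with-replacement strategy described in panels j–n can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the pool of permuted AUCs is simulated, the function names are hypothetical, and the one-sided P-value uses the (b + 1)/(m + 1) estimator of Phipson & Smyth from the reference list, which this section does not confirm as the formula Augur applies.

```python
import random
from statistics import mean


def resampled_null(pool, n_draws, per_draw=50, seed=0):
    """Build a null distribution of mean AUCs.

    Each null value is the mean of `per_draw` AUCs drawn with replacement
    from a fixed pool of permuted-label subsample AUCs.
    """
    rng = random.Random(seed)
    return [mean(rng.choices(pool, k=per_draw)) for _ in range(n_draws)]


def permutation_p_value(observed, null):
    """One-sided permutation P-value that can never be exactly zero."""
    b = sum(1 for value in null if value >= observed)
    return (b + 1) / (len(null) + 1)


# Hypothetical pool of 500 permuted-label AUCs centred on 0.5 (no real data used).
rng = random.Random(42)
pool = [min(1.0, max(0.0, rng.gauss(0.5, 0.05))) for _ in range(500)]

null = resampled_null(pool, n_draws=1000, per_draw=50)
p = permutation_p_value(0.6, null)
print(len(null), p)
```

Because every null value reuses the same pool, the expensive step (computing permuted subsample AUCs) is performed only once per pool size, which is the source of the runtime savings the caption describes.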
Cell-type prioritization in simulated scRNA-seq data (ref. 74) from a tissue with eight cell types and increasingly unequal numbers of cells per type, as quantified by the Gini coefficient (ref. 29). The average number of DE genes at 5% false discovery rate in 50 subsamples of 20 cells per condition was tallied using six different statistical tests (t-test, Wilcoxon rank-sum test, likelihood ratio test (ref. 75), logistic regression (ref. 76), MAST (ref. 77) and a negative binomial generalized linear model), implemented through the Seurat ‘FindMarkers’ function. The accuracy of cell-type prioritization was quantified as the Pearson correlation between the cell-type prioritizations (AUC or average number of DE genes, for Augur and single-cell differential expression tests, respectively) and the true proportion of DE genes under the simulation ground truth. The mean of five simulation replicates is shown throughout. Insets show binomial P-values for the sign of the difference in correlations (that is, the frequency with which Augur outperforms single-cell differential expression with subsampling), all with n = 120. a,b, Impact of perturbation intensity (differential expression effect size) on cell-type prioritization for a representative test for single-cell differential gene expression (Wilcoxon rank-sum test). Augur outperforms single-cell differential expression with subsampling in prioritizing cell types in the context of subtler perturbations. c,d, Impact of sequencing depth (% of reads downsampled) on cell-type prioritization for a representative test for single-cell differential gene expression (Wilcoxon rank-sum test), with the location parameter of the differential expression factor log-normal distribution set to 0.5. Augur outperforms single-cell differential expression with subsampling in more sparsely sequenced datasets. e,f, Impact of perturbation intensity on cell-type prioritization for five additional tests for single-cell differential gene expression.
g,h, Impact of sequencing depth on cell-type prioritization for five additional tests for single-cell differential gene expression.
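The three summary statistics named in this caption (the Gini coefficient of cell-type abundances, the Pearson correlation with the simulation ground truth, and a binomial P-value for the sign of a difference) can be computed with the standard library alone. The sketch below is illustrative: the function names and example inputs are hypothetical, and the two-sided form of the sign test is an assumption, since the caption does not state whether the reported P-values are one- or two-sided.

```python
from math import comb, sqrt


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def sign_test_p(wins, n):
    """Two-sided exact binomial P-value for `wins` out of `n` under p = 0.5."""
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1))  # exact integer arithmetic
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical inputs: eight cell types of unequal size, and toy prioritization scores.
cells_per_type = [50, 100, 200, 400, 800, 1600, 3200, 6400]
auc_scores = [0.52, 0.55, 0.61, 0.70, 0.78, 0.85, 0.90, 0.95]
true_de_fraction = [0.00, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

print(gini(cells_per_type))
print(pearson(auc_scores, true_de_fraction))
print(sign_test_p(90, 120))
```

Here `sign_test_p(90, 120)` asks how surprising it would be for one method to win 90 of 120 paired comparisons if both methods were equally accurate; the 90 is an arbitrary example value, not a result from the article.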
About this article
Cite this article
Squair, J.W., Skinnider, M.A., Gautier, M. et al. Prioritization of cell types responsive to biological perturbations in single-cell data with Augur. Nat Protoc 16, 3836–3873 (2021). https://doi.org/10.1038/s41596-021-00561-x