Single-cell mRNA quantification and differential analysis with Census

Journal name: Nature Methods
Volume: 14
Pages: 309–315
DOI: 10.1038/nmeth.4150

Abstract

Single-cell gene expression studies promise to reveal rare cell types and cryptic states, but the high variability of single-cell RNA-seq measurements frustrates efforts to assay transcriptional differences between cells. We introduce the Census algorithm to convert relative RNA-seq expression levels into relative transcript counts without the need for experimental spike-in controls. Analyzing changes in relative transcript counts led to dramatic improvements in accuracy compared to normalized read counts and enabled new statistical tests for identifying developmentally regulated genes. Census counts can be analyzed with widely used regression techniques to reveal changes in cell-fate-dependent gene expression, splicing patterns and allelic imbalances. We reanalyzed single-cell data from several developmental and disease studies, and demonstrate that Census enabled robust analysis at multiple layers of gene regulation. Census is freely available through our updated single-cell analysis toolkit, Monocle 2.

At a glance

Figures

  1. Census approximation of relative transcript counts in single cells without external RNA standards.

    (a) Typical single-cell RNA-seq procedure for estimating mRNA abundances via spike-in standards. Losses alter the distribution of relative gene expression levels in a single cell. RT, reverse transcription. (b) Distribution of transcript counts corresponding to each cell's most frequently observed relative abundance (i.e., TPM) in cDNA or lysate RNA from lung epithelial data (ref. 25). (c) Total transcripts per lung epithelial cell estimated using Census counts versus using spike-in controls. Blue line indicates linear regression. The shading around the blue line indicates the 95% confidence interval of the regression. Black line indicates perfect concordance. (d) MA plot for expressed genes based on contrasts between cells from embryonic day (E)14.5 and cells from all other time points. Census transcript counts (top); transcript counts derived by spike-in regression (bottom). (e) Fold changes in gene expression based on Census counts or on spike-in regression, contrasting cells from E14.5 with cells from all other time points.
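
As a rough illustration of the idea behind panel b, the sketch below rescales one cell's TPM values so that the most frequently observed TPM maps to a single transcript. This is a deliberately simplified stand-in, not the Census algorithm itself: the assumption that the modal TPM corresponds to exactly one transcript, the function name `mode_scaled_counts`, the `bin_width` parameter and the toy gene names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mode_scaled_counts(tpm, bin_width=0.1):
    """Rescale one cell's TPM values so the modal TPM maps to ~1 transcript.

    Assumption (not from the source): the most common nonzero TPM in a cell
    corresponds to a single transcript.
    """
    expressed = [v for v in tpm.values() if v > 0]
    if not expressed:
        return {gene: 0.0 for gene in tpm}
    # Histogram log10(TPM) in fixed-width bins; the fullest bin is the mode.
    bins = Counter(round(math.log10(v) / bin_width) for v in expressed)
    # Break ties in favour of the lowest bin.
    mode_bin = max(bins, key=lambda b: (bins[b], -b))
    mode_tpm = 10 ** (mode_bin * bin_width)
    return {gene: v / mode_tpm for gene, v in tpm.items()}


# Toy cell: three genes share the modal TPM, one is ten times higher.
cell = {"geneA": 20.0, "geneB": 20.0, "geneC": 20.0, "geneD": 200.0, "geneE": 0.0}
counts = mode_scaled_counts(cell)
```

In this toy cell the three modal genes receive a value close to one and `geneD` a value ten times larger. The binning makes the scale only approximate, which is one reason to treat this as a sketch.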

  2. Census counts improved the accuracy of differential expression analysis.

    (a) Receiver-operating characteristic (ROC) curves showing the accuracy of differential expression (DE) analysis between E14.5 and E18.5 lung epithelial cells (ref. 25). Tools were provided with relative expression levels, normalized read counts, and transcript counts estimated with spike-ins or Census. A permutation-based test was applied to the spike-in-based expression levels to determine a ground truth set of DE genes. TPM (true total), counts derived by scaling TPM values by the correct per-cell total RNA. AUC, area under the curve. (b) Consensus between Monocle, DESeq2, edgeR and permutation tests using different measures of expression. Lighter bar colors, size of the union of DE genes reported by any of the four tests. Darker bar colors, number of DE genes identified by all tests.
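
The two summaries in this figure can be sketched in a few lines. The example below computes an AUC as the probability that a true DE gene outscores a non-DE gene, and then the union and intersection of DE calls across tools. The scores, ground-truth labels and gene identifiers are invented for illustration and are not taken from the study.

```python
def roc_auc(scores, is_de):
    """AUC as the chance that a true DE gene outranks a non-DE gene (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_de) if y]
    neg = [s for s, y in zip(scores, is_de) if not y]
    if not pos or not neg:
        raise ValueError("need at least one DE and one non-DE gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-gene scores; larger means stronger evidence of DE.
scores = [5.0, 3.0, 2.5, 1.0, 0.5]
truth = [True, True, False, True, False]
auc = roc_auc(scores, truth)

# Hypothetical DE calls from four tests, as in panel b.
calls = {
    "Monocle": {"g1", "g2", "g3"},
    "DESeq2": {"g2", "g3", "g4"},
    "edgeR": {"g2", "g3"},
    "permutation": {"g2", "g3", "g5"},
}
union = set.union(*calls.values())          # reported by any test (lighter bars)
shared = set.intersection(*calls.values())  # reported by all tests (darker bars)
```

The pairwise AUC is quadratic in the number of genes, which is fine for a toy example; a rank-based formula would be preferable for genome-scale inputs.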

  3. BEAM identification of branch-dependent gene expression and potential drivers of lung epithelial fate specification.

    (a) Monocle recovered a branched single-cell trajectory beginning with bronchoalveolar progenitors and terminating at type I (AT1) and type II (AT2) pneumocytes. High expression of proliferation markers (Ccnb2 and Cdk1) was restricted to progenitor cells, whereas high expression of AT1 (Pdpn) and AT2 (Sftpb) markers was restricted to their corresponding lineages. Size of circles denotes level of expression. (b) BEAM uses generalized linear models with natural splines to perform a regression on the data in which branch assignments are known (alternative model), fitting a separate curve for each branch. It also performs a regression in which branch assignments are not known (null model) by fitting a single curve for all the data, and then compares these models via a likelihood ratio test. (c) Null and alternative model fits for AT1 and AT2 markers (Ager and Sftpb, respectively) and housekeeping genes (Hprt and Pgk1). Solid lines, smoothed expression curves for each branch in the alternative model. Dashed lines, fitted curve in null model used in the BEAM test.
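
The model comparison described in panel b can be illustrated with a much simpler stand-in. The sketch below swaps the generalized linear models with natural splines for ordinary least-squares lines with Gaussian errors, and assumes exactly two branches; only the structure of the test (one shared curve versus one curve per branch, compared by a likelihood ratio) follows the caption. Function names and data are hypothetical.

```python
import math


def line_rss(t, y):
    """Residual sum of squares of the least-squares line y = a + b*t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = my - b * mt
    return sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y))


def branch_lrt(t, y, branch):
    """Likelihood ratio test: one shared line (null) vs. one line per branch."""
    labels = sorted(set(branch))
    if len(labels) != 2:
        raise ValueError("this sketch assumes exactly two branches")
    rss_null = line_rss(t, y)
    rss_alt = 0.0
    for label in labels:
        idx = [i for i, b in enumerate(branch) if b == label]
        rss_alt += line_rss([t[i] for i in idx], [y[i] for i in idx])
    # Gaussian likelihood ratio statistic with the variance estimated by ML.
    stat = len(y) * math.log(rss_null / rss_alt)
    # The alternative adds 2 parameters; the chi-square tail with 2 df is exp(-x/2).
    return stat, math.exp(-stat / 2)


pseudotime = [0, 1, 2, 3, 4] * 2
branch = ["AT1"] * 5 + ["AT2"] * 5
# Toy branch-dependent gene: rises along one branch, stays flat along the other.
marker = [1.1, 2.9, 5.05, 6.95, 9.0, 1.0, 0.9, 1.1, 0.95, 1.05]
# Toy housekeeping-like gene: flat along both branches.
flat = [5.1, 4.9, 5.05, 4.95, 5.0, 4.95, 5.1, 4.9, 5.0, 5.05]
_, p_marker = branch_lrt(pseudotime, marker, branch)
_, p_flat = branch_lrt(pseudotime, flat, branch)
```

With these toy values the branch-dependent gene yields a very small p-value, whereas the flat gene does not reject the null model, mirroring the contrast between marker and housekeeping genes in panel c.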

  4. Loss of interferon signaling generated a branch in the trajectory followed by immune-stimulated dendritic cells.

    (a) Experimental design used in ref. 36 to compare BMDCs from Ifnar1−/− and Stat1−/− knockout mice against the wild type as they respond to LPS. (b) Single-cell trajectory recovered by Monocle 2. (c) Six kinetic clusters of branch-dependent genes identified by BEAM are functionally enriched for interferon signaling and other immune-related processes. (d) Branch time points for the antiviral regulators that branch significantly (by the BEAM test) and for their significantly branching targets, collected from figure 4 of ref. 48. (e) Branch time points for the TFs with motifs enriched in DHS sites near significant branch genes from cluster 5, and for their potential target genes in cluster 5 in c. For all boxplots, the lower and upper 'hinges' correspond to the first and third quartiles (the 25th and 75th percentiles); whiskers extend to the highest (or lowest) value within 1.5 × the interquartile range (the distance between the first and third quartiles) of the hinge. Points beyond the whiskers show the remaining data. The center line corresponds to the median.
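
The boxplot convention described in this caption can be written out as a small function. This is a sketch under one assumption not stated in the source: quartiles are computed with linear interpolation (the "inclusive" method of Python's `statistics.quantiles`). The function name and the example values are hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Hinges, median, whisker ends and outlying points for one boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # distance between the first and third quartiles
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "lower_hinge": q1,
        "median": median,
        "upper_hinge": q3,
        # Whiskers stop at the most extreme data points within the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Everything beyond the whiskers is drawn as an individual point.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

For this example the hinges fall at 3 and 7, so the fences sit at -3 and 13; the upper whisker stops at 8 and the value 30 is reported as a point beyond the whiskers.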

  5. Census enabled robust analysis of differential splicing during human myoblast differentiation.

    (a) Splicing structure of human gene TPM1, with the three alternatively spliced sets of exons highlighted. (b) Percent-spliced-in (PSI) values for TPM1 alternative exons. PSI values were computed by summing Census counts for isoforms including each exon and dividing by the total TPM1 transcript count in each cell. Black lines indicate loess smoothing of the PSI values as a function of pseudotime.
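
The PSI calculation described in panel b amounts to a ratio of summed counts per cell. A minimal sketch follows, assuming per-isoform transcript counts for a single cell and a mapping from each isoform to the exons it contains; the isoform names, exon labels and counts are placeholders, not the actual TPM1 annotation.

```python
def percent_spliced_in(isoform_counts, isoform_exons, exon):
    """Fraction of a gene's transcripts in one cell that include `exon`."""
    total = sum(isoform_counts.values())
    if total == 0:
        return None  # gene not detected in this cell, so PSI is undefined
    including = sum(
        count for iso, count in isoform_counts.items() if exon in isoform_exons[iso]
    )
    return including / total


# Hypothetical isoform structure and per-cell transcript counts.
isoform_exons = {
    "iso1": {"exon_a", "exon_c"},
    "iso2": {"exon_b", "exon_c"},
    "iso3": {"exon_a"},
}
cell_counts = {"iso1": 6.0, "iso2": 3.0, "iso3": 1.0}
psi = {
    exon: percent_spliced_in(cell_counts, isoform_exons, exon)
    for exon in ("exon_a", "exon_b", "exon_c")
}
```

The function returns a fraction between 0 and 1; multiplying by 100 gives a percentage. Returning `None` for undetected genes keeps those cells out of any downstream smoothing rather than treating them as zero inclusion.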

  6. Census detected shifts in allelic balance in single cells during embryogenesis.

    (a) A quasibinomial regression model detected changes in allelic balance in single cells as a function of embryo stage. (b) Spread of X-chromosome inactivation as measured by Census for female embryos at different stages (compare with ref. 45 figure 2b). (c) Number of genes with at least 10% contribution from the maternal and paternal copies of X chromosome. (d) Observed monoallelic expression in single cells from late stage embryos as measured by Census transcript counts (top) or normalized read counts (bottom). Red line indicates median fraction of monoallelic calls as a function of average transcript count across cells. Only autosomal genes are shown. Black bars indicate 95% prediction interval generated by a quasibinomial regression model fit to each gene, with the median of the gene intervals indicated by the blue line. Light red points indicate individual genes that fall outside the prediction interval.
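
The tally in panel c and the monoallelic calls in panel d both start from per-gene maternal and paternal counts in a single cell. The sketch below shows one plausible way to make those calls; it does not implement the quasibinomial regression of panel a. The function names, the reading of "monoallelic" as all detected transcripts coming from one allele, and the example counts are assumptions made for illustration.

```python
def count_biallelic(cell, min_share=0.10):
    """Number of detected genes with at least `min_share` of counts from each allele."""
    n = 0
    for maternal, paternal in cell.values():
        total = maternal + paternal
        if total == 0:
            continue  # gene not detected in this cell
        if min(maternal, paternal) >= min_share * total:
            n += 1
    return n


def monoallelic_genes(cell):
    """Detected genes whose transcripts all come from a single allele."""
    return sorted(
        gene
        for gene, (maternal, paternal) in cell.items()
        if maternal + paternal > 0 and min(maternal, paternal) == 0
    )


# Hypothetical (maternal, paternal) transcript counts for one cell.
cell = {
    "geneX1": (8.0, 2.0),
    "geneX2": (10.0, 0.0),
    "geneX3": (5.0, 5.0),
    "geneX4": (0.0, 0.0),
    "geneX5": (0.5, 9.5),
}
n_biallelic = count_biallelic(cell)
mono = monoallelic_genes(cell)
```

Here `geneX5` is neither biallelic at the 10% threshold nor strictly monoallelic, which illustrates why the choice of threshold matters when counts are low.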

Accession codes

References

  1. Macosko, E.Z., Basu, A., Satija, R., Nemesh, J. & Shekhar, K. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
  2. Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
  3. Shalek, A.K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
  4. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).
  5. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
  6. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
  7. Fu, G.K., Hu, J., Wang, P.-H. & Fodor, S.P.A. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc. Natl. Acad. Sci. USA 108, 9026–9031 (2011).
  8. Hug, H. & Schuler, R. Measurement of the number of molecules of a single mRNA species in a complex mRNA preparation. J. Theor. Biol. 221, 615–624 (2003).
  9. Picelli, S., Faridani, O.R., Björklund, A.K. & Winberg, G. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
  10. Petropoulos, S. et al. Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos. Cell 165, 1012–1026 (2016).
  11. Segerstolpe, Å. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
  12. Zeisel, A. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).
  13. Jaitin, D.A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
  14. Wu, A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).
  15. Treutlein, B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature 534, 391–395 (2016).
  16. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
  17. Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  18. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
  19. Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
  20. Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468–478 (2010).
  21. Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012).
  22. Zhou, J.X. & Huang, S. Understanding gene circuits at cell-fate branch points for rational cell reprogramming. Trends Genet. 27, 55–62 (2011).
  23. Moignard, V. et al. Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat. Biotechnol. 33, 269–276 (2015).
  24. Marco, E. et al. Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape. Proc. Natl. Acad. Sci. USA 111, E5643–E5650 (2014).
  25. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
  26. Hochegger, H., Takeda, S. & Hunt, T. Cyclin-dependent kinases and cell-cycle transitions: does one fit all? Nat. Rev. Mol. Cell Biol. 9, 910–916 (2008).
  27. Desai, T.J., Brownfield, D.G. & Krasnow, M.A. Alveolar progenitor and stem cells in lung development, renewal and cancer. Nature 507, 190–194 (2014).
  28. Chi, X., Garnier, G., Hawgood, S. & Colten, H.R. Identification of a novel alternatively spliced mRNA of murine pulmonary surfactant protein B. Am. J. Respir. Cell Mol. Biol. 19, 107–113 (1998).
  29. McCullagh, P. & Nelder, J.A. Generalized Linear Models 2nd edn. (CRC Press, 1989).
  30. Shu, W. et al. Foxp2 and Foxp1 cooperatively regulate lung and esophagus development. Development 134, 1991–2000 (2007).
  31. Yin, Y. et al. An FGF-WNT gene regulatory network controls lung mesenchyme development. Dev. Biol. 319, 426–436 (2008).
  32. Shu, W., Yang, H., Zhang, L., Lu, M.M. & Morrisey, E.E. Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. J. Biol. Chem. 276, 27488–27497 (2001).
  33. Wan, H. et al. Kruppel-like factor 5 is required for perinatal lung morphogenesis and function. Development 135, 2563–2572 (2008).
  34. Xu, Y. et al. C/EBPα is required for pulmonary cytoprotection during hyperoxia. Am. J. Physiol. Lung Cell. Mol. Physiol. 297, L286–L298 (2009).
  35. Okubo, T. & Hogan, B.L.M. Hyperactive Wnt signaling changes the developmental potential of embryonic lung endoderm. J. Biol. 3, 11 (2004).
  36. Shalek, A.K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510, 363–369 (2014).
  37. Darnell, J.E. Jr., Kerr, I.M. & Stark, G.R. Jak-STAT pathways and transcriptional activation in response to IFNs and other extracellular signaling proteins. Science 264, 1415–1421 (1994).
  38. Honda, K. et al. IRF-7 is the master regulator of type-I interferon–dependent immune responses. Nature 434, 772–777 (2005).
  39. Gautier, G. et al. A type I interferon autocrine-paracrine loop is involved in Toll-like receptor–induced interleukin-12p70 secretion by dendritic cells. J. Exp. Med. 201, 1435–1446 (2005).
  40. Lavin, Y. et al. Tissue-resident macrophage enhancer landscapes are shaped by the local microenvironment. Cell 159, 1312–1326 (2014).
  41. Welch, J.D., Hu, Y. & Prins, J.F. Robust detection of alternative splicing in a population of single cells. Nucleic Acids Res. 44, e73 (2016).
  42. Perrin, B.J. & Ervasti, J.M. The actin gene family: function follows isoform. Cytoskeleton 67, 630–634 (2010).
  43. Tondeleir, D., Vandamme, D., Vandekerckhove, J., Ampe, C. & Lambrechts, A. Actin isoform expression patterns during mammalian development and in pathology: insights from mouse models. Cell Motil. Cytoskeleton 66, 798–815 (2009).
  44. Gunning, P., O'Neill, G. & Hardeman, E. Tropomyosin-based regulation of the actin cytoskeleton in time and space. Physiol. Rev. 88, 1–35 (2008).
  45. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
  46. Bray, N.L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
  47. Kim, J.K., Kolodziejczyk, A.A., Ilicic, T., Teichmann, S.A. & Marioni, J.C. Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression. Nat. Commun. 6, 8687 (2015).
  48. Amit, I. et al. Unbiased reconstruction of a mammalian transcriptional network mediating pathogen responses. Science 326, 257–263 (2009).
  49. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
  50. Yee, T.W. Vector Generalized Linear and Additive Models (Springer, 2015).
  51. Katz, Y., Wang, E.T., Airoldi, E.M. & Burge, C.B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
  52. Keane, T.M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
  53. Corbel, C., Diabangouaya, P., Gendrel, A.-V., Chow, J.C. & Heard, E. Unusual chromatin status and organization of the inactive X chromosome in murine trophoblast giant cells. Development 140, 861–872 (2013).
  54. Yang, F., Babak, T., Shendure, J. & Disteche, C.M. Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome Res. 20, 614–622 (2010).

Author information

Affiliations

  1. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Xiaojie Qiu,
    • Andrew Hill,
    • Jonathan Packer,
    • Dejun Lin &
    • Cole Trapnell
  2. Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, USA.

    • Xiaojie Qiu &
    • Cole Trapnell
  3. Department of Applied Mathematics, University of Washington, Seattle, Washington, USA.

    • Yi-An Ma

Contributions

X.Q. and C.T. designed Census and the regression methods. X.Q. implemented the methods. X.Q. and A.H. performed the analysis. J.P., D.L. and Y.-A.M. contributed to technical design. C.T. conceived the project. All authors wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Supplementary information

PDF files

  1. Supplementary Text and Figures (18,419 KB)

    Supplementary Figures 1–15, Supplementary Table 2 and Supplementary Note 1

Excel files

  1. Supplementary Tables (170 KB)

    Supplementary Table 1

Text files

  1. Supplementary Data (1140 KB)

    Text file storing the results (p-values) of the permutation test, based on spike-in transcript counts, that was used in benchmarking differential gene expression analysis. Each row corresponds to a gene.

Zip files

  1. Supplementary Software (6550 KB)

    A tarball containing the version of Monocle 2 (version 1.99) used to produce all the figures, the supplementary data provided with this submission, a helper package of helper functions, and all analysis code needed to reproduce the figures in this study.
