Cell composition analysis of bulk genomics using single-cell data

Abstract

Single-cell RNA sequencing (scRNA-seq) is a rich resource of cellular heterogeneity, opening new avenues in the study of complex tissues. We introduce Cell Population Mapping (CPM), a deconvolution algorithm in which reference scRNA-seq profiles are leveraged to infer the composition of cell types and states from bulk transcriptome data (‘scBio’ CRAN R-package). Analysis of individual variations in lungs of influenza-virus-infected mice reveals that the relationship between cell abundance and clinical symptoms is a cell-state-specific property that varies gradually along the continuum of cell-activation states. The gradual change is confirmed in subsequent experiments and is further explained by a mathematical model in which clinical outcomes relate to cell-state dynamics along the activation process. Our results demonstrate the power of CPM in reconstructing the continuous spectrum of cell states within heterogeneous tissues.

Fig. 1: Overview of the CPM algorithm.
Fig. 2: Performance assessed via synthetic data.
Fig. 3: Cellular heterogeneity during in vivo influenza virus infection, reconstructed by CPM.
Fig. 4: Confirmation of gradual changes in relationships of cells to physiology over a trajectory of cell-activation states.

Code availability

CPM is implemented in the ‘scBio’ CRAN R package (the CPM function), available at https://cran.r-project.org/web/packages/scBio/index.html.

Data availability

All RNA-seq data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) database under accession numbers GSE113530 and GSE117975.

References

  1. Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 34, 1145–1160 (2016).

  2. Chen, X., Teichmann, S. A. & Meyer, K. B. From tissues to cell types and back: single-cell gene expression analysis of tissue architecture. Annu. Rev. Biomed. Data Sci. 1, 29–51 (2018).

  3. Krieg, C. et al. High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy. Nat. Med. 24, 144–153 (2018).

  4. Shalek, A. K. & Benson, M. Single-cell analyses to tailor treatments. Sci. Transl. Med. 9, eaan4730 (2017).

  5. Kim, K.-T. et al. Application of single-cell RNA sequencing in optimizing a combinatorial therapeutic strategy in metastatic renal cell carcinoma. Genome Biol. 17, 80 (2016).

  6. Shen-Orr, S. S. & Gaujoux, R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr. Opin. Immunol. 25, 571–578 (2013).

  7. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).

  8. Frishberg, A., Brodt, A., Steuerman, Y. & Gat-Viks, I. ImmQuant: a user-friendly tool for inferring immune cell-type composition from gene-expression data. Bioinformatics 32, 3842–3843 (2016).

  9. Avila Cobos, F., Vandesompele, J., Mestdagh, P. & De Preter, K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 34, 1969–1979 (2018).

  10. Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624 (2017).

  11. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).

  12. Schelker, M. et al. Estimation of immune cell content in tumour tissue using single-cell RNA-seq data. Nat. Commun. 8, 2032 (2017).

  13. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).

  14. Rostom, R., Svensson, V., Teichmann, S. A. & Kar, G. Computational approaches for interpreting scRNA-seq data. FEBS Lett. 591, 2213–2225 (2017).

  15. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

  16. Steuerman, Y. et al. Dissection of influenza infection in vivo by single-cell RNA sequencing. Cell Syst. 6, 679–691.e4 (2018).

  17. Altboum, Z. et al. Digital cell quantification identifies global immune cell dynamics during influenza infection. Mol. Syst. Biol. 10, 720 (2014).

  18. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

  19. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. & Lin, C.-J. LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008).

  20. Welsh, C. E. et al. Status and access to the collaborative cross population. Mamm. Genome 23, 706–712 (2012).

  21. Bottomly, D. et al. Expression quantitative trait loci for extreme host response to influenza A in pre-collaborative cross mice. G3 (Bethesda) 2, 213–221 (2012).

  22. Yu, Y.-R. A. et al. A protocol for the comprehensive flow cytometric analysis of immune cells in normal and inflamed murine non-lymphoid tissues. PLoS ONE 11, e0150606 (2016).

  23. Ferris, M. T. et al. Modeling host genetic regulation of influenza pathogenesis in the collaborative cross. PLoS Pathog. 9, e1003196 (2013).

  24. Dengler, L. et al. Cellular changes in blood indicate severe respiratory disease during influenza infections in mice. PLoS ONE 9, e103149 (2014).

  25. Coates, B. M. et al. Inflammatory monocytes drive influenza A virus-mediated lung injury in juvenile mice. J. Immunol. 200, 2391–2404 (2018).

  26. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).

  27. Shen-Orr, S. S. et al. Cell type–specific gene expression differences in complex tissues. Nat. Methods 7, 287–289 (2010).

  28. Hutter, C. & Zenklusen, J. C. The Cancer Genome Atlas: creating lasting value beyond its data. Cell 173, 283–285 (2018).

  29. eGTEx Project. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat. Genet. 49, 1664–1670 (2017).

  30. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).

  31. Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).

  32. Singer, B. D. et al. Flow-cytometric method for simultaneous analysis of mouse lung epithelial, endothelial, and hematopoietic lineage cells. Am. J. Physiol. Lung Cell. Mol. Physiol. 310, L796–L801 (2016).

Acknowledgements

This work was supported by the European Research Council (637885) (to A.F., N.P.-Y., O.C., D.R., Y.S., and I.G.-V.), and partially supported by the Israeli Centers of Research Excellence (I-CORE) Center No 41/11 (to A.F. and Y.S.), the Broad-Israel Science Foundation (ISF) 1168/14 (to D.R. and N.P.-Y.), ISF 1824/13 (to E.B.), partial fellowships from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University (to A.F. and Y.S.), and a Shulamit Aloni Scholarship (to Y.S.). Research in the Gat-Viks laboratory was supported by ISF 288/16. I.G.-V. is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University and an Alon Fellow. We thank O. Danziger and H.J. Abu-Toamih Atamni (Tel Aviv University) for help with mouse work, and S. Smith for scientific editing.

Author information

Contributions

A.F., N.P.-Y., E.B. and I.G.-V. conceived and designed the study. N.P.-Y., O.C., D.R., L.V., F.I., M.M., L.M., I.A., and E.B. developed experimental protocols and conducted the experiments. A.F. developed computational methods and performed bioinformatics analysis. Y.S. performed bioinformatic analyses. A.F., N.P.-Y., E.B., and I.G.-V. wrote the manuscript with input from all other authors.

Corresponding authors

Correspondence to Eran Bacharach or Irit Gat-Viks.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Exploring the performance of deconvolution using untreated mice.

(a) Validation of synthetic data generation. To validate our simulation strategy, we compared measured bulk expression levels (x axis) and computationally generated bulk expression levels (y axis) of untreated lungs from a C57BL/6J mouse (each dot is a single gene). Y axis: to synthetically generate the bulk tissue, we applied eq. 1 (Supplementary Note 1) to calculate a weighted average of nine different cell types, using scRNA-seq profiles derived from the lungs of an uninfected C57BL/6J mouse and their known cell-type annotations (ref. 16). The weighting is in accordance with the known fractions of the different cell types within the untreated lungs of C57BL/6J mice (refs. 22,32). X axis: a bulk RNA-seq profile that was measured from the lungs of a naive C57BL/6J mouse ('untreated mice'; Supplementary Table 1). The scatter plot suggests a good match between measured and synthetically generated bulk gene expression data, supporting the validity of our simulation approach. (b) Performance of deconvolution. We used lung tissues from naive (untreated) mice to demonstrate the robustness of deconvolution in the case of limited cell-activation heterogeneity. To this end, deconvolution was applied using the following input data: (i) bulk RNA-seq profiles of the lung tissue, generated from five naive CC mice ('untreated mice'; Supplementary Table 1); and (ii) scRNA-seq reference data, derived from an uninfected mouse (ref. 16). Using these input data, deconvolution was applied to predict cell-type abundance. Shown is a comparison between measured cell-type fractions in the naive lung tissue (data from refs. 22,32; x axis) and deconvolution-predicted cell-type quantities (averaged across mice; y axis), inferred by different methods (sub-panels; level of granularity=1 in all cases). Of note, all compared methods show a good match between measured and predicted cell-type quantities. This agrees well with the high accuracy of all deconvolution methods in the cell-type simulation (Fig. 2b).
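
The synthetic-bulk construction described in panel a can be illustrated with a minimal Python sketch. It assumes that eq. 1 amounts to a fraction-weighted average of per-cell-type expression profiles, as the caption states; the function name, gene names, cell types, expression values and fractions below are hypothetical toy values, not data from the study.

```python
def synthetic_bulk(type_profiles, fractions):
    """Fraction-weighted average of cell-type expression profiles.

    type_profiles: {cell_type: {gene: expression}}
    fractions:     {cell_type: fraction of that cell type in the tissue}
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("cell-type fractions must sum to a positive value")
    genes = next(iter(type_profiles.values())).keys()
    bulk = {}
    for gene in genes:
        # Normalise by the total so the weights always sum to 1.
        bulk[gene] = sum(
            fractions[t] / total * type_profiles[t][gene] for t in fractions
        )
    return bulk


# Toy reference: two cell types, two marker genes (illustrative values only).
profiles = {
    "T cell": {"Cd3e": 8.0, "Col1a1": 0.0},
    "Fibroblast": {"Cd3e": 0.0, "Col1a1": 10.0},
}
fractions = {"T cell": 0.25, "Fibroblast": 0.75}
bulk = synthetic_bulk(profiles, fractions)
```

In this toy case each marker gene's bulk level is simply its cell-type expression scaled by that cell type's fraction, which is what makes the comparison against a measured bulk profile informative.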

Supplementary Figure 2 Detailed analysis of CPM performance using relative bulk input data.

(a,b) Performance of CPM for cell-subtype simulation (a) and gradual-change simulation (b). Accuracy of inferring the correct cell heterogeneity was calculated as the Pearson correlation coefficient between predicted and ground-truth cell abundance. The calculated accuracy is plotted against varying data parameters, for alternative deconvolution methods: SVR (top), DCQ (middle) and Cibersort (bottom). CPM accuracy is shown in black; other colors represent the number of single-cell groups (the level of granularity) within each cell type. (c,d) Running time. (c) Running time of the CPM algorithm (y axis; seconds) for different simulation types (x axis) across various numbers of deconvolution repeats (color coded). Reported is the running time per sample, averaged across 100 samples. (d) Running time (y axis; seconds) of different deconvolution methods (color coded) for different granularity levels of the reference data (x axis). Shown is running time for the cell-subtype simulation (left) and gradual-change simulation (right). Reported is the running time per sample, averaged across 100 input bulk samples. (e) Predictions in the gradual-change simulation. Shown is a comparison between predicted (x axis) and true (y axis) T-cell abundance for a representative synthetic relative bulk profile using four different methods (sub-panels). CPM successfully captures the continuity in the gradual-change simulation. In contrast, the compared methods are limited to one inferred value for each of the four cell groups. The alternative methods were applied with a reference dataset that was generated using a granularity of 4 cell groups. (f) Analysis of granularity. Performance of the different deconvolution methods (y axis) across varying granularity of the reference data that is given as input to the alternative methods (x axis), for the cell-type simulation (top), cell-subtype simulation (middle) and gradual-change simulation (bottom). 
X axis, top: the granularity of the reference data, defined as the number of single-cell groups within each cell type (each of these groups is represented by a single reference profile). X axis, bottom: the total number of reference profiles. In the cell-type-level simulation (top), granularity=1 attains the best performance. In the two other simulations (middle and bottom), a higher granularity allows better sensitivity but also leads to scalability issues. Data are mean ± s.d. over 100 synthetic bulk profiles. Abbreviations: Grn, granularity. (g) Performance assessment in comparison with the enrichment scheme. Accuracy (y axis) is shown across varying expression noise levels (x axis) for different methods: CPM (black) and an enrichment scheme ('ES'; see implementation in Supplementary Note 1) that was applied using reference data of varying granularity (2 to 20; color coded). Results are shown for fibroblasts (left) and T cells (right), building on a single-cell-type design that was tailored to provide an unbiased comparison with ES. The accuracy was evaluated through both parametric (Pearson, top) and nonparametric (Spearman, bottom) correlation coefficients to avoid bias due to potential non-linear relationships of enrichment scores (ref. 31). (h) Assessment of SVR performance. Box plots of the accuracy of SVR (y axis) across 100 cell-subtype simulations (top) and 100 gradual-change simulations (bottom) across different ε values (x axis). Each box represents the 25th to 75th percentiles; whiskers show the 95% confidence interval; horizontal lines represent the median. The plots indicate that the selected value of ε does not have a substantial effect on the overall performance.
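
The two accuracy measures named in this caption (Pearson in panels a,b; Pearson and Spearman in panel g) can be sketched in plain Python as follows. This is an illustrative implementation of the standard definitions, not the code used in the study; the abundance vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ground-truth and predicted abundances for four cell groups.
truth = [0.1, 0.2, 0.3, 0.4]
predicted = [0.1, 0.2, 0.4, 0.8]
accuracy_pearson = pearson(truth, predicted)
accuracy_spearman = spearman(truth, predicted)
```

Here the prediction preserves the ordering of the true abundances but not their linear spacing, so the Spearman coefficient is higher than the Pearson coefficient. This is the kind of non-linear relationship that motivates reporting both measures for enrichment scores.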

Supplementary Figure 3 CPM performance assessment for absolute bulk input data.

Plots in a, b and c are depicted as in Fig. 2c,d and Supplementary Fig. 2e, respectively, but for the case of absolute bulk input data.

Supplementary Figure 4 CPM performance assessment using additional parameter settings.

(a-c) Performance across different CPM parameters. Shown is CPM accuracy (y axis) for different numbers of deconvolution repeats (left), cell neighborhood sizes (middle) and reference subset sizes (right) (x axis), for the case of cell-subtype simulation (a), and gradual-change simulation (b,c), in which the various cell-state trajectories carry the same directions (b), or mixed directions (c), of gradual alterations. Default parameter settings are marked by arrows. Data are mean ± s.d. over 100 synthetic bulk profiles. The plots suggest that CPM is generally robust to the number of repeats and may gain from an increased cell neighborhood size. In contrast, CPM attains high accuracy only when the reference subset size (Ns parameter) is relatively small. (d) Performance of deconvolution using alternative effect sizes. Shown is the accuracy (y axis) of different deconvolution methods (color coded) when applied on gradual-change simulation with different levels of effect size (denoted xg in Supplementary Note 1; x axis). Unlike CPM, the alternative deconvolution methods were applied on a reference dataset with a granularity of 4. A constant effect size xg=1 was used as a default value in all other analyses. (e) Performance of deconvolution using four alternative reference-reconstruction methods. Shown is the accuracy of different deconvolution methods when using different reference-reconstruction methods (see Supplementary Note 1 for details): DBSCAN with mean-center approach (as devised in previous studies, refs. 10,11; red), and K-means with mean-centers (purple), median-centers (blue), and harmonic-mean (green). K-means was applied with K=4 (that is, granularity=4) within each cell type. Data are mean ± s.d. over 100 synthetic bulk profiles. Overall, all reference-reconstruction methods attained similar performance. (f,g,h) CPM performance for mixed increase/decrease gradual-change directions. Plots are depicted as in Fig. 2d, but for synthetic data in which gradual alterations in some cell types were reversed. (f) Gradual alterations in two cell types; reversed trajectory in one cell type (Kc=2, Kr=1). (g) Gradual alterations in four cell types; reversed trajectory in two cell types (Kc=4, Kr=2). (h) Gradual alterations in six cell types; reversed trajectory in three cell types (Kc=6, Kr=3). Accuracy (y axis) is shown against different cell-space noise (top) or expression noise (bottom) (x axis). The alternative methods were applied with a reference dataset that was generated using a granularity of 4 cell groups.
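
As a rough illustration of the reference-reconstruction variants compared in panel e, the sketch below collapses one group of single cells into a single reference profile using a mean, median or harmonic-mean center. It assumes the cell groups have already been obtained (for example by K-means, which is not shown) and that each center is computed gene by gene; both are assumptions, since the exact procedure is given only in Supplementary Note 1. The function name and the expression values are hypothetical.

```python
from statistics import harmonic_mean, mean, median


def reference_profile(cells, center=mean):
    """Collapse a group of cells into one reference profile.

    cells:  list of per-cell expression vectors (same gene order in each).
    center: function mapping the values of one gene across cells to a
            single representative value.
    """
    # zip(*cells) regroups the values gene by gene.
    return [center(gene_values) for gene_values in zip(*cells)]


# One hypothetical group of three cells measured on two genes.
group = [
    [1.0, 4.0],
    [2.0, 4.0],
    [4.0, 4.0],
]
mean_profile = reference_profile(group, mean)
median_profile = reference_profile(group, median)
harmonic_profile = reference_profile(group, harmonic_mean)
```

The three centers agree for the second gene, whose values are identical across cells, and differ for the first, where the harmonic mean is pulled toward the smallest value. Note that `statistics.harmonic_mean` requires non-negative inputs and returns 0 if any input is 0, which matters for sparse scRNA-seq counts.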

Supplementary Figure 5 The impact of cell-state space on the performance of CPM.

(a) CPM performance for different cell-state-space diagrams. Left: Distributions of the accuracy (y axis) of different deconvolution methods (x axis) across n=100 analyses that differ only in their input cell-state-space diagram. Each of the cell-state-space diagrams was generated by a separate tSNE run that was applied on the reference scRNA-seq data (in all cases dimension reduction was applied on the 101 generic response genes). All alternative methods were applied with granularity=4. Each box represents the 25th to 75th percentiles; whiskers show the 95% confidence interval; horizontal lines represent the median. Right: Two example cell-state-space diagrams of the same cell type (top: T cells, bottom: fibroblasts) that were generated through separate tSNE runs and were used in this performance analysis. Overall, the accuracy of CPM remained high when using different tSNE-derived cell-state-space diagrams (left), despite marked differences between different diagrams of the same cell type (right). (b) The impact of cell density on CPM performance. Shown is the distribution of the accuracy measure (y axis) across different cell-density levels within the cell-state space (ranked binned densities; x axis). Results are shown for different CPM parameters (color coded): the number of deconvolution repeats (left), cell neighborhood sizes (middle) and reference subset sizes (right). n=100 synthetic profiles. Each box represents the 25th to 75th percentiles; whiskers show the 95% confidence interval; horizontal lines represent the median. The plots suggest that the accuracy of CPM is generally independent of the local density across the cell-state space. (c) The impact of extrapolation. Shown is the distribution of the accuracy measure (y axis) across different cell-density levels within the cell-state space (ranked binned densities; x axis). Results are shown for the accuracy of CPM in the presence (light blue) or absence (orange) of extrapolation. n=100 synthetic profiles. 
Each box represents the 25th to 75th percentiles; whiskers show the 95% confidence interval; horizontal lines represent the median. The plot demonstrates the advantage of extrapolation and shows that a similar CPM accuracy was attained in regions that differ in their density.

Supplementary Figure 6 The impact of the quality of scRNA-seq data on CPM performance.

Shown is accuracy (y axis) for different parameters of the reference scRNA-seq data: averaged numbers of reads per cell (a), numbers of reference cells (b) and numbers of cell types within the reference dataset (c) (x axis). The different reference datasets were generated by eliminating some cells or reads from the original scRNA-seq data. All synthetic bulk data collections, however, were still generated using the original (complete) scRNA-seq data. Results are shown for the case of cell-subtype simulation (top) and gradual-change simulation (bottom). Plots are depicted as in Fig. 2c,d. The alternative methods were applied with a reference dataset that was generated using a granularity of 4 cell groups.

Supplementary Figure 7 CPM analyses of influenza virus infection.

(a) Demonstration of differences between cell population maps of different influenza virus-infected mice. Shown is the inferred relative abundance of MPS cells (y axis) for each individual reference single cell (x axis), for two representative individuals (#8 and #38, as indicated in Fig. 3a,b). Shown are average and standard deviations of each reference cell over the N bootstrap reference subsets. Reference cells are ranked according to their averaged relative cell abundance. The analysis demonstrates inter-individual variation in cell population maps. (b) Experimental design: analysis of cellular heterogeneity during in-vivo influenza virus infection. Bulk profiles of transcriptional response to influenza virus infection in lungs (2 days p.i., versus PBS-treated) were generated across the panel of 38 CC mice. These profiles were then integrated with reference scRNA-seq data of the same experimental design, to infer the cellular heterogeneity of each bulk sample. CPM utilized the trajectory of cell-activation states as the cell-state space (brownish scale). For each activation cell state, we further calculated the correlation (across individuals) between the measured percentage of body weight loss and the inferred relative cell abundance (for example, for cell state x, cell-to-phenotype correlation is rx). (c) Cell-to-phenotype correlations along the activation trajectory are robust to different cell-state-space solutions. Cell-to-phenotype correlation coefficients for the 38 CC mice (y axis) are presented for the indicated cell types (panels) across cell activation state bins (ranked bins; x axis). CPM analysis was repeated 10 times, each repeat was applied using a different tSNE-generated cell-state-space input. 
For each cell-state-space input, we first applied CPM and then used the inferred quantities to calculate the cell-to-phenotype correlation coefficients; next, coefficients were averaged over cells from each activation-state bin (denoted 'averaged coefficients'). Shown are mean and standard deviation of averaged coefficients across the n=10 repeats. (d,e,f,g) The robustness of variation in cell-to-phenotype correlations along the cell-activation trajectory. Cell-to-phenotype Pearson correlation coefficients across the 38 infected CC mice (y axis), averaged over the cells in each activation-state bin (x axis), presented for T cells (top, n=378 cells), fibroblasts (middle, n=375 cells) and MPS cells (bottom, n=103 cells). Data are mean ± s.d. over cells. The original analysis (as in Fig. 3c) was modified in several ways: (d) usage of absolute bulk profiles of infected mice (rather than bulk relative profiles); (e) usage of the averaged profile of each recombinant inbred strain (rather than profiles of individual mice); (f) usage of single-cell reference data from a replicate mouse; and (g) usage of single-cell and bulk data from different experimental settings, generated in different labs: scRNA-seq data at 2 days post influenza infection (ref. 16) and published microarray bulk data at 4 days post influenza infection across the pre-CC mice (ref. 21). (h) A significant gradual change in cell-to-phenotype correlations over the trajectory of cell-activation states. Top: Gradual-change test. Fibroblasts, T cells and MPS demonstrate monotonic (ever-increasing) cell-to-phenotype correlations over the trajectory of cell-activation states (red; test statistic values), whereas reshuffling-based data (n = 100 permutations) demonstrate a poor gradual change (black box plots of the test statistic values). P-values were calculated based on comparison between the reshuffling-based and observed test statistic values (a gradual-change test, Supplementary Note 1). Bottom: Stepwise-change test. 
Fibroblasts, T cells and MPS demonstrate gradual change in cell-to-phenotype correlations over the activation trajectory (red; LR score, a stepwise-change test), whereas reshuffling-based data (n = 100 permutations) demonstrate a poor stepwise change (black box plots of LR scores). P-values were calculated based on comparison between the reshuffling-based and observed LR scores (a stepwise-change test, Supplementary Note 1). Each box represents the 25th to 75th percentiles of reshuffling-based scores; whiskers show the 95% confidence interval; horizontal lines represent the median. (i) Analysis of cell-to-phenotype correlations in MPS cells using alternative methods. The entire experimental design was applied as illustrated in panel b for CPM, but using the alternative deconvolution methods. Results are shown as in Fig. 3c but were generated with alternative deconvolution methods instead of CPM (DCQ: top, SVR: middle, and Cibersort: bottom) and different granularity levels (granularity = 4, 10 and 20 in left, middle and right panels, respectively). Shown are inferred relative MPS abundance values (y axis), averaged over cells from each activation-state bin (x axis). n = 103 MPS cells; error bars, standard deviations. The plots indicate a lack of consistency among different levels of granularity and different methods (in particular, decrease in DCQ with granularity = 4; increase in SVR with granularity = 4,10 and Cibersort with granularity = 10; and an absence of trend when using DCQ with granularity = 10,20, SVR with granularity = 20 and Cibersort with granularity = 4,20). (j) Analysis of cell-to-phenotype correlations using alternative reference data. 
The entire experimental design was applied as illustrated in panel b, using two optional reference datasets and their associated cell-state space: (1) a relevant reference that was derived from an influenza virus-infected mouse and its associated trajectory (ref. 16; black); and (2) irrelevant reference data, derived from the lungs of an uninfected C57BL/6J mouse (ref. 16) together with its trajectory (red). Shown are CPM-inferred relative abundance values (y axis), averaged over cells from each activation-state bin (x axis), for T cells (top, n=378) and fibroblasts (bottom, n=375). Error bars, standard deviations. The wider trajectory of the infected mouse stems from its broad heterogeneity of activation states. Plots are presented as described in Fig. 3c. The lack of an increasing trend when using irrelevant reference data highlights the importance of using reference data that can represent the cellular heterogeneity within the bulk tissue under study.
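
The cell-to-phenotype correlation of panel b and the reshuffling-based comparison of panel h can be sketched as follows. This is only an illustration under stated assumptions: the actual test statistics are defined in Supplementary Note 1 and are not reproduced here, so `trend_statistic` is a hypothetical stand-in (correlation of the coefficients with state order), the reshuffling is assumed to permute phenotype values across individuals, and all numbers are toy values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0


def cell_to_phenotype(abundance, phenotype):
    """One coefficient per cell state; abundance[s][i] is state s, individual i."""
    return [pearson(state, phenotype) for state in abundance]


def trend_statistic(coefficients):
    """Hypothetical monotonicity score (an assumption, not the paper's statistic)."""
    return pearson(list(range(len(coefficients))), coefficients)


def permutation_p_value(abundance, phenotype, n_perm=100, seed=0):
    """Empirical P value from reshuffling phenotype values across individuals."""
    rng = random.Random(seed)
    observed = trend_statistic(cell_to_phenotype(abundance, phenotype))
    at_least_as_large = 0
    for _ in range(n_perm):
        shuffled = list(phenotype)
        rng.shuffle(shuffled)
        null = trend_statistic(cell_to_phenotype(abundance, shuffled))
        if null >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy data: six individuals, four activation states ordered along a trajectory.
weight_loss = [5.0, 8.0, 12.0, 15.0, 18.0, 22.0]
abundance = [
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # earliest state
    [3.0, 4.0, 2.0, 3.0, 2.0, 3.0],
    [2.0, 3.0, 2.0, 4.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # latest state
]
coefficients = cell_to_phenotype(abundance, weight_loss)
observed, p_value = permutation_p_value(abundance, weight_loss)
```

In the toy data the coefficients rise from negative in the earliest state to positive in the latest, mimicking the gradual change described in the caption; the P value is the fraction of reshuffled datasets whose trend is at least as strong as the observed one.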

Supplementary Figure 8 Inferring dynamics with a Markov model.

(a) Poor correlations between the percentage of MO/MΦs in the entire tested cell population and the body weight loss. The percentage of body weight loss of infected mouse individuals (y axis) as a function of their percentage of MO/MΦs (including all cell states, as measured by flow cytometry; x axis). Relying on this overall poor correlation, we anticipated that the inter-individual variation in cell-state distribution (Figs. 3c and 4c,e) stems from inter-individual variation in temporal cell-state dynamics. One such model is examined in panels b–e. (b) A Markov model. The analysis takes as input (i) the absolute cell abundance levels along the activation cell-state trajectory, for each infected mouse; and (ii) absolute cell abundance levels in an uninfected tissue. We assume a model of several discrete cell states over the activation process. Based on a Markovian assumption, we calculate (for each mouse) the 'transition rates' between cell states, as described in the Methods. (c) Demonstration of differences between transition rates of MPS from different mice. Shown are the inferred transition rates (y axis), calculated based on CPM-inferred cell abundance of MPS for each transition between consecutive activation states (x axis), for the three representative individuals that are highlighted in Fig. 3a,b (with the same color coding). Data are mean ± s.e.m. of transition rates over 100 repeats of transition-rate calculations; each such calculation was performed with cell quantities that were randomly sampled from the inferred distribution of cell quantities across the N deconvolution repeats. (d) Experimental design. For each mouse, cell abundance was collected (either using CPM or flow cytometry) along the various cell-activation states. For each transition between consecutive activation states, we further calculated the transition rate, for each mouse individual (see panel b). 
These data allow calculation of a regression (across individuals) between the measured clinical outcome and the inferred transition rate, in each state transition along the activation process ('transition-rate-to-phenotype regression'). For instance, for a given cell state x, the transition-rate-to-phenotype regression coefficient is bx. (e,f) Transition-rate-to-phenotype relations. Transition-rate-to-phenotype regression coefficients (y axis) for different transitions along the cell-state trajectory (x axis) of the MPS population. Shown are coefficients that were calculated based on absolute cell abundance levels that were inferred using CPM (e), or measured for the population of MPS using flow cytometry (f). Of note, transition-rate-to-phenotype coefficients based on direct flow cytometry measurements (f) closely matched those of CPM predictions (e). In e, data are mean ± s.e.m. of regression coefficients over 100 repeats of transition-rate calculations that were generated by random sampling.
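
The transition-rate-to-phenotype regression of panel d can be sketched as an ordinary least-squares slope computed across individuals, one slope per state transition. The sketch takes the transition rates as given (their Markov-based calculation is described in the Methods and is not reproduced here) and assumes the clinical outcome is regressed on the rate; the function names and all values are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (single predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def rate_to_phenotype(rates, phenotype):
    """One regression coefficient per transition; rates[t][i] is transition t, mouse i."""
    return [ols_slope(transition, phenotype) for transition in rates]


# Toy data: four mice, two consecutive state transitions.
weight_loss = [4.0, 8.0, 12.0, 16.0]
rates = [
    [0.1, 0.2, 0.3, 0.4],  # transition whose rate rises with weight loss
    [0.4, 0.3, 0.2, 0.1],  # transition whose rate falls with weight loss
]
coefficients = rate_to_phenotype(rates, weight_loss)
```

A positive coefficient for a transition means that mice with a faster rate through that transition tend to show greater weight loss, and a negative coefficient means the opposite; in the toy data the two transitions have slopes of equal magnitude and opposite sign.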

Supplementary Information

Supplementary Figures 1–8, Supplementary Table 1 and Supplementary Note 1

Reporting Summary

About this article

Cite this article

Frishberg, A., Peshes-Yaloz, N., Cohn, O. et al. Cell composition analysis of bulk genomics using single-cell data. Nat Methods 16, 327–332 (2019). https://doi.org/10.1038/s41592-019-0355-5

  • DOI: https://doi.org/10.1038/s41592-019-0355-5
