Cell composition analysis of bulk genomics using single-cell data

Frishberg, Amit; Peshes-Yaloz, Naama; Cohn, Ofir; Rosentul, Diana; Steuerman, Yael; Valadarsky, Liran; Yankovitz, Gal; Mandelboim, Michal; Iraqi, Fuad A.; Amit, Ido; Mayo, Lior; Bacharach, Eran; Gat-Viks, Irit

doi:10.1038/s41592-019-0355-5

Article
Published: 18 March 2019

Amit Frishberg¹ (na1),
Naama Peshes-Yaloz¹ (na1),
Ofir Cohn¹,
Diana Rosentul¹,
Yael Steuerman¹,
Liran Valadarsky²,
Gal Yankovitz¹,
Michal Mandelboim³,⁴,
Fuad A. Iraqi⁵,
Ido Amit² (ORCID: orcid.org/0000-0003-2968-877X),
Lior Mayo¹,⁶,
Eran Bacharach¹ (na2; ORCID: orcid.org/0000-0001-8341-2833) &
Irit Gat-Viks¹ (na2; ORCID: orcid.org/0000-0002-5431-6444)

Nature Methods volume 16, pages 327–332 (2019)

Abstract

Single-cell RNA sequencing (scRNA-seq) is a rich resource of cellular heterogeneity, opening new avenues in the study of complex tissues. We introduce Cell Population Mapping (CPM), a deconvolution algorithm in which reference scRNA-seq profiles are leveraged to infer the composition of cell types and states from bulk transcriptome data (‘scBio’ CRAN R-package). Analysis of individual variations in lungs of influenza-virus-infected mice reveals that the relationship between cell abundance and clinical symptoms is a cell-state-specific property that varies gradually along the continuum of cell-activation states. The gradual change is confirmed in subsequent experiments and is further explained by a mathematical model in which clinical outcomes relate to cell-state dynamics along the activation process. Our results demonstrate the power of CPM in reconstructing the continuous spectrum of cell states within heterogeneous tissues.

**Fig. 1: Overview of the CPM algorithm.**

**Fig. 2: Performance assessed via synthetic data.**

**Fig. 3: Cellular heterogeneity during in vivo influenza virus infection, reconstructed by CPM.**

**Fig. 4: Confirmation of gradual changes in relationships of cells to physiology over a trajectory of cell-activation states.**

Code availability

CPM is implemented in the ‘scBio’ CRAN R package (the CPM function), available at https://cran.r-project.org/web/packages/scBio/index.html.

Data availability

All RNA-seq data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) database under accession numbers GSE113530 and GSE117975.

References

1. Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. 34, 1145–1160 (2016).
2. Chen, X., Teichmann, S. A. & Meyer, K. B. From tissues to cell types and back: single-cell gene expression analysis of tissue architecture. Annu. Rev. Biomed. Data Sci. 1, 29–51 (2018).
3. Krieg, C. et al. High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy. Nat. Med. 24, 144–153 (2018).
4. Shalek, A. K. & Benson, M. Single-cell analyses to tailor treatments. Sci. Transl. Med. 9, eaan4730 (2017).
5. Kim, K.-T. et al. Application of single-cell RNA sequencing in optimizing a combinatorial therapeutic strategy in metastatic renal cell carcinoma. Genome Biol. 17, 80 (2016).
6. Shen-Orr, S. S. & Gaujoux, R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr. Opin. Immunol. 25, 571–578 (2013).
7. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3, 346–360 (2016).
8. Frishberg, A., Brodt, A., Steuerman, Y. & Gat-Viks, I. ImmQuant: a user-friendly tool for inferring immune cell-type composition from gene-expression data. Bioinformatics 32, 3842–3843 (2016).
9. Avila Cobos, F., Vandesompele, J., Mestdagh, P. & De Preter, K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 34, 1969–1979 (2018).
10. Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624 (2017).
11. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
12. Schelker, M. et al. Estimation of immune cell content in tumour tissue using single-cell RNA-seq data. Nat. Commun. 8, 2032 (2017).
13. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).
14. Rostom, R., Svensson, V., Teichmann, S. A. & Kar, G. Computational approaches for interpreting scRNA-seq data. FEBS Lett. 591, 2213–2225 (2017).
15. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
16. Steuerman, Y. et al. Dissection of influenza infection in vivo by single-cell RNA sequencing. Cell Syst. 6, 679–691.e4 (2018).
17. Altboum, Z. et al. Digital cell quantification identifies global immune cell dynamics during influenza infection. Mol. Syst. Biol. 10, 720 (2014).
18. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
19. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. & Lin, C.-J. LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008).
20. Welsh, C. E. et al. Status and access to the collaborative cross population. Mamm. Genome 23, 706–712 (2012).
21. Bottomly, D. et al. Expression quantitative trait loci for extreme host response to influenza A in pre-collaborative cross mice. G3 (Bethesda) 2, 213–221 (2012).
22. Yu, Y.-R. A. et al. A protocol for the comprehensive flow cytometric analysis of immune cells in normal and inflamed murine non-lymphoid tissues. PLoS ONE 11, e0150606 (2016).
23. Ferris, M. T. et al. Modeling host genetic regulation of influenza pathogenesis in the collaborative cross. PLoS Pathog. 9, e1003196 (2013).
24. Dengler, L. et al. Cellular changes in blood indicate severe respiratory disease during influenza infections in mice. PLoS ONE 9, e103149 (2014).
25. Coates, B. M. et al. Inflammatory monocytes drive influenza A virus-mediated lung injury in juvenile mice. J. Immunol. 200, 2391–2404 (2018).
26. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).
27. Shen-Orr, S. S. et al. Cell type–specific gene expression differences in complex tissues. Nat. Methods 7, 287–289 (2010).
28. Hutter, C. & Zenklusen, J. C. The Cancer Genome Atlas: creating lasting value beyond its data. Cell 173, 283–285 (2018).
29. eGTEx Project. Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat. Genet. 49, 1664–1670 (2017).
30. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
31. Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).
32. Singer, B. D. et al. Flow-cytometric method for simultaneous analysis of mouse lung epithelial, endothelial, and hematopoietic lineage cells. Am. J. Physiol. Lung Cell. Mol. Physiol. 310, L796–L801 (2016).

Acknowledgements

This work was supported by the European Research Council (637885) (to A.F., N.P.-Y., O.C., D.R., Y.S., and I.G.-V.), and partially supported by the Israeli Centers of Research Excellence (I-CORE) Center No 41/11 (to A.F. and Y.S.), the Broad-Israel Science Foundation (ISF) 1168/14 (to D.R. and N.P.-Y.), ISF 1824/13 (to E.B.), partial fellowships from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University (to A.F. and Y.S.), and a Shulamit Aloni Scholarship (to Y.S.). Research in the Gat-Viks laboratory was supported by ISF 288/16. I.G.-V. is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University and an Alon Fellow. We thank O. Danziger and H.J. Abu-Toamih Atamni (Tel Aviv University) for help with mouse work, and S. Smith for scientific editing.

Author information

These authors contributed equally: Amit Frishberg, Naama Peshes-Yaloz.
These authors jointly supervised this work: Eran Bacharach, Irit Gat-Viks.

Authors and Affiliations

School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
Amit Frishberg, Naama Peshes-Yaloz, Ofir Cohn, Diana Rosentul, Yael Steuerman, Gal Yankovitz, Lior Mayo, Eran Bacharach & Irit Gat-Viks
Department of Immunology, The Weizmann Institute of Science, Rehovot, Israel
Liran Valadarsky & Ido Amit
National Center for Influenza and Respiratory Viruses, Central Virology Laboratory, Sheba Medical Center at Tel HaShomer, Ramat-Gan, Israel
Michal Mandelboim
Department of Epidemiology and Preventive Medicine, School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Michal Mandelboim
Department of Clinical Microbiology and Immunology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Fuad A. Iraqi
Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
Lior Mayo

Contributions

A.F., N.P.-Y., E.B. and I.G.-V. conceived and designed the study. N.P.-Y., O.C., D.R., L.V., F.I., M.M., L.M., I.A., and E.B. developed experimental protocols and conducted the experiments. A.F. developed computational methods and performed bioinformatics analysis. Y.S. performed bioinformatic analyses. A.F., N.P.-Y., E.B., and I.G.-V. wrote the manuscript with input from all other authors.

Corresponding authors

Correspondence to Eran Bacharach or Irit Gat-Viks.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Exploring the performance of deconvolution using untreated mice.

(a) Validation of synthetic data generation. To validate our simulation strategy, we compared measured bulk expression levels (x axis) and computationally-generated bulk expression levels (y axis) of untreated lungs from a C57BL/6J mouse (each dot is a single gene). Y axis: to synthetically-generate the bulk tissue, we applied eq. 1 (Supplementary Note 1) to calculate a weighted average of nine different cell types, using scRNA-seq profiles derived from the lungs of an uninfected C57BL/6J mouse and their known cell-type annotations¹⁶. The weighting is in accordance with the known fractions of the different cell types within the untreated lungs of C57BL/6J mice^22,32. X axis: a bulk RNA-seq profile that was measured from the lungs of a naive C57BL/6J mouse ('untreated mice'; Supplementary Table 1). The scatter plot suggests a good match between measured and synthetically-generated bulk gene expression data, supporting the validity of our simulation approach. (b) Performance of deconvolution. We used lung tissues from naive (untreated) mice to demonstrate the robustness of deconvolution in the case of a limited cell-activation heterogeneity. To address this, deconvolution was applied using the following input data: (i) bulk RNA-seq profiles of the lung tissue, generated from five naive CC mice ('untreated mice'; Supplementary Table 1); and (ii) scRNA-seq reference data, derived from an uninfected mouse¹⁶. Using this input data, deconvolution was applied to predict cell-type abundance. Shown is a comparison between measured cell-type fractions in the naive lung tissue (data from refs. ^22,32; x axis) and deconvolution-predicted cell type quantities (averaged across mice; y axis), inferred by different methods (sub-panels; level of granularity=1 in all cases). Of note, all compared methods manifest a good match between measured and predicted cell-type quantities. This agrees well with the high accuracy of all deconvolution methods in the cell-type simulation (Fig. 2b).
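
The synthetic bulk generation described in panel a can be sketched as follows. This is only an illustration under stated assumptions: eq. 1 of Supplementary Note 1 is not reproduced on this page, so the code assumes that the bulk profile is a fraction-weighted average of per-cell-type mean expression profiles. The function names (`cell_type_means`, `synthetic_bulk`) and the toy values are hypothetical.

```python
from collections import defaultdict


def cell_type_means(profiles, labels):
    """Average single-cell profiles (lists of per-gene values) within each cell type."""
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def synthetic_bulk(type_means, fractions):
    """Weighted average of cell-type mean profiles.

    Assumption: weights are the known cell-type fractions, normalized to sum to 1.
    """
    total = sum(fractions.values())
    n_genes = len(next(iter(type_means.values())))
    bulk = [0.0] * n_genes
    for cell_type, fraction in fractions.items():
        weight = fraction / total
        for g, value in enumerate(type_means[cell_type]):
            bulk[g] += weight * value
    return bulk


# Toy example: three cells, two genes, two cell types (made-up numbers).
profiles = [[10.0, 0.0], [14.0, 2.0], [0.0, 8.0]]
labels = ["T cell", "T cell", "fibroblast"]
means = cell_type_means(profiles, labels)
bulk = synthetic_bulk(means, {"T cell": 0.25, "fibroblast": 0.75})
print(bulk)
```

Comparing such a synthetic profile gene by gene with a measured bulk profile is the kind of check that the scatter plot in panel a reports.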

Supplementary Figure 2 Detailed analysis of CPM performance using relative bulk input data.

(a,b) Performance of CPM for cell-subtype simulation (a) and gradual-change simulation (b). Accuracy of inferring the correct cell heterogeneity was calculated using the Pearson correlation coefficient, between predicted and ground-truth cell abundance. The calculated accuracy is plotted against varying data parameters, for alternative deconvolution methods: SVR (top), DCQ (middle) and Cibersort (bottom). CPM accuracy is color coded in black; other colors represent the number of single-cell groups (the level of granularity) within each cell type. (c,d) Running time. (c) Running time of the CPM algorithm (y axis; seconds) for different simulation types (x axis) across various numbers of deconvolution repeats (color coded). Reported is the running time per sample, averaged across 100 samples. (d) Running time (y axis; seconds) of different deconvolution methods (color coded) for different granularity levels of the reference data (x axis). Shown is running time for the cell-subtype simulation (left) and gradual change simulation (right). Reported is the running time per sample, averaged across 100 input bulk samples. (e) Predictions in the gradual changes simulation. Shown is a comparison between predicted (x axis) and true (y axis) T-cell abundance for a representative synthetic relative bulk profile using four different methods (sub-panels). CPM successfully captures the continuity in the gradual-change simulation. In contrast, the compared methods are limited to one inferred value for each of the four cell groups. The alternative methods were applied with a reference dataset that was generated using granularity of 4 cell groups. (f) Analysis of granularity. Performance of the different deconvolution methods (y axis) across varying granularity of the reference data that is given as input to the alternative methods (x axis), for the cell-type simulation (top), cell-subtype simulation (middle) and gradual-change simulation (bottom).
X axis, top: the granularity of the reference data, defined as the numbers of single-cell groups within each cell type (each of these groups is represented with a single reference profile). X axis, bottom: the total number of reference profiles. In the cell-type-level simulation (top), granularity=1 attains the best performance. In the two other simulations (middle and bottom), a higher granularity allows better sensitivity but also leads to scalability issues. Data are mean ± s.d. over 100 synthetic bulk profiles. Abbreviations: Grn, Granularity. (g) Performance assessment in comparison with the enrichment scheme. Accuracy (y axis) is shown across varying expression noise levels (x axis) for different methods: CPM (black) and an enrichment scheme ('ES'; see implementation in Supplementary Note 1) that was applied using reference data of varying granularity (2 to 20; color coded). Results are shown for fibroblasts (left) and T cells (right), building on a single-cell-type design that was tailored to provide an unbiased comparison with ES. The accuracy was evaluated through both parametric (Pearson, top) and non-parametric (Spearman, bottom) correlation coefficients to avoid bias due to potential non-linear relationships of enrichment scores³¹. (h) Assessment of SVR performance. Box plots of the accuracy of SVR (y axis) across 100 cell-subtype simulations (top) and 100 gradual-change simulations (bottom) across different ε values (x axis). Each box represents 0.25-0.75 percentiles; whiskers show 95% confidence interval; horizontal lines represent the median. The plots indicate that the selected value of ε does not have a substantial effect on the overall performance.
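
The accuracy measure used throughout this figure (correlation between predicted and ground-truth cell abundance) can be illustrated with a short sketch. It computes the Pearson coefficient and, as in panel g, the rank-based Spearman coefficient. The helper names and the toy abundance values are hypothetical; the example is chosen so that the relationship is monotonic but non-linear, which is the situation in which the two coefficients differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: predictions are a monotonic but non-linear function of the truth.
truth = [0.1, 0.2, 0.3, 0.4]
predicted = [0.01, 0.04, 0.09, 0.16]
r_pearson = pearson(truth, predicted)
r_spearman = spearman(truth, predicted)
print(r_pearson, r_spearman)
```

Here the Spearman coefficient is 1 because the ordering is fully preserved, whereas the Pearson coefficient is slightly below 1 because the relationship is not linear.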

Supplementary Figure 3 CPM performance assessment for absolute bulk input data.

Plots in a, b and c are depicted as in Fig. 2c,d and Supplementary Fig. 2e, respectively, but for the case of absolute bulk input data.

Supplementary Figure 4 CPM performance assessment using additional parameter settings.

(a-c) Performance across different CPM parameters. Shown is CPM accuracy (y axis) for different numbers of deconvolution repeats (left), cell neighborhood sizes (middle) and reference subset sizes (right) (x axis), for the case of cell subtype simulation (a), and gradual-change simulation (b,c), in which the various cell-state trajectories carry the same directions (b), or mixed directions (c), of gradual alterations. Default parameter settings are marked by arrows. Data are mean ± s.d. over 100 synthetic bulk profiles. The plots suggest that CPM is generally robust to the number of repeats and may gain from an increased cell neighborhood size. In contrast, CPM attains high accuracy only when the reference subset size (Ns parameter) is relatively small. (d) Performance of deconvolution using alternative effect sizes. Shown is the accuracy (y axis) of different deconvolution methods (color-coded) when applied on gradual-change simulation with different levels of effect size (denoted x_g in Supplementary Note 1; x axis). Unlike CPM, the alternative deconvolution methods were applied on reference datasets with a granularity of 4. A constant effect size x_g=1 was used as a default value in all other analyses. (e) Performance of deconvolution using four alternative reference-reconstruction methods. Shown is the accuracy of different deconvolution methods when using different reference reconstruction methods (see Supplementary Note 1 for details): DBSCAN with mean-center approach (as devised in previous studies^10,11; red), and K-means with mean-centers (purple), median-centers (blue), and harmonic-mean (green). K-means was applied with K=4 (that is, granularity=4) within each cell type. Data are mean ± s.d. over 100 synthetic bulk profiles. Overall, all reference-reconstruction methods attained similar performance. (f,g,h) CPM performance for mixed increase/decrease gradual-change directions. Plots are depicted as in Fig. 2d, but for synthetic data in which gradual alterations in some cell types were reversed. (f) Gradual alterations in two cell types; reversed trajectory in one cell type (Kc=2, Kr=1). (g) Gradual alterations in four cell types; reversed trajectory in two cell types (Kc=4, Kr=2). (h) Gradual alterations in six cell types, reversed trajectory in three cell types (Kc=6, Kr=3). Accuracy (y axis) is shown against different cell space noise (top) or expression noise (bottom) (x axis). The alternative methods were applied with a reference dataset that was generated using granularity of 4 cell groups.

Supplementary Figure 5 The impact of cell-state space on the performance of CPM.

(a) CPM performance for different cell-state-space diagrams. Left: Distributions of the accuracy (y axis) of different deconvolution methods (x axis) across n=100 analyses that differ only in their input cell-state-space diagram. Each of the cell-state-space diagrams was generated by a separate tSNE run that was applied on the reference scRNA-seq data (in all cases dimension reduction was applied on the 101 generic response genes). All alternative methods were applied with granularity=4. Each box represents 0.25-0.75 percentiles; whiskers show 95% confidence interval; horizontal lines represent the median. Right: Exemplified are two cell-state-space diagrams of the same cell type (top: T cells, bottom: fibroblasts) that were generated through separate tSNE runs and were used in this performance analysis. Overall, the accuracy of CPM remained high when using different tSNE-derived cell-state-space diagrams (left), despite marked differences between different diagrams of the same cell type (right). (b) The impact of the cell density on CPM performance. Shown is the distribution of the accuracy measure (y axis) across different cell-density levels within the cell-state space (ranked binned densities; x axis). Results are shown for different CPM parameters (color coded): the number of deconvolution repeats (left), cell neighborhood sizes (middle) and reference subset sizes (right). n=100 synthetic profiles. Each box represents 0.25-0.75 percentiles; whiskers show 95% confidence interval; horizontal lines represent the median. The plots suggest that the accuracy of CPM is generally independent of the local density across the cell-state space. (c) The impact of extrapolation. Shown is the distribution of the accuracy measure (y axis) across different cell-density levels within the cell-state space (ranked binned densities; x axis). Results are shown for the accuracy of CPM in the presence (light blue) or absence (orange) of extrapolation. n=100 synthetic profiles.
Each box represents 0.25-0.75 percentiles; whiskers show 95% confidence interval; horizontal lines represent the median. The plot demonstrates the advantage of extrapolation and that a similar CPM accuracy was attained in regions that differ in their density.

Supplementary Figure 6 The impact of the quality of scRNA-seq data on CPM performance.

Shown is accuracy (y axis) for different parameters of the reference scRNA-seq data: averaged numbers of reads per cell (a), numbers of reference cells (b) and numbers of cell types within the reference dataset (c) (x axis). The different reference datasets were generated by eliminating some cells or reads from the original scRNA-seq data. Yet, all synthetic bulk data collections were still generated using the original (complete) scRNA-seq data. Results are shown for the case of cell subtype simulation (top) and gradual-change simulation (bottom). Plots are depicted as in Fig. 2c,d. The alternative methods were applied with a reference dataset that was generated using granularity of 4 cell groups.

Supplementary Figure 7 CPM analyses of influenza virus infection.

(a) Demonstration of differences between cell population maps of different influenza virus-infected mice. Shown is the inferred relative abundance of MPS cells (y axis) for each individual reference single cell (x axis), for two representative individuals (#8 and #38, as indicated in Fig. 3a,b). Shown are average and standard deviations of each reference cell over the N bootstrap reference subsets. Reference cells are ranked according to their averaged relative cell abundance. The analysis demonstrates inter-individual variation in cell population maps. (b) Experimental design: analysis of cellular heterogeneity during in-vivo influenza virus infection. Bulk profiles of transcriptional response to influenza virus infection in lungs (2 days p.i., versus PBS-treated) were generated across the panel of 38 CC mice. These profiles were then integrated with reference scRNA-seq data of the same experimental design, to infer the cellular heterogeneity of each bulk sample. CPM utilized the trajectory of cell-activation states as the cell-state space (brownish scale). For each activation cell state, we further calculated the correlation (across individuals) between the measured percentage of body weight loss and the inferred relative cell abundance (for example, for cell state x, cell-to-phenotype correlation is r_x). (c) Cell-to-phenotype correlations along the activation trajectory are robust to different cell-state-space solutions. Cell-to-phenotype correlation coefficients for the 38 CC mice (y axis) are presented for the indicated cell types (panels) across cell activation state bins (ranked bins; x axis). CPM analysis was repeated 10 times, each repeat was applied using a different tSNE-generated cell-state-space input. 
For each cell-state-space input, we first applied CPM and then used the inferred quantities to calculate the cell-to-phenotype correlation coefficients; next, coefficients were averaged over cells from each activation-state bin (denoted 'averaged coefficients'). Shown are mean and standard deviation of averaged coefficients across the n=10 repeats. (d,e,f,g) The robustness of variation in cell-to-phenotype correlations along the cell activation trajectory. Cell-to-phenotype Pearson correlation coefficients across the 38 infected CC mice (y axis), averaged by the cells in each activation state bin (x axis), presented for T cells (top, n=378 cells), fibroblasts (middle, n=375 cells) and MPS cells (bottom, n=103 cells). Data are mean ± s.d. over cells. The original analysis (as in Fig. 3c) was modified in several ways: (d) usage of absolute bulk profiles of infected mice (rather than bulk relative profiles); (e) usage of the averaged profile of each recombinant inbred strain (rather than profiles of individual mice); (f) usage of single cell reference data from a replicate mouse; and (g) using single-cell and bulk data from different experimental settings, generated in different labs: scRNA-seq data at 2 days post influenza infection¹⁶ and published microarray bulk data at 4 days post influenza infection across the pre-CC mice²¹. (h) A significant gradual change in cell-to-phenotype correlations over the trajectory of cell activation states. Top: Gradual-change test. Fibroblasts, T cells and MPS demonstrate monotonic (ever-increasing) cell-to-phenotype correlations over the trajectory of cell activation states (red; test statistics values), whereas reshuffling-based data (n = 100 permutations) demonstrate a poor gradual change (black box plots of the test statistics values). P-values were calculated based on comparison between the reshuffling-based and observed test statistics values (a gradual-change test, Supplementary Note 1). Bottom: Stepwise-change test.
Fibroblasts, T cells and MPS demonstrate gradual change in cell-to-phenotype correlations over the activation trajectory (red; LR score, a stepwise-change test), whereas reshuffling-based data (n = 100 permutations) demonstrate a poor stepwise change (black box plots of LR scores). P-values were calculated based on comparison between the reshuffling-based and observed LR scores (a stepwise change test, Supplementary Note 1). Each box represents 0.25–0.75 percentiles of reshuffling-based scores; whiskers show 95% confidence interval; horizontal lines represent the median. (i) Analysis of cell-to-phenotype correlations in MPS cells using alternative methods. The entire experimental design was applied as illustrated in panel b for CPM, but using the alternative deconvolution methods. Results are shown as in Fig. 3c but instead of using CPM, were generated with alternative deconvolution methods (DCQ: top, SVR: middle, and Cibersort: bottom) and different granularity levels (granularity = 4, 10 and 20 in left, middle and right panels, respectively). Shown are CPM-inferred relative MPS abundance values (y axis), averaged over cells from each activation state bin (x axis). n = 103 MPS cells; error bars, standard deviations. The plots indicate a lack of consistency among different levels of granularity and different methods (in particular, decrease in DCQ with granularity = 4; increase in SVR with granularity = 4,10 and Cibersort with granularity = 10; and an absence of trend when using DCQ with granularity = 10,20, SVR with granularity=20 and Cibersort with granularity=4,20). (j) Analysis of cell-to-phenotype correlations using alternative reference data.
The entire experimental design was applied as illustrated in panel b, using two optional reference datasets and their associated cell-state space: (1) a relevant reference that was derived from an influenza virus-infected mouse and its associated trajectory¹⁶ (black); and (2) an irrelevant reference dataset, derived from the lungs of an uninfected C57BL/6J mouse¹⁶ together with its trajectory (red). Shown are CPM-inferred relative abundance values (y axis), averaged over cells from each activation state bin (x axis), for T cells (top, n=378) and fibroblasts (bottom, n=375). Error bars, standard deviations. The wider trajectory of the infected mice stems from its broad heterogeneity of activation states. Plots are presented as described in Fig. 3c. The lack of an increasing trend obtained when using irrelevant reference data highlights the importance of using reference data that can represent the cellular heterogeneity within the bulk tissue under study.
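
The cell-to-phenotype analysis in panels b and h can be sketched as follows: for each activation-state bin, correlate inferred abundance with the phenotype across individuals, then ask whether the coefficients trend along the trajectory more strongly than in reshuffled data. This is a hedged illustration, not the test defined in Supplementary Note 1, which is not reproduced on this page. Two assumptions are made here: the trend statistic is a stand-in (the Pearson correlation between bin order and coefficient), and the reshuffling permutes phenotype values across individuals. All names and toy numbers are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cell_to_phenotype(abundance_by_bin, phenotype):
    """abundance_by_bin[b][i] is the inferred abundance of state bin b in individual i."""
    return [pearson(row, phenotype) for row in abundance_by_bin]


def trend_statistic(coeffs):
    """Stand-in statistic (assumption): correlation between bin order and coefficient."""
    return pearson(list(range(len(coeffs))), coeffs)


def permutation_p_value(abundance_by_bin, phenotype, n_perm=100, seed=0):
    """Compare the observed trend with trends obtained after reshuffling the phenotype."""
    rng = random.Random(seed)
    observed = trend_statistic(cell_to_phenotype(abundance_by_bin, phenotype))
    hits = 0
    for _ in range(n_perm):
        shuffled = phenotype[:]
        rng.shuffle(shuffled)
        if trend_statistic(cell_to_phenotype(abundance_by_bin, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: six individuals, four activation-state bins (made-up numbers).
weight_loss = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
abundance = [
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # earliest state: abundance falls with weight loss
    [4.0, 3.0, 5.0, 2.0, 6.0, 1.0],
    [3.0, 1.0, 4.0, 2.0, 6.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # latest state: abundance rises with weight loss
]
coeffs = cell_to_phenotype(abundance, weight_loss)
observed, p_value = permutation_p_value(abundance, weight_loss)
print(coeffs, observed, p_value)
```

In this toy input the coefficients increase from the earliest to the latest state bin, mirroring the ever-increasing pattern that panel h describes.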

Supplementary Figure 8 Inferring dynamics with a Markov model.

(a) Poor correlations between the percentage of MO/MΦs in the entire tested cell population and the body weight loss. The percentage of body weight loss of infected mouse individuals (y axis) as a function of their percentage of MO/MΦs (including all cell states, as measured by flow cytometry; x axis). Relying on this overall poor correlation, we anticipated that the inter-individual variation in cell-state distribution (Figs. 3c and 4c,e) stems from inter-individual variation in temporal cell-state dynamics. One such model is examined in panels b–e. (b) A Markov model. The analysis takes as input (i) the absolute cell abundance levels along the activation cell-state trajectory, for each infected mouse; and (ii) absolute cell abundance levels in an uninfected tissue. We assume a model of several discrete cell states over the activation process. Based on a Markovian assumption we calculate (for each mouse) the 'transition rates' between cell states, as described in the Methods. (c) Demonstration of differences between transition rates of MPS from different mice. Shown are the inferred transition rates (y axis) calculated based on CPM-inferred cell abundance of MPS for each transition between consecutive activation states (x axis), for the three representative individuals that are highlighted in Fig. 3a,b (with the same color coding). Data are mean ± s.e.m. of transition rates over 100 repeats of transition-rate calculations; each such calculation was performed with cell quantities that were randomly sampled from the inferred distribution of cell quantities across the N deconvolution repeats. (d) Experimental design. For each mouse, cell abundance was collected (either using CPM or flow cytometry) along the various cell activation states. For each transition between consecutive activation states, we further calculated the transition rate, for each mouse individual (see panel b).
This data allows calculation of regression (across individuals) between the measured clinical outcome and the inferred transition rate, in each state transition along the activation process ('transition-rate-to-phenotype regression'). For instance, for a given cell state x, the transition-rate-to-phenotype regression coefficient is b_x. (e,f) Transition-rate-to-phenotype relations. Transition-rate-to-phenotype regression coefficients (y axis) for different transitions along the cell-state trajectory (x axis) of the MPS population. Shown are coefficients that were calculated based on absolute cell abundance levels that were inferred using CPM (e), or measured for the population of MPS using flow cytometry (f). Of note, transition-rate-to-phenotype coefficients based on direct flow cytometry measurements (f) closely matched those of CPM predictions (e). In e, data are mean ± s.e.m. of regression coefficients over 100 repeats of transition-rate calculations that were generated by random sampling.
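
The transition-rate-to-phenotype regression of panel d can be sketched as a per-transition simple linear regression across individuals. The sketch takes the per-mouse transition rates as given inputs; the Markov-model calculation that produces them is described in the Methods and is not reproduced here. Regressing the clinical outcome on the transition rate by ordinary least squares is an assumption, as are the function names, transition labels and toy values.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def transition_rate_to_phenotype(rates_by_transition, phenotype):
    """Return one regression coefficient b_x per state transition.

    rates_by_transition maps a transition label to the list of per-individual rates.
    """
    return {
        transition: ols_slope(rates, phenotype)
        for transition, rates in rates_by_transition.items()
    }


# Toy example: four individuals, two consecutive transitions (made-up numbers).
rates = {
    "state 1 to 2": [0.1, 0.2, 0.3, 0.4],
    "state 2 to 3": [0.4, 0.3, 0.2, 0.1],
}
weight_loss = [2.0, 4.0, 6.0, 8.0]
b = transition_rate_to_phenotype(rates, weight_loss)
print(b)
```

A positive coefficient for a transition means that individuals with a faster transition tend to show a larger outcome value, and a negative coefficient means the opposite.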

Supplementary Information

Supplementary Figures 1–8, Supplementary Table 1 and Supplementary Note 1

About this article

Cite this article

Frishberg, A., Peshes-Yaloz, N., Cohn, O. et al. Cell composition analysis of bulk genomics using single-cell data. Nat Methods 16, 327–332 (2019). https://doi.org/10.1038/s41592-019-0355-5

Received: 10 June 2018
Accepted: 12 February 2019
Published: 18 March 2019
Issue Date: April 2019
DOI: https://doi.org/10.1038/s41592-019-0355-5
