Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data

Abstract

Single-cell RNA sequencing (scRNA-seq) distinguishes cell types, states and lineages within the context of heterogeneous tissues. However, current single-cell data cannot directly link cell clusters with specific phenotypes. Here we present Scissor, a method that identifies cell subpopulations from single-cell data that are associated with a given phenotype. Scissor integrates phenotype-associated bulk expression data and single-cell data by first quantifying the similarity between each single cell and each bulk sample. It then optimizes a regression model on the correlation matrix with the sample phenotype to identify relevant subpopulations. Applied to a lung cancer scRNA-seq dataset, Scissor identified subsets of cells associated with worse survival and with TP53 mutations. In melanoma, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expression associated with an immunotherapy response. Beyond cancer, Scissor was effective in interpreting facioscapulohumeral muscular dystrophy and Alzheimer’s disease datasets. Scissor identifies biologically and clinically relevant cell subpopulations from single-cell assays by leveraging phenotype and bulk-omics datasets.
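
The two steps described above (a cell-by-sample similarity matrix, then a regression of the phenotype on that matrix) can be illustrated with a minimal sketch. This is not the Scissor implementation: it assumes Pearson correlation as the similarity measure, a continuous phenotype, and a plain L1 penalty solved by proximal gradient descent in place of whatever regularization the package uses. The toy data and all function names (`correlation_matrix`, `lasso_ista`) are hypothetical and exist only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def correlation_matrix(bulk, cells):
    """Pearson correlation of every bulk sample with every single cell.

    bulk:  genes x samples, cells: genes x cells (same gene order).
    Returns a samples x cells matrix.
    """
    b = (bulk - bulk.mean(axis=0)) / bulk.std(axis=0)
    c = (cells - cells.mean(axis=0)) / cells.std(axis=0)
    return b.T @ c / bulk.shape[0]


def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient descent."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Toy data (an assumption): two cell types, bulk samples are noisy mixtures,
# and the phenotype is the fraction of type A in each bulk sample.
n_genes, n_samples, n_cells = 200, 60, 30
profiles = rng.normal(size=(n_genes, 2))
labels = np.repeat([0, 1], n_cells // 2)
cells = profiles[:, labels] + 0.5 * rng.normal(size=(n_genes, n_cells))
frac_a = rng.uniform(size=n_samples)
bulk = (np.outer(profiles[:, 0], frac_a)
        + np.outer(profiles[:, 1], 1.0 - frac_a)
        + 0.5 * rng.normal(size=(n_genes, n_samples)))

S = correlation_matrix(bulk, cells)   # samples x cells
X = S - S.mean(axis=0)                # centre columns instead of fitting an intercept
y = frac_a - frac_a.mean()

lam_max = np.max(np.abs(X.T @ y)) / n_samples  # smallest penalty giving w = 0
w = lasso_ista(X, y, lam=0.1 * lam_max)

# One coefficient per cell: nonzero entries mark cells tied to the phenotype.
positive_cells = np.flatnonzero(w > 0)
negative_cells = np.flatnonzero(w < 0)
print("cells with positive coefficients:", positive_cells)
print("cells with negative coefficients:", negative_cells)
```

In this sketch each cell receives one coefficient, so the sign of a nonzero coefficient indicates whether similarity to that cell rises or falls with the phenotype. The penalty strength (`0.1 * lam_max` here) is an arbitrary choice that controls how many cells are selected.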


Fig. 1: The workflow of Scissor and its performance in applications with known phenotype-associated cell subpopulations.
Fig. 2: Scissor identification results on lung cancer cells guided by TCGA-LUAD survival outcomes.
Fig. 3: Scissor identification results on lung cancer cells guided by TP53 mutation status.
Fig. 4: Scissor identification results on melanoma T cells.
Fig. 5: Scissor identification results on FSHD cells.
Fig. 6: Scissor identification results on AD.

Data availability

All datasets analyzed in this study were published previously. The corresponding descriptions and pre-processing steps are described in the Supplementary Materials.

Software availability

The open-source Scissor R package and tutorial are available at GitHub: https://github.com/sunduanchen/Scissor.

Acknowledgements

This work was supported by the following funding: NIH 5K01LM012877 (to Z.X.); NIH 1R21HL145426 (to Z.X.); NIH 1R01CA207377 (to D.Z.Q.); NIH NIGMS MIRA R35GM124704 (to A.C.A.); the Medical Research Foundation of Oregon (to Z.X.); NCI R01 CA251245, P50 CA097186, P50 CA186786, P50 CA186786-07S1 and Department of Defense Impact Award W81XWH-16-1-0597 (to J.J.A.); and NCI R01CA244576 (to A.V.D.). We thank W. Anderson and A. Hill for editing the manuscript. The resources of the Exacloud high-performance computing environment, developed jointly by Oregon Health & Science University (OHSU) and Intel, and the technical support of the OHSU Advanced Computing Center are gratefully acknowledged.

Author information

Contributions

D.S. and Z.X. conceived the idea, implemented the algorithm and performed the analyses. D.S., G.X., P.T.S. and Z.X. interpreted the results. A.E.M., L.Y.W., D.Z.Q., P.S., M.D., A.V.D., J.J.A. and A.C.A. provided scientific insights on the applications. Z.X. supervised the study. D.S. and Z.X. wrote the manuscript with feedback from all other authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zheng Xia.

Ethics declarations

Competing interests

A.E.M. discloses receipt of a sponsored research agreement from AstraZeneca. A.V.D. reports consultancy from Abbvie, Beigene, Celgene, Curis, Janssen, Karyopharm, Nurix, Seattle Genetics, Teva Oncology and TG Therapeutics; research funding from Aptose Biosciences, Bristol Myers Squibb, Gilead Sciences and Takeda Oncology; and consultancy and research funding from AstraZeneca, Bayer Oncology, Genentech and Verastem Oncology. J.J.A. has received consulting income from Janssen Biotech, Merck Sharp & Dohme and Dendreon and honoraria for speaker’s fees from Astellas. All other authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Dataset descriptions, Supplementary Figs. 1–10 and Supplementary Tables 7–14.

Reporting Summary

Supplementary Table 1

Additional information for the LUAD cancer survival analysis, including differentially expressed genes, signature genes, enriched pathways, motifs and the descriptions of the validation datasets.

Supplementary Table 2

Additional information for the LUAD cancer cell TP53 mutation analysis, including differentially expressed genes, signature genes and enriched pathways.

Supplementary Table 3

Additional information for the melanoma T cell immunotherapy response analysis, including differentially expressed genes, signature genes and enriched pathways.

Supplementary Table 4

Additional information for the FSHD muscle analysis, including differentially expressed genes, signature genes and enriched pathways.

Supplementary Table 5

Additional information for the Alzheimer’s disease analysis, including differentially expressed genes, signature genes and enriched pathways.

Supplementary Table 6

Description of the ten experiments analyzed by Scissor.

About this article

Cite this article

Sun, D., Guan, X., Moran, A.E. et al. Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol 40, 527–538 (2022). https://doi.org/10.1038/s41587-021-01091-3

  • DOI: https://doi.org/10.1038/s41587-021-01091-3
