Abstract
Single-cell RNA sequencing (scRNA-seq) distinguishes cell types, states and lineages within the context of heterogeneous tissues. However, current single-cell data cannot directly link cell clusters with specific phenotypes. Here we present Scissor, a method that identifies cell subpopulations from single-cell data that are associated with a given phenotype. Scissor integrates phenotype-associated bulk expression data and single-cell data by first quantifying the similarity between each single cell and each bulk sample. It then optimizes a regression model on the correlation matrix with the sample phenotype to identify relevant subpopulations. Applied to a lung cancer scRNA-seq dataset, Scissor identified subsets of cells associated with worse survival and with TP53 mutations. In melanoma, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expression associated with an immunotherapy response. Beyond cancer, Scissor was effective in interpreting facioscapulohumeral muscular dystrophy and Alzheimer’s disease datasets. Scissor identifies biologically and clinically relevant cell subpopulations from single-cell assays by leveraging phenotype and bulk-omics datasets.
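The two steps named in the abstract (a cell-by-sample similarity matrix, then a regression of the phenotype on that matrix) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Scissor R package. The abstract does not give the similarity measure, loss or penalty, so the sketch assumes Pearson correlation, a continuous phenotype with squared-error loss, and a plain L1 penalty solved by proximal gradient descent. The function names `correlation_matrix`, `sparse_regression` and `label_cells` are hypothetical, and the input data are random numbers used only to show the shapes involved.

```python
import numpy as np


def correlation_matrix(bulk, cells):
    """Pearson correlation of every bulk sample with every single cell.

    bulk  : genes x samples expression matrix
    cells : genes x cells expression matrix (same genes, same order)
    Returns a samples x cells matrix. Assumes no column is constant.
    """
    b = bulk - bulk.mean(axis=0)
    c = cells - cells.mean(axis=0)
    b = b / np.linalg.norm(b, axis=0)
    c = c / np.linalg.norm(c, axis=0)
    return b.T @ c


def sparse_regression(S, phenotype, lam, n_iter=2000):
    """L1-penalized least squares of the phenotype on S, one coefficient
    per cell, solved by proximal gradient descent (ISTA).
    Assumption: the L1 penalty stands in for whatever the method uses."""
    n = S.shape[0]
    X = S - S.mean(axis=0)                      # center each cell's column
    y = np.asarray(phenotype, dtype=float)
    y = y - y.mean()
    step = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant
    beta = np.zeros(S.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def label_cells(beta):
    """+1: positively associated, -1: negatively associated, 0: background."""
    return np.sign(beta).astype(int)


# Toy input: 200 shared genes, 30 bulk samples, 50 single cells (random data).
rng = np.random.default_rng(0)
bulk = rng.normal(size=(200, 30))
cells = rng.normal(size=(200, 50))
phenotype = rng.normal(size=30)

S = correlation_matrix(bulk, cells)             # 30 x 50
beta = sparse_regression(S, phenotype, lam=0.05)
labels = label_cells(beta)                      # one label per cell
```

In this sketch, cells with a nonzero coefficient are the ones the model associates with the phenotype, and the sign gives the direction of the association. Increasing `lam` selects fewer cells. A survival or binary phenotype would need a different loss (for example Cox or logistic), which the sketch does not cover.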
Data availability
All datasets analyzed in this study were published previously. The corresponding descriptions and pre-processing steps are described in the Supplementary Materials.
Software availability
The open-source Scissor R package and tutorial are available at GitHub: https://github.com/sunduanchen/Scissor.
Acknowledgements
This work was supported by the following funding: NIH 5K01LM012877 (to Z.X.); NIH 1R21HL145426 (to Z.X.); NIH 1R01CA207377 (to D.Z.Q.); NIH NIGMS MIRA R35GM124704 (to A.C.A.); the Medical Research Foundation of Oregon (to Z.X.); NCI R01 CA251245, P50 CA097186, P50 CA186786, P50 CA186786-07S1 and Department of Defense Impact Award W81XWH-16-1-0597 (to J.J.A.); and NCI R01CA244576 (to A.V.D.). We thank W. Anderson and A. Hill for editing the manuscript. The resources of the Exacloud high-performance computing environment, developed jointly by Oregon Health & Science University (OHSU) and Intel, and the technical support of the OHSU Advanced Computing Center are gratefully acknowledged.
Author information
Contributions
D.S. and Z.X. conceived the idea, implemented the algorithm and performed the analyses. D.S., G.X., P.T.S. and Z.X. interpreted the results. A.E.M., L.Y.W., D.Z.Q., P.S., M.D., A.V.D., J.J.A. and A.C.A. provided scientific insights on the applications. Z.X. supervised the study. D.S. and Z.X. wrote the manuscript with feedback from all other authors. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
A.E.M. discloses receipt of a sponsored research agreement from AstraZeneca. A.V.D. reports consultancy from Abbvie, Beigene, Celgene, Curis, Janssen, Karyopharm, Nurix, Seattle Genetics, Teva Oncology and TG Therapeutics; research funding from Aptose Biosciences, Bristol Myers Squibb, Gilead Sciences and Takeda Oncology; and consultancy and research funding from AstraZeneca, Bayer Oncology, Genentech and Verastem Oncology. J.J.A. has received consulting income from Janssen Biotech, Merck Sharp & Dohme and Dendreon and honoraria for speaker’s fees from Astellas. All other authors declare no competing interests.
Additional information
Peer review information: Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Dataset descriptions, Supplementary Figs. 1–10 and Supplementary Tables 7–14.
Supplementary Table 1
Additional information for the LUAD cancer survival analysis, including differentially expressed genes, signature genes, enriched pathways, motifs and the descriptions of the validation datasets.
Supplementary Table 2
Additional information for the LUAD cancer cell TP53 mutation analysis, including differentially expressed genes, signature genes and enriched pathways.
Supplementary Table 3
Additional information for the melanoma T cell immunotherapy response analysis, including differentially expressed genes, signature genes and enriched pathways.
Supplementary Table 4
Additional information for the FSHD muscle analysis, including differentially expressed genes, signature genes and enriched pathways.
Supplementary Table 5
Additional information for the Alzheimer’s disease analysis, including differentially expressed genes, signature genes and enriched pathways.
Supplementary Table 6
Description of the ten experiments analyzed by Scissor.
About this article
Cite this article
Sun, D., Guan, X., Moran, A.E. et al. Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol 40, 527–538 (2022). https://doi.org/10.1038/s41587-021-01091-3
DOI: https://doi.org/10.1038/s41587-021-01091-3