Reusability report: Leveraging supervised learning to uncover phenotype-relevant biology from single-cell RNA sequencing data

Abstract

Recent advances in single-cell transcriptome sequencing and computational analysis methods have improved our understanding of cellular heterogeneity. However, associating different cell subsets with phenotypes remains challenging. Recently, Ren et al. introduced PENCIL, a supervised learning framework incorporating gene selection to discern phenotype-relevant cells. To assess PENCIL’s reproducibility and transferability, we conducted a comprehensive evaluation across 12 single-cell RNA sequencing datasets representing four distinct phenotypes. We identified a few caveats in the original version of PENCIL, such as sensitivity to input perturbation; correcting them improved PENCIL’s reproducibility. We show that augmenting PENCIL’s cell subset identification with gene set variation analysis yields a cytotoxic T cell immunotherapy response signature (CyTIR) that predicts immune checkpoint blockade response in skin cancer across multiple datasets, with an area under the curve >0.75 and accuracy >0.71. Overall, our assessments enhance PENCIL’s reproducibility and utility, further extending its potential for identifying phenotype-relevant cell subsets in diverse biomedical applications.
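The two metrics quoted above, area under the curve and accuracy, can be illustrated with a minimal sketch. It assumes each sample has a numeric signature score and a binary response label. The function names, the 0.5 threshold and the toy values are hypothetical and are not taken from the study or its code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    # Fraction of (responder, non-responder) pairs ranked correctly;
    # tied scores count as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Fraction of samples classified correctly when score >= threshold means responder."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up values for illustration only (NOT data from the study).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = responder, 0 = non-responder

toy_auc = roc_auc(toy_scores, toy_labels)
toy_acc = accuracy(toy_scores, toy_labels, threshold=0.5)
```

The area under the curve does not depend on a threshold, whereas accuracy does, so the two numbers can rank signatures differently.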

Fig. 1: Sensitivity testing of PENCIL to model parameters and input data.
Fig. 2: Predicting ICB response in skin cancer using a newly curated signature derived from ICB-response-related CD8+ T cells.
Fig. 3: Characterizing tumour-originated and PBMC-originated CD8+ T cells.

Data availability

All the data used in this study are publicly available on Gene Expression Omnibus (GEO) via the following GEO accession numbers: GSE120575 (ref. 9), GSE123813 (ref. 14), GSE166181 (ref. 15), GSE115978 (ref. 30), GSE144236 (ref. 31), GSE145328 (ref. 32), GSE139324 (ref. 23), GSE164690 (ref. 24), GSE162025 (ref. 25), GSE180268 (ref. 26), GSE182227 (ref. 27) and GSE200996 (ref. 28).
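As a convenience, the accession numbers listed above can be turned into GEO record links. The sketch below assumes the standard GEO accession page URL pattern. The helper name `geo_url` is hypothetical, and the dictionary only restates the accession-to-reference pairing given in the text.

```python
import re

# GEO accession -> reference number, as listed in the data availability statement.
GEO_ACCESSIONS = {
    "GSE120575": 9,
    "GSE123813": 14,
    "GSE166181": 15,
    "GSE115978": 30,
    "GSE144236": 31,
    "GSE145328": 32,
    "GSE139324": 23,
    "GSE164690": 24,
    "GSE162025": 25,
    "GSE180268": 26,
    "GSE182227": 27,
    "GSE200996": 28,
}


def geo_url(accession):
    """Return the GEO record page for a series accession (hypothetical helper)."""
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={accession}"


if __name__ == "__main__":
    for acc, ref in GEO_ACCESSIONS.items():
        print(f"{acc} (ref. {ref}): {geo_url(acc)}")
```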

Code availability

All code necessary to replicate these analyses is freely available at GitHub (https://github.com/rootchang/PENCIL_reusability_report) and Zenodo (https://doi.org/10.5281/zenodo.10121113; ref. 39).

References

  1. Gavish, A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).

  2. Cao, J. Y. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  3. Eraslan, G. et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science 376, 712 (2022).

  4. van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).

  5. Ren, T. et al. Supervised learning of high-confidence phenotypic subpopulations from single-cell data. Nat. Mach. Intell. 5, 528–541 (2023).

  6. Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40, 245–253 (2022).

  7. Zhao, J. et al. Detection of differentially abundant cell subpopulations in scRNA-seq data. Proc. Natl Acad. Sci. USA 118, e2100293118 (2021).

  8. Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).

  9. Sade-Feldman, M. et al. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Cell 175, 998 (2018).

  10. Conde, C. D. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, 713 (2022).

  11. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019).

  12. Ianevski, A., Giri, A. K. & Aittokallio, T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat. Commun. 13, 1246 (2022).

  13. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  14. Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).

  15. De Biasi, S. et al. Circulating mucosal-associated invariant T cells identify patients responding to anti-PD-1 therapy. Nat. Commun. 12, 1669 (2021).

  16. Wherry, E. J. et al. Molecular signature of CD8(+) T cell exhaustion during chronic viral infection. Immunity 27, 670–684 (2007).

  17. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).

  18. Damotte, D. et al. The tumor inflammation signature (TIS) is associated with anti-PD-1 treatment benefit in the CERTIM pan-cancer cohort. J. Transl. Med. 17, 357 (2019).

  19. Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).

  20. Thommen, D. S. et al. A transcriptionally and functionally distinct PD-1(+) CD8(+) T cell pool with predictive potential in non-small-cell lung cancer treated with PD-1 blockade. Nat. Med. 24, 994–1004 (2018).

  21. Chow, A. et al. The ectonucleotidase CD39 identifies tumor-reactive CD8+ T cells predictive of immune checkpoint blockade efficacy in human lung cancer. Immunity 56, 93–106.e6 (2023).

  22. Duhen, T. et al. Co-expression of CD39 and CD103 identifies tumor-reactive CD8 T cells in human solid tumors. Nat. Commun. 9, 2724 (2018).

  23. Cillo, A. R. et al. Immune landscape of viral- and carcinogen-driven head and neck cancer. Immunity 52, 183–199.e9 (2020).

  24. Kurten, C. H. L. et al. Investigating immune and non-immune cell interactions in head and neck tumors by single-cell RNA sequencing. Nat. Commun. 12, 7388 (2021).

  25. Liu, Y. et al. Tumour heterogeneity and intercellular networks of nasopharyngeal carcinoma at single cell resolution. Nat. Commun. 12, 741 (2021).

  26. Eberhardt, C. S. et al. Functional HPV-specific PD-1(+) stem-like CD8 T cells in head and neck cancer. Nature 597, 279–284 (2021).

  27. Puram, S. V. et al. Cellular states are coupled to genomic and viral heterogeneity in HPV-related oropharyngeal carcinoma. Nat. Genet. 55, 640–650 (2023).

  28. Luoma, A. M. et al. Tissue-resident memory and circulating T cells are early responders to pre-surgical cancer immunotherapy. Cell 185, 2918–2935.e29 (2022).

  29. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma (vol 577, 561, 2020). Nature 580, E1 (2020).

  30. Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175, 984–997 (2018).

  31. Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514 (2020).

  32. Frazzette, N. et al. Decreased cytotoxic T cells and TCR clonality in organ transplant recipients with squamous cell carcinoma. npj Precis. Oncol. 4, 13 (2020).

  33. Sun, D. Q. et al. TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment. Nucleic Acids Res. 49, D1420–D1430 (2021).

  34. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).

  35. Yu, G. C. & He, Q. Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 12, 477–479 (2016).

  36. Kumar, B. V. et al. Human tissue-resident memory T cells are defined by core transcriptional and functional signatures in lymphoid and mucosal sites. Cell Rep. 20, 2921–2934 (2017).

  37. Ayers, M. et al. IFN-gamma-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).

  38. Andreatta, M. & Carmona, S. J. UCell: robust and scalable single-cell gene signature scoring. Comput. Struct. Biotechnol. J. 19, 3796–3798 (2021).

  39. Cao, Y., Chang, T.-G., Sahni, S. & Ruppin, E. PENCIL reusability report v1.0.0. Zenodo https://doi.org/10.5281/zenodo.10121113 (2023).

Acknowledgements

This research was supported in part by the NIH Intramural Research Program, National Cancer Institute. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

Author information

Contributions

E.R., Y.C. and T.-G.C. conceived and designed the study. Y.C., T.-G.C. and S.S. collected and managed the data. T.-G.C. and Y.C. performed the analyses. T.-G.C., Y.C. and E.R. wrote the paper. All authors critically revised the manuscript for important intellectual content.

Corresponding author

Correspondence to Eytan Ruppin.

Ethics declarations

Competing interests

E.R. is a co-founder of MedAware, Metabomed and Pangea Biomed (divested) and an unpaid member of Pangea Biomed’s scientific advisory board. The other authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs. 1–4 and Table 1.

About this article

Cite this article

Cao, Y., Chang, TG., Sahni, S. et al. Reusability report: Leveraging supervised learning to uncover phenotype-relevant biology from single-cell RNA sequencing data. Nat Mach Intell 6, 307–314 (2024). https://doi.org/10.1038/s42256-024-00804-y

  • DOI: https://doi.org/10.1038/s42256-024-00804-y
