  • Article
CaSee: A lightning transfer-learning model directly used to discriminate cancer/normal cells from scRNA-seq

Abstract

Single-cell RNA sequencing (scRNA-seq) is one of the most efficient technologies for human tumor research. However, data analysis still faces technical challenges, in particular the difficulty of efficiently and accurately discriminating cancer cells from normal cells in the scRNA-seq expression matrix. Addressing these challenges would give a deeper understanding of intratumoral and intertumoral heterogeneity. In this study, we developed pan-Cancer Seeker (CaSee), a cancer/normal cell discrimination pipeline for scRNA-seq expression matrices that uses transfer learning from traditional high-quality pan-cancer bulk sequencing data. CaSee is the first tool that directly discriminates cancer/normal cells in the scRNA-seq expression matrix, and it has a much wider range of applications and higher efficiency than the copy number variation (CNV) method, which requires corresponding reference cells. CaSee is user-friendly and adapts to a variety of data sources, including but not limited to scRNA-seq data from tissues, cell lines, xenograft cells and circulating tumor cells. It is compatible with mainstream sequencing platforms: 10× Genomics Chromium, Smart-seq2, and Microwell-seq. In a multicenter evaluation on 11 retrospective cohorts and one independent dataset, the CaSee pipeline performed well, with an average discrimination accuracy of 96.69%. In general, CaSee, a deep-learning-based pan-cancer model for distinguishing cancer cells from normal cells, should be of interest to researchers working in the genomics, cancer, and single-cell fields.
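To make the transfer-learning idea in the abstract concrete, the following is a minimal sketch and not the CaSee implementation. It assumes a toy setting: a plain logistic-regression classifier (standing in for the deep-learning model) is first trained on simulated "bulk" profiles, then fine-tuned on a small set of labeled single cells, and finally applied to the remaining cells. All function names, the simulated data, and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Probability that profile x belongs to class 1 ("cancer" in this toy setup).
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train(samples, labels, weights, bias, lr, epochs):
    # Stochastic gradient descent on the logistic loss, starting from the
    # given (weights, bias); returns the updated parameters.
    weights = list(weights)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def simulate(rng, n, n_genes, dropout):
    # Hypothetical data: class-1 profiles over-express the first two genes;
    # each value is zeroed with probability `dropout` to mimic sparse scRNA-seq.
    samples, labels = [], []
    for i in range(n):
        y = i % 2
        x = []
        for g in range(n_genes):
            mean = 2.0 if (y == 1 and g < 2) else 0.0
            value = rng.gauss(mean, 0.5)
            if rng.random() < dropout:
                value = 0.0
            x.append(value)
        samples.append(x)
        labels.append(y)
    return samples, labels


def accuracy(weights, bias, samples, labels):
    hits = sum(
        (predict_proba(weights, bias, x) >= 0.5) == bool(y)
        for x, y in zip(samples, labels)
    )
    return hits / len(labels)


rng = random.Random(0)
n_genes = 6

# Source domain: simulated bulk profiles without dropout.
bulk_x, bulk_y = simulate(rng, 200, n_genes, dropout=0.0)
# Target domain: simulated single cells with dropout.
cell_x, cell_y = simulate(rng, 300, n_genes, dropout=0.2)

# Step 1: pre-train on bulk data from zero-initialized parameters.
w, b = train(bulk_x, bulk_y, [0.0] * n_genes, 0.0, lr=0.1, epochs=20)

# Step 2: fine-tune on a small labeled subset of cells (assumed to be available).
w_ft, b_ft = train(cell_x[:40], cell_y[:40], w, b, lr=0.05, epochs=20)

# Step 3: evaluate on the cells that were not used for fine-tuning.
held_x, held_y = cell_x[40:], cell_y[40:]
acc_before = accuracy(w, b, held_x, held_y)
acc_after = accuracy(w_ft, b_ft, held_x, held_y)
print(f"held-out accuracy before fine-tuning: {acc_before:.3f}")
print(f"held-out accuracy after fine-tuning:  {acc_after:.3f}")
```

The point of the sketch is the two-stage training: parameters learned on the data-rich source domain are reused as the starting point for the sparser target domain instead of training from scratch. How the actual pipeline selects labeled cells and which network architecture it uses are described in the article and repository, not here.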

Fig. 1: Introduction and overview of the CaSee Model workflow.
Fig. 2: Comparison of CaSee and copy number variation method.
Fig. 3: Comparison of differences between cancer cells and normal cells discriminated by different methods.
Fig. 4: Practical application analysis of CaSee.

Data availability

Download links for all data used in this study are listed in Supplementary Table 2. Raw CTC single-cell sequencing data are available from the China National Center for Bioinformation (BioProject ID PRJCA007531, accession number HRA002759).

Code availability

All data-processing code and configuration files are available on GitHub (https://github.com/yuansh3354/CaSee).

Acknowledgements

This work was supported by the China Postdoctoral Science Foundation (2021M690806), the Beijing Natural Science Foundation Haidian Original Innovation Joint Fund (L202023), the National Natural Science Foundation of China (32027801, 31870992, 21775031), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB36000000, XDB38010400), CAS-JSPS (Grant No. GJHZ2094), and the Research Foundation for Advanced Talents of Fujian Medical University (XRCZX2017020, XRCZX2019005). The funding bodies had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript. We thank Dr. Jianming Zeng (University of Macau) and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and code. We also thank Prof. Xiaopei Shen (Fujian Medical University) and his teammates Hao Fu, Haibo Zhu, Guanghao Liu, Mengyao Wang, and others for their generous help, discussion, and advice on the CaSee pipeline.

Author information

Contributions

YSH and XLZ were responsible for designing, analysing data, interpreting results and writing the paper. ZMY was responsible for analysing data. JYD, YZW, YZ, and XJL were responsible for CTC isolation and scRNA-seq sequencing. ZYH and CXG contributed to interpreting results. XLZ and ZYH conducted the analyses.

Corresponding authors

Correspondence to Yuan Sh, Xiuli Zhang or Zhiyuan Hu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sh, Y., Zhang, X., Yang, Z. et al. CaSee: A lightning transfer-learning model directly used to discriminate cancer/normal cells from scRNA-seq. Oncogene 41, 4866–4876 (2022). https://doi.org/10.1038/s41388-022-02478-5

  • DOI: https://doi.org/10.1038/s41388-022-02478-5
