Identification of cancer-related mutations in human pluripotent stem cells using RNA-seq analysis


Human pluripotent stem cells (hPSCs) are known to acquire genetic aberrations during in vitro propagation. In addition to recurrent chromosomal aberrations, it has recently been shown that these cells also gain point mutations in cancer-related genes, predominantly in TP53. Routine quality control of hPSCs is therefore critical for both basic research and clinical applications. Here we discuss the relevance of detecting mutations for various hPSC applications, and present a detailed protocol to identify cancer-related point mutations using data from RNA sequencing, an assay commonly performed during the growth and differentiation of hPSCs. In this protocol, we describe how to process and align the sequencing data, analyze them and conservatively interpret the results in order to generate an accurate estimate of mutations in tumor-related genes. This pipeline is designed to work in high throughput and is available as a software container. The protocol requires minimal command-line skills and can be carried out in 1–2 d.
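To make the idea of conservative interpretation concrete, the following is a minimal sketch of how called variants might be filtered down to likely cancer-related point mutations. It is not the published pipeline: the `Variant` record, the `conservative_filter` function, the gene set and every threshold below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real pipeline's output format may differ.
@dataclass(frozen=True)
class Variant:
    gene: str
    ref_reads: int               # reads supporting the reference allele
    alt_reads: int               # reads supporting the alternate allele
    is_known_polymorphism: bool  # e.g., listed in a population variant database
    changes_protein: bool        # e.g., missense or nonsense consequence


def allele_fraction(v: Variant) -> float:
    """Fraction of reads at the site that carry the alternate allele."""
    depth = v.ref_reads + v.alt_reads
    return v.alt_reads / depth if depth else 0.0


def conservative_filter(variants, cancer_genes,
                        min_depth=10, min_alt_reads=4, min_fraction=0.1):
    """Keep only variants that pass every check (thresholds are assumptions)."""
    kept = []
    for v in variants:
        depth = v.ref_reads + v.alt_reads
        if v.gene not in cancer_genes:
            continue  # outside the tumor-related gene set
        if v.is_known_polymorphism:
            continue  # likely inherited rather than acquired in culture
        if not v.changes_protein:
            continue  # no predicted effect on the protein
        if depth < min_depth or v.alt_reads < min_alt_reads:
            continue  # too little read support to trust the call
        if allele_fraction(v) < min_fraction:
            continue  # alternate allele too rare to separate from noise
        kept.append(v)
    return kept
```

Requiring all checks to pass trades sensitivity for specificity, which matches the conservative stance described above: a variant in a tumor-related gene is reported only if it is well supported by reads, predicted to alter the protein and not a known polymorphism.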

Fig. 1: Relevance of cancer-related point mutations in hPSCs.
Fig. 2: Schematic representation of the pipeline for identification of cancer-related mutations in hPSCs from RNA-seq data.

Data availability

The RNA-seq sample used as an example in this protocol can be retrieved from the Sequence Read Archive (SRA) database under the accession number SRR3090631.

Code availability

All the code used in this protocol is available at



We thank S. Kinreich and A. Pagis for testing the pipeline and providing their constructive input, and all members of The Azrieli Center for Stem Cells and Genetic Research for critical reading of the manuscript. This work was partially supported by the Israel Science Foundation (494/17), the Rosetrees Trust and the Azrieli Foundation. N.B. is the Herbert Cohn Chair in Cancer Research.

Author information




E.L. and N.B. designed the analysis. E.L. developed the bioinformatic pipeline and wrote the manuscript with input from N.B., who supervised the work.

Corresponding authors

Correspondence to Elyad Lezmi or Nissim Benvenisty.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Anna Esteve-Codina and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Merkle, F. et al. Nature 545, 229–233 (2017)

Avior, Y. et al. Cell Stem Cell 28, 10–11 (2021)

About this article

Cite this article

Lezmi, E., Benvenisty, N. Identification of cancer-related mutations in human pluripotent stem cells using RNA-seq analysis. Nat Protoc 16, 4522–4537 (2021).
