Somatic mutations and cell identity linked by Genotyping of Transcriptomes

Nam, Anna S.; Kim, Kyu-Tae; Chaligne, Ronan; Izzo, Franco; Ang, Chelston; Taylor, Justin; Myers, Robert M.; Abu-Zeinah, Ghaith; Brand, Ryan; Omans, Nathaniel D.; Alonso, Alicia; Sheridan, Caroline; Mariani, Marisa; Dai, Xiaoguang; Harrington, Eoghan; Pastore, Alessandro; Cubillos-Ruiz, Juan R.; Tam, Wayne; Hoffman, Ronald; Rabadan, Raul; Scandura, Joseph M.; Abdel-Wahab, Omar; Smibert, Peter; Landau, Dan A.

doi:10.1038/s41586-019-1367-0

Article
Published: 03 July 2019

Nature volume 571, pages 355–360 (2019)

Abstract

Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34⁺ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.

**Fig. 1: GoT provides genotyping of somatic mutations for thousands of cancer cells, and reveals a differential fitness effect of *CALR* mutation in subsets of haematopoietic progenitor cells.**

**Fig. 2: *CALR* mutations result in a higher proliferative effect on MkPs compared to HSPCs.**

**Fig. 3: Transcriptional effects of *CALR* mutation are dependent on cell identity.**

**Fig. 4: Effects of *CALR* mutation on haematopoietic progenitor cells from patients with myelofibrosis.**

**Fig. 5: GoT dissects subclonal identity through multiplexing and targets loci that are distant from transcript ends via circularization.**

Data availability

All of the sequencing data are available via the Gene Expression Omnibus (GEO) under the accession number GSE117826. Any other relevant data are available from the corresponding author upon reasonable request.

Code availability

The IronThrone GoT pipeline is available on GitHub at https://github.com/landau-lab/IronThrone-GoT.

References

1. Sperling, A. S., Gibson, C. J. & Ebert, B. L. The genetics of myelodysplastic syndrome: from clonal haematopoiesis to secondary leukaemia. Nat. Rev. Cancer 17, 5–19 (2017).
2. Landau, D. A. et al. The evolutionary landscape of chronic lymphocytic leukemia treated with ibrutinib targeted therapy. Nat. Commun. 8, 2185 (2017).
3. Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).
4. Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
5. Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).
6. Nangalia, J. et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. N. Engl. J. Med. 369, 2391–2405 (2013).
7. Klampfl, T. et al. Somatic mutations of calreticulin in myeloproliferative neoplasms. N. Engl. J. Med. 369, 2379–2390 (2013).
8. Giustacchini, A. et al. Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia. Nat. Med. 23, 692–702 (2017).
9. Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat. Methods 13, 833–836 (2016).
10. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
11. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
12. Elf, S. et al. Mutant calreticulin requires both its mutant C-terminus and the thrombopoietin receptor for oncogenic transformation. Cancer Discov. 6, 368–381 (2016).
13. Defour, J. P., Chachoua, I., Pecquet, C. & Constantinescu, S. N. Oncogenic activation of MPL/thrombopoietin receptor by 17 mutations at W515: implications for myeloproliferative neoplasms. Leukemia 30, 1214–1216 (2016).
14. Kollmann, K. et al. A novel signalling screen demonstrates that CALR mutations activate essential MAPK signalling and facilitate megakaryocyte differentiation. Leukemia 31, 934–944 (2017).
15. Marty, C. et al. Calreticulin mutants in mice induce an MPL-dependent thrombocytosis with frequent progression to myelofibrosis. Blood 127, 1317–1324 (2016).
16. Nivarthi, H. et al. Thrombopoietin receptor is required for the oncogenic function of CALR mutants. Leukemia 30, 1759–1763 (2016).
17. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
18. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
19. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
20. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
21. Farrell, J. A. et al. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018).
22. Chen, E. et al. Distinct clinical phenotypes associated with JAK2V617F reflect differential STAT1 signaling. Cancer Cell 18, 524–535 (2010).
23. Rampal, R. et al. Integrated genomic analysis illustrates the central role of JAK–STAT pathway activation in myeloproliferative neoplasm pathogenesis. Blood 123, e123–e133 (2014).
24. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
25. Georgantas, R. W. III et al. Microarray and serial analysis of gene expression analyses identify known and novel transcripts overexpressed in hematopoietic stem cells. Cancer Res. 64, 4434–4441 (2004).
26. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
27. Hetz, C. & Papa, F. R. The unfolded protein response and cell fate control. Mol. Cell 69, 169–181 (2018).
28. Pronier, E. et al. Targeting the CALR interactome in myeloproliferative neoplasms. JCI Insight 3, e122703 (2018).
29. Lau, W. W., Hannah, R., Green, A. R. & Göttgens, B. The JAK–STAT signaling pathway is differentially activated in CALR-positive compared with JAK2V617F-positive ET patients. Blood 125, 1679–1681 (2015).
30. Shivarov, V., Ivanova, M. & Tiu, R. V. Mutated calreticulin retains structurally disordered C terminus that cannot bind Ca²⁺: some mechanistic and therapeutic implications. Blood Cancer J. 4, e185 (2014).
31. Zini, R. et al. CALR mutational status identifies different disease subtypes of essential thrombocythemia showing distinct expression profiles. Blood Cancer J. 7, 638 (2017).
32. Wu, J. et al. ATF6α optimizes long-term endoplasmic reticulum function to protect cells from chronic stress. Dev. Cell 13, 351–364 (2007).
33. van Galen, P. et al. The unfolded protein response governs integrity of the haematopoietic stem-cell pool during stress. Nature 510, 268–272 (2014).
34. Lee, A. H., Iwakoshi, N. N. & Glimcher, L. H. XBP-1 regulates a subset of endoplasmic reticulum resident chaperone genes in the unfolded protein response. Mol. Cell. Biol. 23, 7448–7459 (2003).
35. Cubillos-Ruiz, J. R. et al. ER stress sensor XBP1 controls anti-tumor immunity by disrupting dendritic cell homeostasis. Cell 161, 1527–1538 (2015).
36. Yoshida, H., Matsui, T., Yamamoto, A., Okada, T. & Mori, K. XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107, 881–891 (2001).
37. Luo, J. L., Kamata, H. & Karin, M. IKK/NF-κB signaling: balancing life and death—a new approach to cancer therapy. J. Clin. Invest. 115, 2625–2632 (2005).
38. Stein, S. J. & Baldwin, A. S. Deletion of the NF-κB subunit p65/RelA in the hematopoietic compartment leads to defects in hematopoietic stem cell function. Blood 121, 5015–5024 (2013).
39. Abu-Zeinah, G. et al. Myeloproliferative neoplasm (MPN) driver mutations are enriched during hematopoietic stem cell differentiation in patterns that correlate with clinical phenotype and treatment response. Blood 132, 4317 (2018).
40. Castro-Malaspina, H., Rabellino, E. M., Yen, A., Nachman, R. L. & Moore, M. A. Human megakaryocyte stimulation of proliferation of bone marrow fibroblasts. Blood 57, 781–787 (1981).
41. Ciurea, S. O. et al. Pivotal contributions of megakaryocytes to the biology of idiopathic myelofibrosis. Blood 110, 986–993 (2007).
42. Terui, T. et al. The production of transforming growth factor-beta in acute megakaryoblastic leukemia and its possible implications in myelofibrosis. Blood 75, 1540–1548 (1990).
43. Obeng, E. A. et al. Physiologic expression of Sf3b1K700E causes impaired erythropoiesis, aberrant splicing, and sensitivity to therapeutic spliceosome modulation. Cancer Cell 30, 404–417 (2016).
44. Saikia, M. et al. Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells. Nat. Methods 16, 59–62 (2019).
45. Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
46. Kleppe, M. et al. Dual targeting of oncogenic activation and inflammatory signaling increases therapeutic efficacy in myeloproliferative neoplasms. Cancer Cell 33, 785–787 (2018).
47. Mu, P. et al. SOX2 promotes lineage plasticity and antiandrogen resistance in TP53- and RB1-deficient prostate cancer. Science 355, 84–88 (2017).
48. Suvà, M. L. et al. Reconstructing and reprogramming the tumor-propagating potential of glioblastoma stem-like cells. Cell 157, 580–594 (2014).
49. Geyer, J. T. et al. Oligomonocytic chronic myelomonocytic leukemia (chronic myelomonocytic leukemia without absolute monocytosis) displays a similar clinicopathologic and mutational profile to classical chronic myelomonocytic leukemia. Mod. Pathol. 30, 1213–1222 (2017).
50. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
51. Bolker, B. M. et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evol. 24, 127–135 (2009).
52. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).
53. Reinius, B. & Sandberg, R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
54. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
55. Ntranos, V., Yi, L., Melsted, P. & Pachter, L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat. Methods 16, 163–166 (2019).
56. Reimand, J. et al. g:Profiler–a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44, W83–W89 (2016).
57. Rhee, J. K., Lee, S., Park, W. Y., Kim, Y. H. & Kim, T. M. Allelic imbalance of somatic mutations in cancer genomes and transcriptomes. Sci. Rep. 7, 1653 (2017).
58. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
59. Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
60. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
61. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
62. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protocols 4, 1184–1191 (2009).
63. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

Acknowledgements

The work was enabled by the Weill Cornell Epigenomics Core and Flow Cytometry Core. We thank A. Mullally (Brigham and Women’s Hospital) for sharing the cell lines for the species-mixing study, and N. Kuchine (Weill Cornell Medicine) for helping us to acquire one of the patient samples. R.C. is supported by Lymphoma Research Foundation and Marie Skłodowska-Curie fellowships. R.M.M. is supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health, awarded to the Weill Cornell, Rockefeller University and Memorial Sloan Kettering Cancer Center Tri-Institutional MD-PhD Program (T32GM007739). G.A.-Z. and J.M.S. are supported by Cancer Research & Treatment Fund (CR&T). J.R.C.-R. is supported by the Stand Up to Cancer Innovative Research Grant (SU2C-AACR-IRG-03-16) and Department of Defense Early-Career Investigator Award (W81XWH-16-1-0438). D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, the American Society of Hematology Scholar Award, Pershing Square Sohn Prize for Young Investigators in Cancer Research and the National Institutes of Health Director’s New Innovator Award (DP2-CA239065). This work was also supported by the Leukemia Lymphoma Society Translational Research Program, Columbia University Physical Sciences in Oncology Center Pilot Grant (U54CA193313), National Heart Lung and Blood Institute (R01HL145283-01) and Stand Up To Cancer Innovative Research Grant (SU2C-AACR-IRG-0616). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the scientific partner of SU2C.

Author information

These authors contributed equally: Anna S. Nam, Kyu-Tae Kim, Ronan Chaligne
These authors jointly supervised this work: Peter Smibert, Dan A. Landau

Authors and Affiliations

Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
Anna S. Nam & Wayne Tam
New York Genome Center, New York, NY, USA
Anna S. Nam, Kyu-Tae Kim, Ronan Chaligne, Franco Izzo, Chelston Ang, Robert M. Myers, Ryan Brand, Nathaniel D. Omans & Dan A. Landau
Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
Anna S. Nam, Kyu-Tae Kim, Ronan Chaligne, Franco Izzo, Chelston Ang, Robert M. Myers, Ryan Brand, Nathaniel D. Omans, Joseph M. Scandura & Dan A. Landau
Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
Kyu-Tae Kim, Ronan Chaligne, Franco Izzo, Chelston Ang, Robert M. Myers, Ghaith Abu-Zeinah, Ryan Brand, Nathaniel D. Omans & Dan A. Landau
Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Justin Taylor, Alessandro Pastore & Omar Abdel-Wahab
Tri-Institutional MD-PhD Program, Weill Cornell Medicine, Rockefeller University, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Robert M. Myers
Richard T. Silver MD Myeloproliferative Neoplasms Center, Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA
Ghaith Abu-Zeinah & Joseph M. Scandura
Tri-Institutional Training Program in Computational Biology and Medicine, Memorial Sloan Kettering Cancer Center, Cornell University, Weill Cornell Medicine, New York, NY, USA
Nathaniel D. Omans
Epigenomics Core Facility, Weill Cornell Medicine, New York, NY, USA
Alicia Alonso, Caroline Sheridan & Marisa Mariani
Oxford Nanopore Technologies, New York, NY, USA
Xiaoguang Dai & Eoghan Harrington
Department of Obstetrics and Gynecology, Weill Cornell Medicine, New York, NY, USA
Juan R. Cubillos-Ruiz
Division of Hematology and Medical Oncology, Department of Medicine, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Ronald Hoffman
Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
Raul Rabadan
Technology Innovation Lab, New York Genome Center, New York, NY, USA
Peter Smibert
Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
Dan A. Landau

Contributions

A.S.N., K.-T.K., R.C., P.S., O.A.-W. and D.A.L. devised the research strategy. A.S.N., K.-T.K., R.C., P.S. and D.A.L. developed the tools. A.S.N., R.C., P.S., C.A., N.D.O., A.A., C.S., M.M., J.T., X.D., R.M.M., E.H. and G.A.-Z. performed the experiments. A.S.N., K.-T.K., R.B., A.P. and F.I. performed the analyses. A.S.N., K.-T.K., R.C., P.S. and D.A.L. wrote the manuscript. A.S.N., K.-T.K., R.C., J.T., P.S., J.R.C.-R., W.T., R.H., J.M.S., R.R., O.A.-W. and D.A.L. helped to interpret results. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Dan A. Landau.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Comparison of VAF between WES and RNA-seq, and primer sequences and positions of linear and circularization GoT.

a, Pie charts show the fraction of variants in each category, as described in the top panel. The distribution of mutant allele fractions is shown with variants annotated as oncogene or tumour-suppressor gene (TSG) mutations (according to previously published definitions^60,61). Diagonal dashed lines indicate an equal allelic fraction between WES and RNA-seq. Yellow density contours represent driver distributions. BRCA, breast invasive carcinoma; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; STAD, stomach adenocarcinoma. b, Schematic localization of primers for linear GoT and circularization GoT for 3′ and 5′ libraries. c, Primer positions and sequences of the regions targeted by GoT and circularization GoT.
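
The WES versus RNA-seq comparison in panel a rests on a simple per-variant ratio, VAF = alt / (alt + ref). As a minimal sketch (not the authors' code), assuming alternate- and reference-allele read counts are available for each variant in both assays, the hypothetical helpers below compute the paired VAFs and their difference; a difference of zero corresponds to a point on the diagonal dashed line.

```python
def vaf(alt_reads, ref_reads):
    """Variant allele fraction: alt / (alt + ref); None when the locus is uncovered."""
    depth = alt_reads + ref_reads
    return alt_reads / depth if depth else None


def paired_vafs(variants):
    """Return (wes_vaf, rna_vaf, rna_minus_wes) for variants covered in both assays.

    `variants` is a list of dicts with hypothetical keys
    'wes_alt', 'wes_ref', 'rna_alt' and 'rna_ref' (read counts).
    """
    pairs = []
    for v in variants:
        wes = vaf(v["wes_alt"], v["wes_ref"])
        rna = vaf(v["rna_alt"], v["rna_ref"])
        if wes is None or rna is None:
            continue  # not expressed or not covered: nothing to plot
        pairs.append((wes, rna, rna - wes))  # a difference of 0 lies on the diagonal
    return pairs


example = [
    {"wes_alt": 10, "wes_ref": 10, "rna_alt": 30, "rna_ref": 10},
    {"wes_alt": 8, "wes_ref": 12, "rna_alt": 0, "rna_ref": 0},  # no RNA coverage
]
print(paired_vafs(example))  # [(0.5, 0.75, 0.25)]
```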

Extended Data Fig. 2 Optimization of parameters in processing targeted amplicon sequences in the IronThrone GoT pipeline.

a, Representation of amplicon reads. b, Flow chart of the GoT analysis pipeline (Methods). CB, cell barcode. c, Mouse (green) and human (blue) genome alignment of 10x data (y axes) with genotyping data by GoT (x axes) with various thresholds for minimum duplicate reads (across) and maximum mismatch ratio (down). d, Precision, recall and F₁ scores for combinations of minimum duplicate reads and maximum mismatch ratio thresholds. e, Importance of each variable used to calculate tree splits in the random-forest classification test. f, Ratio of cell loss and genotyping errors (z-score on y axis) based on mismatch ratio thresholds (x axis). The area of intersection, around a mismatch ratio of 0.2, is highlighted in grey. g, Heat maps showing z-scores of the number of filtered cells (left) and predicted error rates (right) from random-forest classification tests for combinations of minimum duplicate reads and maximum mismatch ratio thresholds.
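
The two read-level filters explored in panels c–g (minimum duplicate reads per UMI and maximum mismatch ratio) can be sketched as follows. This is a hypothetical illustration, not the IronThrone implementation: it assumes each amplicon read already carries a cell barcode, a UMI, an allele call ("WT" or "MUT") and a per-read mismatch ratio against the expected amplicon sequence; it further assumes that each UMI takes the majority allele of its duplicate reads and that a cell with any mutant UMI is called mutant. The function names and the default of two duplicate reads are illustrative; the 0.2 default only echoes panel f.

```python
from collections import Counter, defaultdict


def call_genotypes(reads, min_dup_reads=2, max_mismatch_ratio=0.2):
    """Collapse amplicon reads into UMI-level and cell-level genotype calls.

    reads: iterable of (cell_barcode, umi, allele, mismatch_ratio), where allele
    is "WT", "MUT" or None (uninformative). Thresholds are illustrative.
    """
    alleles_by_umi = defaultdict(list)
    for cell_barcode, umi, allele, mismatch_ratio in reads:
        # Drop reads that align poorly to the expected amplicon or lack an allele call.
        if mismatch_ratio > max_mismatch_ratio or allele not in ("WT", "MUT"):
            continue
        alleles_by_umi[(cell_barcode, umi)].append(allele)

    umi_calls = {}
    for key, alleles in alleles_by_umi.items():
        if len(alleles) < min_dup_reads:
            continue  # too few duplicate reads to trust this UMI
        (top, n_top), *others = Counter(alleles).most_common()
        if others and others[0][1] == n_top:
            continue  # tied WT/MUT support: leave the UMI uncalled
        umi_calls[key] = top

    # Assumption: a single mutant UMI is enough to call a (heterozygous) cell mutant.
    cell_calls = {}
    for (cell_barcode, _umi), allele in umi_calls.items():
        if allele == "MUT" or cell_barcode not in cell_calls:
            cell_calls[cell_barcode] = allele
    return umi_calls, cell_calls


def precision_recall_f1(tp, fp, fn):
    """Scores for comparing threshold combinations, e.g. against species labels."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


reads = [
    ("AAAC", "U1", "MUT", 0.0), ("AAAC", "U1", "MUT", 0.1), ("AAAC", "U1", "WT", 0.0),
    ("AAAC", "U2", "WT", 0.0), ("AAAC", "U2", "WT", 0.05),
    ("GGGT", "U3", "WT", 0.0), ("GGGT", "U3", "WT", 0.0),
    ("GGGT", "U4", "MUT", 0.5), ("GGGT", "U4", "MUT", 0.5),  # removed: mismatch ratio > 0.2
    ("TTTA", "U5", "MUT", 0.0),  # removed: only one supporting read
]
umis, cells = call_genotypes(reads)
print(cells)  # {'AAAC': 'MUT', 'GGGT': 'WT'}
```

Sweeping `min_dup_reads` and `max_mismatch_ratio` over a grid and scoring each combination with `precision_recall_f1` mirrors, in spirit, the threshold search summarized in panels c, d and g.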

Extended Data Fig. 3 GoT captures genotyping information of single cells through cDNA.

a, Percentage of cells by number of UMIs with the CALR-mutation locus captured in standard 10x data (left panels) and GoT data (right panels) (see c for cell numbers in each sample). b, Number of UMIs per cell covering the CALR transcript in standard 10x data (blue shading) or the targeted CALR locus in standard 10x or GoT data (pink shading) (see c for cell numbers in each sample). c, Summary of clinical, pathological and GoT data from patients with CALR-mutated myeloproliferative neoplasms. BM, bone marrow; PB, peripheral blood. d, Number of genes per cell (left) and number of UMIs per cell (right) from published standard 10x data of healthy control CD34⁺ cells and 10x data from 3′ v.2 chemistry of CD34⁺ cells from patient samples that underwent concurrent GoT, after random downsampling of the reads from each sample to 50 million reads (3 iterations), showing that the extra PCR cycle and the small aliquot taken from the 10x cDNA library for GoT (3′ v.2 chemistry) do not compromise the scRNA-seq data.
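
Panel d compares library complexity after random downsampling to a fixed read depth over three iterations. A minimal sketch of that procedure follows, assuming reads are available as hypothetical (cell barcode, UMI, gene) tuples and that a UMI is counted once per gene; summarizing each iteration by its median and averaging over iterations is an illustrative choice, not necessarily the authors'.

```python
import random
from collections import defaultdict
from statistics import mean, median


def downsample(reads, n_reads, rng):
    """Sample n_reads reads without replacement (all reads if fewer exist)."""
    if n_reads >= len(reads):
        return list(reads)
    return rng.sample(reads, n_reads)


def per_cell_complexity(reads):
    """reads: (cell_barcode, umi, gene) tuples -> {cell: (n_genes, n_umis)}."""
    genes, umis = defaultdict(set), defaultdict(set)
    for cell_barcode, umi, gene in reads:
        genes[cell_barcode].add(gene)
        umis[cell_barcode].add((umi, gene))  # duplicate reads of one UMI count once
    return {cb: (len(genes[cb]), len(umis[cb])) for cb in genes}


def downsampled_medians(reads, n_reads, n_iter=3, seed=0):
    """Mean over iterations of the median genes per cell and UMIs per cell."""
    rng = random.Random(seed)
    gene_medians, umi_medians = [], []
    for _ in range(n_iter):
        stats = per_cell_complexity(downsample(reads, n_reads, rng))
        gene_medians.append(median(n_genes for n_genes, _ in stats.values()))
        umi_medians.append(median(n_umis for _, n_umis in stats.values()))
    return mean(gene_medians), mean(umi_medians)


reads = [("A", "U1", "g1"), ("A", "U1", "g1"), ("A", "U2", "g1"),
         ("A", "U1", "g2"), ("B", "U3", "g3")]
print(downsampled_medians(reads, n_reads=50_000_000))  # (1.5, 2.0)
```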

Extended Data Fig. 4 Integration of samples from patients with essential thrombocythaemia and assignment of progenitor subsets.

a, t-SNE projection of CD34⁺ progenitor cells from samples ET01–ET05, after integration and batch correction using the Seurat package (Methods). b, Heat map of top ten differentially expressed genes for clusters; lineage-specific genes from a previous publication²⁶ are highlighted (Methods). c, Representative lineage-specific genes projected onto the t-SNE representation of CD34⁺ cells from samples from patients with essential thrombocythaemia. d, t-SNE projection of CD34⁺ cells from samples ET01–ET05 after applying a deep generative modelling approach for the single-cell analysis using the scVI package (Methods)¹⁹, showing assignments of progenitor subsets as determined after clustering the cells using the Seurat package. e, Genotyping data from GoT are projected onto the t-SNE representation generated after the scVI analysis of progenitor cells from samples ET01–ET05. Cells without any GoT data are labelled NA (not assignable).

Extended Data Fig. 5 Results of GoT analysis are robust to various amplicon UMI thresholds and linear modelling.

a, Frequency of wild-type and mutant cells in HSPCs and MkPs with variable minimum genotyping UMI thresholds (two-sided Fisher’s exact test; see Supplementary Table 6 for sample size). b, Pseudotime comparison between wild-type and mutant cells at increasing minimum thresholds for targeted genotyping UMIs (two-sided t-test; see Supplementary Table 6 for sample size). c, Pseudotime comparison between mutant and wild-type cells with a UMI threshold of 1 (same datasets as b), tested with a generalized linear model that includes mutation status and the total number of amplicon UMIs per cell. d, Across 100 iterations, the genotyping amplicon UMIs were downsampled to one per cell and the mutant-cell frequency was determined for MkPs or precursor B cells. This frequency was then divided by the total mutant-cell frequency across all progenitor subsets for each of the 100 iterations. Mean ± s.d. after n = 100 downsampling iterations (two-sided Wilcoxon rank-sum test). Essential thrombocythaemia samples with at least 20 cells in each cluster were analysed. e, VAF of the CALR mutation in CD34⁺CD38⁻ (left), CD34⁺CD38⁺ (middle) and CD34⁺CD10⁺ (right) FACS-sorted peripheral blood cells from patients with essential thrombocythaemia, determined by ddPCR.
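
Panel d normalizes the mutant-cell frequency within one progenitor subset by the overall mutant-cell frequency, after reducing each cell to a single genotyping UMI. The hypothetical function below sketches that computation, assuming UMI-level calls are already available per cell; it returns one ratio per iteration so that the mean ± s.d. can be taken over the list.

```python
import random
from statistics import mean, stdev


def relative_mutant_frequency(cells, cluster, n_iter=100, seed=0):
    """cells: dict cell_id -> (cluster_label, list of UMI calls "WT"/"MUT").

    Each iteration keeps one randomly chosen genotyping UMI per cell and records
    (mutant fraction in `cluster`) / (mutant fraction across all clusters).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_iter):
        mut_all = n_all = mut_in = n_in = 0
        for label, umi_calls in cells.values():
            if not umi_calls:
                continue  # cells without genotyping UMIs are not assignable
            is_mutant = rng.choice(umi_calls) == "MUT"  # downsample to one UMI
            mut_all += is_mutant
            n_all += 1
            if label == cluster:
                mut_in += is_mutant
                n_in += 1
        if n_in and mut_all:  # skip iterations where the ratio is undefined
            ratios.append((mut_in / n_in) / (mut_all / n_all))
    return ratios


cells = {
    "c1": ("MkP", ["MUT", "MUT"]),
    "c2": ("MkP", ["MUT"]),
    "c3": ("HSPC", ["WT"]),
    "c4": ("HSPC", ["WT", "WT"]),
    "c5": ("HSPC", []),  # no genotyping UMI
}
ratios = relative_mutant_frequency(cells, "MkP")
print(mean(ratios), stdev(ratios))
```

Cells carrying both wild-type and mutant UMIs are the ones whose single-UMI call varies between iterations, which is what gives the s.d. its meaning.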

Extended Data Fig. 6 Cell-cycle module expression in mutant and wild-type progenitor cells.

a, S-phase and G2- and M-phase gene-module expression in wild-type and mutant cells, in HSPC and MkP clusters from essential thrombocythaemia samples. Cell-cycle module score represents the sum of S-phase and G2- and M-phase gene-module expression (two-sided Wilcoxon rank-sum test; see Methods and Supplementary Table 6 for sample size). Analysis was performed for clusters with at least 20 cells. b, Ratio of committed-progenitor priming-module expression of mutant and wild-type HSPCs. One mutant and one wild-type HSPC were randomly sampled from samples ET01–ET05 for each round of analysis (n = 1,000 iterations, two-sided Wilcoxon rank-sum test).
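
Panel a defines the cell-cycle score as the sum of S-phase and G2/M-phase module expression, and panel b uses repeated random pairing of one mutant and one wild-type HSPC. The sketch below assumes, purely for illustration, that a module's expression is the mean normalized expression of its genes (Seurat-style module scores, which subtract a control gene set, would give different values); all function names and the example genes are hypothetical choices.

```python
import random
from statistics import mean


def module_score(expr, genes):
    """Mean normalized expression of a gene module in one cell (missing genes count as 0)."""
    return mean(expr.get(g, 0.0) for g in genes)


def cell_cycle_score(expr, s_genes, g2m_genes):
    """Cell-cycle score = S-phase module + G2/M-phase module expression."""
    return module_score(expr, s_genes) + module_score(expr, g2m_genes)


def paired_ratios(mutant_scores, wildtype_scores, n_iter=1000, seed=0):
    """Per iteration, ratio of one random mutant cell's score to one random wild-type cell's score."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_iter):
        mut = rng.choice(mutant_scores)
        wt = rng.choice(wildtype_scores)
        if wt > 0:  # skip draws where the ratio is undefined
            ratios.append(mut / wt)
    return ratios


expr = {"MCM5": 2.0, "PCNA": 1.0, "TOP2A": 3.0}  # one cell, normalized values
print(cell_cycle_score(expr, ["MCM5", "PCNA"], ["TOP2A", "MKI67"]))  # 3.0
```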

Extended Data Fig. 7 ATF6 and IRE1 branches of the UPR are activated in CALR-mutated progenitor cells.

By sample (ET01–ET05), expression of ATF6-, PERK- and XBP1-target genes in the UPR in CALR wild-type and mutant MkPs and HSPCs (two-sided Wilcoxon rank-sum test).
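
The per-sample comparisons here, like many in these captions, use a two-sided Wilcoxon rank-sum test. The sketch below implements that test with the large-sample normal approximation and a tie correction but no continuity correction, so its P values can differ slightly from library implementations (for example, SciPy's `mannwhitneyu` applies a continuity correction by default in asymptotic mode). It is illustrative only; the function names are hypothetical.

```python
import math


def _midranks(values):
    """1-based ranks with ties given their average rank, plus sum of (t^3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def wilcoxon_ranksum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for sample x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    ranks, tie_term = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, min(1.0, math.erfc(abs(z) / math.sqrt(2)))
```

In the setting of this figure, `x` and `y` would hold a target-gene module score for wild-type and mutant cells of one progenitor type within one sample.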

Extended Data Fig. 8 CALR-mutated haematopoietic progenitor cells from myelofibrosis show upregulation of the IRE1-mediated UPR.

a, t-SNE projection of CD34⁺ progenitor cells from samples MF01–MF04, after integration and batch correction using the Seurat package (Methods) (n = 11,093). b, Left, t-SNE projection of CD34⁺ progenitor cells from samples MF01–MF04 labelled with pseudotime²¹ (n = 11,093). Right, pseudotime comparison between wild-type (n = 2,221) and mutant (n = 7,483) cells. P values from likelihood ratio tests of a linear mixed model with genotype as a fixed effect and individual patient samples as a random effect, against the model without the genotype effect (Methods). c, Cell-cycle module score comparison between wild-type and mutant cells in patients with myelofibrosis (two-sided Wilcoxon rank-sum test). d, Ratio of TGFβ-signalling-pathway gene expression of mutant and wild-type MkPs. One mutant and one wild-type MkP were randomly sampled for each round of analysis (n = 100 iterations; two-sided Wilcoxon rank-sum test). e, Differentially expressed genes between wild-type MkPs with high cell-cycle expression (n = 220) and wild-type MkPs with low cell-cycle expression (n = 110), common across samples MF02–MF04. P values were combined using Fisher’s combined test with Benjamini–Hochberg adjustment. Weighted average of fold change (expressed in log₂) based on cell number across samples is shown (Methods).
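
Panel e combines per-sample P values with Fisher's method, adjusts the combined values with Benjamini–Hochberg, and reports a cell-number-weighted mean log₂ fold change. Under the null hypothesis, X = −2 Σ ln pᵢ over k samples follows a χ² distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom. The stdlib-only sketch below relies on that; function names and the example gene labels are hypothetical.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # X / 2, guarding p = 0
    # Chi-square survival function for 2k df: exp(-X/2) * sum_{j<k} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (step-up, monotone), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def weighted_log2fc(log2fcs, n_cells):
    """Per-sample log2 fold changes averaged with cell-number weights."""
    return sum(f * n for f, n in zip(log2fcs, n_cells)) / sum(n_cells)


# Hypothetical per-gene P values from three samples (e.g. MF02-MF04).
per_gene = {"GENE_A": [0.01, 0.02, 0.03], "GENE_B": [0.40, 0.60, 0.50]}
combined = [fisher_combined(p) for p in per_gene.values()]
adjusted = benjamini_hochberg(combined)
```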

Extended Data Fig. 9 Deciphering subclonal progenitor identities using multiplex GoT, and targeting loci that are distant from transcript ends using circularization GoT.

a, Single-cell cloning assay of peripheral blood cells from patient MF05 (Methods). b, Rate of targeted locus capture (per cent) as a function of gene expression and the distance of the targeted locus from the transcript ends. c, Distance of the mutation locus from transcript ends for pan-cancer drivers, and their frequencies (based on the number of times they are reported in the COSMIC database). Mutations are annotated as oncogenes, tumour-suppressor genes or passengers (as previously defined^60,61). Relative density of each subclass of mutations from the closer end (that is, 3′ or 5′) is shown in the top panel. d, Schematic of analysis of ONT sequencing reads. e, Frequency of SF3B1-mutant and wild-type reads of linear GoT amplicon library sequenced with ONT. f, Analysis of SF3B1 amplicon reads sequenced by ONT for inter-transcript PCR recombination by mapping 50 bp at the opposite end of the targeted locus, showing that only 2.2% of fragments reflect inter-transcript recombination. g, Pairwise difference of read lengths for duplicate reads (that is, reads with the same cell barcode and UMI) of the SF3B1 amplicon library sequenced with ONT, showing consistent read lengths of duplicate reads that support a low rate of intra-transcript PCR recombination. h, Comparison of genotype assignment for CALR in sample MF01 between linear GoT and circularization GoT after downsampling reads to 300,000 with 10 iterations (n = 320 cells). i, Comparison of CALR-mutant UMI fraction per cell in sample MF01 between linear GoT and circularization GoT after downsampling reads to 300,000 with 10 iterations (n = 320 cells, Pearson’s correlation, F-test).
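
Panels g and i rest on two simple computations: pairwise read-length differences within groups of duplicate reads (same cell barcode and UMI), and Pearson's correlation between per-cell mutant UMI fractions from the two GoT variants. A minimal sketch, assuming reads are given as hypothetical (cell barcode, UMI, read length) tuples:

```python
import math
from collections import defaultdict
from itertools import combinations


def duplicate_length_differences(reads):
    """reads: (cell_barcode, umi, read_length). Returns |len_a - len_b| for every
    pair of reads sharing a cell barcode and UMI (that is, duplicates)."""
    lengths_by_molecule = defaultdict(list)
    for cell_barcode, umi, length in reads:
        lengths_by_molecule[(cell_barcode, umi)].append(length)
    diffs = []
    for lengths in lengths_by_molecule.values():
        diffs.extend(abs(a - b) for a, b in combinations(lengths, 2))
    return diffs


def pearson_r(x, y):
    """Pearson correlation, e.g. of per-cell mutant UMI fractions from two assays."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


reads = [("A", "U1", 500), ("A", "U1", 502), ("A", "U1", 500), ("B", "U2", 800)]
diffs = duplicate_length_differences(reads)
```

A distribution of differences concentrated near zero is what the caption reads as evidence for a low rate of intra-transcript PCR recombination.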

Extended Data Fig. 10 Evaluation of barcode replacement in IronThrone GoT processing.

a, Fraction of reads from the species-mixing experiment whose cell barcodes do not perfectly match a whitelisted cell barcode. ‘>Hamm-1’ denotes filtered reads whose barcodes are more than one Hamming distance from every whitelisted barcode (n = 139,422 reads). ‘Not significant’ denotes filtered reads whose barcodes are one Hamming distance from a whitelisted barcode but have a low probability of originating from it (posterior probability < 0.99, n = 14,830 reads). ‘Replaced’ denotes rescued reads whose barcodes have a candidate whitelisted barcode one Hamming distance away with posterior probability ≥ 0.99 (n = 224,085 reads). b, c, Number of supporting reads per candidate barcode and base quality at the differing base position (b) and across all base positions (c). Two-sided Wilcoxon rank-sum tests were applied to compare not significant (n = 14,830) and replaced (n = 224,085) barcodes. d, Correlation between the number of supporting reads per candidate barcode and median base quality at the differing base (two-sided Pearson’s correlation, F-test). e, Distribution of prior and posterior probabilities for not significant (n = 14,830) and replaced (n = 224,085) barcodes. The dashed red line marks the posterior probability cut-off (0.99). f–h, To further evaluate the efficiency of barcode replacement, we generated synthetic cell barcodes by randomly changing one to three bases in whitelisted cell barcodes (n = 100 iterations). f, Percentage of reads with cell barcodes that are not identical to the whitelisted cell barcodes (n = 100 iterations). Percentages of replaced reads were 99.1% ± 0.001% (median ± absolute deviation) in simulations with 1 base changed, 1.1% ± 0.002% in simulations with 2 bases changed and 0.7% ± 0.001% in simulations with 3 bases changed. g, Assessment of whether replaced cell barcodes are identical to the original cell barcodes. In simulations with 1 base changed, 100% ± 0% (median ± absolute deviation of 100 iterations) of reads with replaced cell barcodes matched the original cell barcodes. h, Estimation of prediction power for classifying cell barcodes in simulations with 1 base changed (n = 100 iterations).
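The legend describes rescuing a read whose barcode lies one Hamming distance from a whitelisted barcode when the posterior probability reaches 0.99, with priors and posteriors reported in panel e. IronThrone's exact model is not given in this legend, so the Python sketch below is an assumed Bayesian scheme, not the published implementation: the prior for each candidate is taken to be proportional to its number of supporting reads, and the likelihood treats bases independently, using the Phred error probability e = 10^(−Q/10) and assigning e/3 to a mismatch. All function names are hypothetical. The last lines mimic the f–h simulation by changing one base of a whitelisted barcode.

```python
import random


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def barcode_likelihood(observed: str, quals: list, candidate: str) -> float:
    """P(observed bases | true barcode = candidate), assuming independent errors."""
    lik = 1.0
    for obs, q, ref in zip(observed, quals, candidate):
        err = 10 ** (-q / 10)  # Phred quality -> error probability
        lik *= (1 - err) if obs == ref else err / 3
    return lik


def replace_barcode(observed, quals, whitelist_counts, cutoff=0.99):
    """Return (replacement or None, posterior) for one observed barcode.

    whitelist_counts maps each whitelisted barcode to its (positive) number
    of supporting reads, used here as the assumed prior weight.
    """
    if observed in whitelist_counts:
        return observed, 1.0
    candidates = [bc for bc in whitelist_counts
                  if len(bc) == len(observed) and hamming(bc, observed) == 1]
    if not candidates:
        return None, 0.0  # '>Hamm-1': filtered
    total = sum(whitelist_counts[c] for c in candidates)
    weights = {c: whitelist_counts[c] / total * barcode_likelihood(observed, quals, c)
               for c in candidates}
    best = max(weights, key=weights.get)
    posterior = weights[best] / sum(weights.values())
    # Below the cut-off the read is 'not significant' and filtered.
    return (best if posterior >= cutoff else None), posterior


def mutate_one_base(barcode: str, rng: random.Random) -> str:
    """Synthetic barcode with one randomly chosen base changed."""
    i = rng.randrange(len(barcode))
    new = rng.choice([b for b in "ACGT" if b != barcode[i]])
    return barcode[:i] + new + barcode[i + 1:]


whitelist = {"ACGTACGT": 1000, "ACGTACGA": 5, "TTTTGGGG": 300}
quals = [30] * 8
print(replace_barcode("ACGTACGC", quals, whitelist))  # ('ACGTACGT', ~0.995)
rng = random.Random(0)
synthetic = mutate_one_base("TTTTGGGG", rng)
print(synthetic, replace_barcode(synthetic, quals, whitelist)[0])  # recovers TTTTGGGG
```

Under this assumed model, a single candidate always reaches posterior 1, so ‘not significant’ calls arise only when several whitelisted barcodes sit one mismatch away with comparable support.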

Supplementary information

Reporting Summary

Supplementary Table 1

Summary of patients’ clinical history and pathology.

Supplementary Table 2

Gene sets/modules.

Supplementary Table 3

Differential gene expression analysis between mutant and wild-type cells for each progenitor cluster in each patient (see Supplementary Table 6 for sample sizes) was performed using the logistic regression framework⁵⁷ within the Seurat package, with UMI count included as a latent variable. P values from this test were combined using Fisher’s combined test with Benjamini–Hochberg adjustment. A weighted average of the log₂(fold change) across samples was computed.
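The combination step described for Supplementary Table 3 can be sketched in plain Python. This is an illustrative reading of the text, not the authors' code: it assumes per-sample P values and log₂(fold change) values are already available (for example from Seurat's logistic-regression test), that Fisher's method is applied per gene across samples, that the Benjamini–Hochberg adjustment is then applied across genes, and that the weights are per-sample cell numbers. Function names and gene labels are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0.

    Uses the closed-form survival function for even degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!. All p must be > 0.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (step-up, monotone), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def weighted_log2fc(log2fcs, cell_counts):
    """Average of per-sample log2(fold change), weighted by cell number."""
    return sum(fc * n for fc, n in zip(log2fcs, cell_counts)) / sum(cell_counts)


# Hypothetical per-sample results for three genes across two samples.
per_gene_p = {"GENE_A": [0.01, 0.02], "GENE_B": [0.4, 0.7], "GENE_C": [0.03, 0.001]}
combined = {g: fisher_combined_pvalue(ps) for g, ps in per_gene_p.items()}
adjusted = dict(zip(combined, benjamini_hochberg(list(combined.values()))))
print(adjusted)
print(weighted_log2fc([1.0, 0.5], [300, 100]))  # 0.875
```

The closed-form chi-square tail avoids a statistics dependency; with a single sample, Fisher's method returns the input P value unchanged.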

Supplementary Table 4

Gene set enrichment analysis (g:Profiler). A hypergeometric test for gene set enrichment was performed using the gProfileR package (v. 0.6.7)⁵⁸. See tables for the sizes of the terms used in the test. Multiple-hypothesis testing correction was performed using the g:SCS algorithm developed by the authors of the gProfileR package.
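For orientation, the uncorrected one-sided hypergeometric P value that underlies this kind of enrichment test can be computed exactly with integer binomial coefficients. The sketch below does not reproduce g:Profiler's g:SCS correction or its choice of background gene set; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_pvalue(overlap, query_size, term_size, universe_size):
    """P(X >= overlap) for X ~ Hypergeometric.

    universe_size: genes in the background (N)
    term_size:     background genes annotated to the term (K)
    query_size:    genes in the query list, e.g. differentially expressed genes (n)
    overlap:       query genes annotated to the term (k)
    """
    upper = min(query_size, term_size)
    # Exact integer arithmetic for the upper tail, divided once at the end.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, query_size)


# Hypothetical example: 12 of 200 query genes fall in a 150-gene term,
# against a background of 20,000 genes (expected overlap = 1.5).
print(hypergeometric_enrichment_pvalue(12, 200, 150, 20000))
```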

Supplementary Table 5

List of primers used in GoT and the single-cell colony assay.

Supplementary Table 6

Cell numbers for each patient and progenitor subset.

About this article

Cite this article

Nam, A.S., Kim, KT., Chaligne, R. et al. Somatic mutations and cell identity linked by Genotyping of Transcriptomes. Nature 571, 355–360 (2019). https://doi.org/10.1038/s41586-019-1367-0

Received: 24 October 2018
Accepted: 05 June 2019
Published: 03 July 2019
Issue Date: 18 July 2019
DOI: https://doi.org/10.1038/s41586-019-1367-0
