Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34+ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.
All of the sequencing data are available via the Gene Expression Omnibus (GEO) under the accession number GSE117826. Any other relevant data are available from the corresponding author upon reasonable request.
The IronThrone GoT pipeline is available on GitHub at https://github.com/landau-lab/IronThrone-GoT.
Sperling, A. S., Gibson, C. J. & Ebert, B. L. The genetics of myelodysplastic syndrome: from clonal haematopoiesis to secondary leukaemia. Nat. Rev. Cancer 17, 5–19 (2017).
Landau, D. A. et al. The evolutionary landscape of chronic lymphocytic leukemia treated with ibrutinib targeted therapy. Nat. Commun. 8, 2185 (2017).
Burger, J. A. et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat. Commun. 7, 11589 (2016).
Landau, D. A. et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013).
Landau, D. A. et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015).
Nangalia, J. et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. N. Engl. J. Med. 369, 2391–2405 (2013).
Klampfl, T. et al. Somatic mutations of calreticulin in myeloproliferative neoplasms. N. Engl. J. Med. 369, 2379–2390 (2013).
Giustacchini, A. et al. Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia. Nat. Med. 23, 692–702 (2017).
Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat. Methods 13, 833–836 (2016).
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
Elf, S. et al. Mutant calreticulin requires both its mutant C-terminus and the thrombopoietin receptor for oncogenic transformation. Cancer Discov. 6, 368–381 (2016).
Defour, J. P., Chachoua, I., Pecquet, C. & Constantinescu, S. N. Oncogenic activation of MPL/thrombopoietin receptor by 17 mutations at W515: implications for myeloproliferative neoplasms. Leukemia 30, 1214–1216 (2016).
Kollmann, K. et al. A novel signalling screen demonstrates that CALR mutations activate essential MAPK signalling and facilitate megakaryocyte differentiation. Leukemia 31, 934–944 (2017).
Marty, C. et al. Calreticulin mutants in mice induce an MPL-dependent thrombocytosis with frequent progression to myelofibrosis. Blood 127, 1317–1324 (2016).
Nivarthi, H. et al. Thrombopoietin receptor is required for the oncogenic function of CALR mutants. Leukemia 30, 1759–1763 (2016).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Farrell, J. A. et al. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018).
Chen, E. et al. Distinct clinical phenotypes associated with JAK2V617F reflect differential STAT1 signaling. Cancer Cell 18, 524–535 (2010).
Rampal, R. et al. Integrated genomic analysis illustrates the central role of JAK–STAT pathway activation in myeloproliferative neoplasm pathogenesis. Blood 123, e123–e133 (2014).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Georgantas, R. W. III et al. Microarray and serial analysis of gene expression analyses identify known and novel transcripts overexpressed in hematopoietic stem cells. Cancer Res. 64, 4434–4441 (2004).
Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
Hetz, C. & Papa, F. R. The unfolded protein response and cell fate control. Mol. Cell 69, 169–181 (2018).
Pronier, E. et al. Targeting the CALR interactome in myeloproliferative neoplasms. JCI Insight 3, e122703 (2018).
Lau, W. W., Hannah, R., Green, A. R. & Göttgens, B. The JAK–STAT signaling pathway is differentially activated in CALR-positive compared with JAK2V617F-positive ET patients. Blood 125, 1679–1681 (2015).
Shivarov, V., Ivanova, M. & Tiu, R. V. Mutated calreticulin retains structurally disordered C terminus that cannot bind Ca2+: some mechanistic and therapeutic implications. Blood Cancer J. 4, e185 (2014).
Zini, R. et al. CALR mutational status identifies different disease subtypes of essential thrombocythemia showing distinct expression profiles. Blood Cancer J. 7, 638 (2017).
Wu, J. et al. ATF6α optimizes long-term endoplasmic reticulum function to protect cells from chronic stress. Dev. Cell 13, 351–364 (2007).
van Galen, P. et al. The unfolded protein response governs integrity of the haematopoietic stem-cell pool during stress. Nature 510, 268–272 (2014).
Lee, A. H., Iwakoshi, N. N. & Glimcher, L. H. XBP-1 regulates a subset of endoplasmic reticulum resident chaperone genes in the unfolded protein response. Mol. Cell. Biol. 23, 7448–7459 (2003).
Cubillos-Ruiz, J. R. et al. ER stress sensor XBP1 controls anti-tumor immunity by disrupting dendritic cell homeostasis. Cell 161, 1527–1538 (2015).
Yoshida, H., Matsui, T., Yamamoto, A., Okada, T. & Mori, K. XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107, 881–891 (2001).
Luo, J. L., Kamata, H. & Karin, M. IKK/NF-κB signaling: balancing life and death—a new approach to cancer therapy. J. Clin. Invest. 115, 2625–2632 (2005).
Stein, S. J. & Baldwin, A. S. Deletion of the NF-κB subunit p65/RelA in the hematopoietic compartment leads to defects in hematopoietic stem cell function. Blood 121, 5015–5024 (2013).
Abu-Zeinah, G. et al. Myeloproliferative neoplasm (MPN) driver mutations are enriched during hematopoietic stem cell differentiation in patterns that correlate with clinical phenotype and treatment response. Blood 132, 4317 (2018).
Castro-Malaspina, H., Rabellino, E. M., Yen, A., Nachman, R. L. & Moore, M. A. Human megakaryocyte stimulation of proliferation of bone marrow fibroblasts. Blood 57, 781–787 (1981).
Ciurea, S. O. et al. Pivotal contributions of megakaryocytes to the biology of idiopathic myelofibrosis. Blood 110, 986–993 (2007).
Terui, T. et al. The production of transforming growth factor-beta in acute megakaryoblastic leukemia and its possible implications in myelofibrosis. Blood 75, 1540–1548 (1990).
Obeng, E. A. et al. Physiologic expression of Sf3b1 K700E causes impaired erythropoiesis, aberrant splicing, and sensitivity to therapeutic spliceosome modulation. Cancer Cell 30, 404–417 (2016).
Saikia, M. et al. Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells. Nat. Methods 16, 59–62 (2019).
Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
Kleppe, M. et al. Dual targeting of oncogenic activation and inflammatory signaling increases therapeutic efficacy in myeloproliferative neoplasms. Cancer Cell 33, 785–787 (2018).
Mu, P. et al. SOX2 promotes lineage plasticity and antiandrogen resistance in TP53- and RB1-deficient prostate cancer. Science 355, 84–88 (2017).
Suvà, M. L. et al. Reconstructing and reprogramming the tumor-propagating potential of glioblastoma stem-like cells. Cell 157, 580–594 (2014).
Geyer, J. T. et al. Oligomonocytic chronic myelomonocytic leukemia (chronic myelomonocytic leukemia without absolute monocytosis) displays a similar clinicopathologic and mutational profile to classical chronic myelomonocytic leukemia. Mod. Pathol. 30, 1213–1222 (2017).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Bolker, B. M. et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evol. 24, 127–135 (2009).
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).
Reinius, B. & Sandberg, R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Ntranos, V., Yi, L., Melsted, P. & Pachter, L. A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat. Methods 16, 163–166 (2019).
Reimand, J. et al. g:Profiler–a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44, W83–W89 (2016).
Rhee, J. K., Lee, S., Park, W. Y., Kim, Y. H. & Kim, T. M. Allelic imbalance of somatic mutations in cancer genomes and transcriptomes. Sci. Rep. 7, 1653 (2017).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 174, 1034–1035 (2018).
Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protocols 4, 1184–1191 (2009).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
The work was enabled by the Weill Cornell Epigenomics Core and Flow Cytometry Core. We thank A. Mullally (Brigham and Women’s Hospital) for sharing the cell lines for the species-mixing study, and N. Kuchine (Weill Cornell Medicine) for helping us to acquire one of the patient samples. R.C. is supported by Lymphoma Research Foundation and Marie Skłodowska-Curie fellowships. R.M.M. is supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health, awarded to the Weill Cornell, Rockefeller University and Memorial Sloan Kettering Cancer Center Tri-Institutional MD-PhD Program (T32GM007739). G.A.-Z. and J.M.S. are supported by Cancer Research & Treatment Fund (CR&T). J.R.C.-R. is supported by the Stand Up to Cancer Innovative Research Grant (SU2C-AACR-IRG-03-16) and Department of Defense Early-Career Investigator Award (W81XWH-16-1-0438). D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, the American Society of Hematology Scholar Award, Pershing Square Sohn Prize for Young Investigators in Cancer Research and the National Institutes of Health Director’s New Innovator Award (DP2-CA239065). This work was also supported by the Leukemia Lymphoma Society Translational Research Program, Columbia University Physical Sciences in Oncology Center Pilot Grant (U54CA193313), National Heart Lung and Blood Institute (R01HL145283-01) and Stand Up To Cancer Innovative Research Grant (SU2C-AACR-IRG-0616). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the scientific partner of SU2C.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Comparison of VAF between WES and RNA-seq, and primer sequences and positions of linear and circularization GoT.
a, Pie charts show the fraction of variants, which are categorized as described in the top panel. The distribution of the mutant allele fraction is annotated as oncogene or tumour-suppressor gene (TSG) (according to previously published definitions60,61). Diagonal dashed lines indicate an equal allelic fraction between WES and RNA-seq. Yellow density contours represent driver distributions. BRCA, breast invasive carcinoma; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; STAD, stomach adenocarcinoma. b, Schematic localization of primers for linear GoT and circularization GoT for 3′ and 5′ libraries. c, Primer positions and sequences of the regions targeted by GoT and circularization GoT.
Extended Data Fig. 2 Optimization of parameters in processing targeted amplicon sequences in the IronThrone GoT pipeline.
a, Representation of amplicon reads. b, Flow chart of the GoT analysis pipeline (Methods). CB, cell barcode. c, Mouse (green) and human (blue) genome alignment of 10x data (y axes) with genotyping data by GoT (x axes) with various thresholds for minimum duplicate reads (across) and maximum mismatch ratio (down). d, Results of precision, recall and F1 score analysis for combinations of minimum duplicate reads and maximum mismatch ratios. e, Measure of the importance of each variable used for the calculation of splits in trees in random-forest classification test. f, Ratio of cell loss and genotyping errors (z-score on y axis) based on mismatch ratio thresholds (x axis). The area of intersection is highlighted in grey around the mismatch ratio 0.2. g, Heat maps showing z-scores of the number of filtered cells (left) and predicted error rates (right) from random-forest classification tests for combinations of minimum duplicate reads and maximum mismatch ratio thresholds.
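The threshold evaluation summarized in c and d can be illustrated with a minimal sketch. It assumes that each amplicon UMI carries a species call, a duplicate-read count and a mismatch ratio, and that the species assigned from the 10x transcriptome alignment serves as ground truth. The function names, the majority-vote rule and the toy data are hypothetical and are not taken from the IronThrone GoT pipeline.

```python
from collections import Counter, defaultdict

def call_cells(umis, min_dup_reads, max_mismatch_ratio):
    """Assign one label per cell from the UMIs that pass both thresholds.

    umis: iterable of (cell_barcode, label, n_duplicate_reads, mismatch_ratio).
    Cells with no passing UMI are absent from the result. Ties are broken by
    the label seen first (an arbitrary choice for this sketch).
    """
    votes = defaultdict(Counter)
    for cell, label, n_reads, mismatch_ratio in umis:
        if n_reads >= min_dup_reads and mismatch_ratio <= max_mismatch_ratio:
            votes[cell][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}

def precision_recall_f1(truth, calls, positive):
    """Precision, recall and F1 score for one label.

    truth: dict cell_barcode -> reference label.
    calls: dict cell_barcode -> called label; filtered cells count as missed.
    """
    tp = sum(1 for cb, lab in calls.items() if lab == positive and truth.get(cb) == positive)
    fp = sum(1 for cb, lab in calls.items() if lab == positive and truth.get(cb) != positive)
    fn = sum(1 for cb, lab in truth.items() if lab == positive and calls.get(cb) != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy species-mixing data (hypothetical values).
truth = {"c1": "human", "c2": "human", "c3": "mouse", "c4": "human"}
umis = [
    ("c1", "human", 3, 0.0),
    ("c1", "mouse", 1, 0.5),   # fails both thresholds
    ("c2", "mouse", 2, 0.1),   # passes, but disagrees with the reference
    ("c3", "human", 2, 0.1),   # passes, but disagrees with the reference
    ("c4", "human", 1, 0.0),   # too few duplicate reads, so the cell is lost
]
calls = call_cells(umis, min_dup_reads=2, max_mismatch_ratio=0.2)
precision, recall, f1 = precision_recall_f1(truth, calls, positive="human")
```

Repeating the last two lines over a grid of thresholds gives one precision, recall and F1 value per combination, which is the kind of comparison shown in d.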
Extended Data Fig. 3
a, Percentage of cells by number of UMIs with the CALR-mutation locus captured in standard 10x data (left panels) and GoT data (right panels) (see c for cell numbers in each sample). b, Number of UMIs per cell of CALR transcript from standard 10x data (blue shading) or targeted CALR locus from standard 10x or GoT (pink shading) (see c for cell numbers in each sample). c, Summary of clinical, pathological and GoT data from patients with CALR-mutated myeloproliferative neoplasms. BM, bone marrow; PB, peripheral blood. d, Number of genes per cell (left) and number of UMIs per cell (right) from published standard 10x data of healthy control CD34+ cells and 10x data from 3′ v.2 chemistry of CD34+ cells from patient samples that underwent concurrent GoT, after random downsampling of the reads from each sample to 50 million reads × 3 iterations, showing that the extra cycle of PCR and portioning a small aliquot from the 10x cDNA library for GoT using 3′ v.2 chemistry do not compromise scRNA-seq data.
Extended Data Fig. 4 Integration of samples from patients with essential thrombocythaemia and assignment of progenitor subsets.
a, t-SNE projection of CD34+ progenitor cells from samples ET01–ET05, after integration and batch correction using the Seurat package (Methods). b, Heat map of top ten differentially expressed genes for clusters; lineage-specific genes from a previous publication26 are highlighted (Methods). c, Representative lineage-specific genes projected onto the t-SNE representation of CD34+ cells from samples from patients with essential thrombocythaemia. d, t-SNE projection of CD34+ cells from samples ET01–ET05 after applying a deep generative modelling approach for the single-cell analysis using the scVI package (Methods)19, showing assignments of progenitor subsets as determined after clustering the cells using the Seurat package. e, Genotyping data from GoT are projected onto the t-SNE representation generated after the scVI analysis of progenitor cells from samples ET01–ET05. Cells without any GoT data are labelled NA (not assignable).
Extended Data Fig. 5 Results of GoT analysis are robust to various amplicon UMI thresholds and linear modelling.
a, Frequency of wild-type and mutant cells in HSPCs and MkPs with variable minimum genotyping UMI thresholds (two-sided Fisher’s exact test; see Supplementary Table 6 for sample size). b, Pseudotime comparison between wild-type and mutant cells with increasing minimum thresholds for the number of targeted genotyping UMIs (two-sided t-test; see Supplementary Table 6 for sample size). c, Pseudotime comparison between mutant and wild-type cells with a UMI threshold of 1 (same datasets as b), with statistical testing using a generalized linear model including mutation status and total number of amplicon UMIs per cell. d, Across 100 iterations, the genotyping amplicon UMIs were downsampled to one per cell and the mutant-cell frequency was determined for MkPs or precursor B cells. This frequency was then divided by the total mutant-cell frequency across all progenitor subsets for each of the 100 iterations. Mean ± s.d. after n = 100 downsampling iterations (two-sided Wilcoxon rank-sum test). Essential thrombocythaemia samples with at least 20 cells in each cluster were analysed. e, VAF of CALR mutation in CD34+CD38− (left), CD34+CD38+ (middle) and CD34+CD10+ (right) FACS-sorted peripheral blood cells from patients with essential thrombocythaemia determined by ddPCR.
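The downsampling control described in d can be sketched as follows. This is a minimal illustration, assuming that each cell is summarized by its counts of mutant and wild-type amplicon UMIs and that a cell is called mutant when the single retained UMI is mutant. The function name, input layout and toy data are hypothetical.

```python
import random
import statistics

def downsampled_relative_frequency(cells, cluster, n_iter=100, seed=0):
    """Relative mutant-cell frequency of one cluster after downsampling.

    cells: list of (cluster_name, n_mutant_umis, n_wildtype_umis).
    In each iteration one UMI is kept per cell, chosen uniformly at random,
    and the mutant-cell frequency in `cluster` is divided by the mutant-cell
    frequency across all cells.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_iter):
        mut_in = tot_in = mut_all = tot_all = 0
        for name, n_mut, n_wt in cells:
            if n_mut + n_wt == 0:
                continue  # cell without genotyping data
            # Drawing one UMI at random is mutant with probability n_mut / total.
            is_mut = rng.random() < n_mut / (n_mut + n_wt)
            tot_all += 1
            mut_all += is_mut
            if name == cluster:
                tot_in += 1
                mut_in += is_mut
        if tot_in and mut_all:
            ratios.append((mut_in / tot_in) / (mut_all / tot_all))
    return ratios

# Toy input in which every cell is purely mutant or purely wild type,
# so the outcome does not depend on the random draws.
toy_cells = [("MkP", 3, 0), ("MkP", 2, 0), ("HSPC", 0, 4), ("HSPC", 1, 0)]
ratios = downsampled_relative_frequency(toy_cells, "MkP")
mean_ratio = statistics.mean(ratios)
sd_ratio = statistics.pstdev(ratios)
```

Keeping exactly one UMI per cell removes any advantage that cells with more amplicon UMIs would otherwise have in being called mutant, which is the point of the control.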
Extended Data Fig. 6
a, S-phase and G2- and M-phase gene-module expression in wild-type and mutant cells, in HSPC and MkP clusters from essential thrombocythaemia samples. Cell-cycle module score represents the sum of S-phase and G2- and M-phase gene-module expression (two-sided Wilcoxon rank-sum test; see Methods and Supplementary Table 6 for sample size). Analysis was performed for clusters with at least 20 cells. b, Ratio of committed-progenitor priming-module expression of mutant and wild-type HSPCs. One mutant and one wild-type HSPC were randomly sampled from samples ET01–ET05 for each round of analysis (n = 1,000 iterations, two-sided Wilcoxon rank-sum test).
Extended Data Fig. 7 ATF6 and IRE1 branches of the UPR are activated in CALR-mutated progenitor cells.
By sample (ET01–ET05), expression of ATF6-, PERK- and XBP1-target genes in the UPR in CALR wild-type and mutant MkPs and HSPCs (two-sided Wilcoxon rank-sum test).
Extended Data Fig. 8 CALR-mutated haematopoietic progenitor cells from myelofibrosis show upregulation of the IRE1-mediated UPR.
a, t-SNE projection of CD34+ progenitor cells from samples MF01–MF04, after integration and batch correction using the Seurat package (Methods) (n = 11,093). b, Left, t-SNE projection of CD34+ progenitor cells from samples MF01–MF04 labelled with pseudotime21 (n = 11,093). Right, pseudotime comparison between wild-type (n = 2,221) and mutant (n = 7,483) cells. P values from likelihood ratio tests of a linear mixed model with genotype as fixed effect and individual patient samples as random effect, against the model without the genotype effect (Methods). c, Cell-cycle module score comparison between wild-type and mutant cells in patients with myelofibrosis (two-sided Wilcoxon rank-sum test). d, Ratio of TGFβ-signalling-pathway gene expression of mutant and wild-type MkPs. One mutant and one wild-type MkP were randomly sampled for each round of analysis (n = 100 iterations; two-sided Wilcoxon rank-sum test). e, Differentially expressed genes between wild-type MkPs with high cell-cycle expression (n = 220) and wild-type MkPs with low cell-cycle expression (n = 110), common across samples MF02–MF04. P values were combined using Fisher’s combined test with Benjamini–Hochberg adjustments. Weighted average of fold change (expressed in log2) based on cell number across samples is shown (Methods).
Extended Data Fig. 9 Deciphering subclonal progenitor identities using multiplex GoT, and targeting loci that are distant from transcript ends using circularization GoT.
a, Single-cell cloning assay of peripheral blood cells from patient MF05 (Methods). b, Rate of targeted locus capture (per cent) as a function of gene expression and the distance of the targeted locus from the transcript ends. c, Distance of the mutation locus from transcript ends for pan-cancer drivers, and their frequencies (based on the number of times they are reported in the COSMIC database). Mutations are annotated as oncogenes, tumour-suppressor genes or passengers (as previously defined60,61). Relative density of each subclass of mutations from the closer end (that is, 3′ or 5′) is shown in the top panel. d, Schematic of analysis of ONT sequencing reads. e, Frequency of SF3B1-mutant and wild-type reads of linear GoT amplicon library sequenced with ONT. f, Analysis of SF3B1 amplicon reads sequenced by ONT for inter-transcript PCR recombination by mapping 50 bp at the opposite end of the targeted locus, showing that only 2.2% of fragments reflect inter-transcript recombination. g, Pairwise difference of read lengths for duplicate reads (that is, reads with the same cell barcode and UMI) of the SF3B1 amplicon library sequenced with ONT, showing consistent read lengths of duplicate reads, which supports a low rate of intra-transcript PCR recombination. h, Comparison of genotype assignment for CALR in sample MF01 between linear GoT and circularization GoT after downsampling reads to 300,000 with 10 iterations (n = 320 cells). i, Comparison of CALR-mutant UMI fraction per cell in sample MF01 between linear GoT and circularization GoT after downsampling reads to 300,000 with 10 iterations (n = 320 cells, Pearson’s correlation, F-test).
Extended Data Fig. 10
a, Fraction of reads with cell barcodes that are not perfectly matched to the whitelisted cell barcodes from the species-mixing experiment. ‘>Hamm-1’ denotes filtered reads with barcodes that are more than one Hamming distance away from whitelisted barcodes (n = 139,422 reads). ‘Not significant’ denotes filtered reads with barcodes that are one Hamming distance away from the whitelisted barcodes, but which have a low probability of originating from the barcode (posterior probability < 0.99, n = 14,830 reads). ‘Replaced’ denotes rescued reads with barcodes that have candidates that are one Hamming distance away from the whitelisted barcodes, with statistical significance (posterior probability ≥ 0.99, n = 224,085 reads). b, c, Number of supporting reads per candidate barcode and base quality at the differing base positions (b) and across base positions (c). Two-sided Wilcoxon rank-sum tests were applied to compare not significant (n = 14,830) and replaced (n = 224,085) barcodes. d, Correlation between the number of supporting reads per candidate barcode and median base quality at the differing base (two-tailed Pearson’s correlation, F-test). e, Distribution of prior and posterior probabilities from not significant (n = 14,830) and replaced (n = 224,085) barcodes. The dashed red line represents the posterior probability cut-off (0.99). f–h, To further evaluate the efficiency of barcode replacement, we generated synthetic cell barcodes by randomly changing one base in whitelisted cell barcodes (n = 100 iterations). f, Percentage of reads with cell barcodes that are not identical to the whitelisted cell barcodes (n = 100 iterations). Percentages of replaced reads were 99.1% ± 0.001% (median ± absolute deviation) in simulations with 1 base changed, 1.1% ± 0.002% in simulations with 2 bases changed and 0.7% ± 0.001% in simulations with 3 bases changed. g, Determination of whether replaced cell barcodes are identical to the original cell barcodes. In simulations with 1 base changed, the percentage of reads with replaced cell barcodes that were identical to the original cell barcodes was 100% ± 0% (median ± absolute deviation of 100 iterations). h, Estimation of prediction power for classifying cell barcodes from simulations with 1 base changed (n = 100 iterations).
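The barcode-replacement rule described in a can be illustrated with a minimal sketch. The legend specifies only the Hamming distance of one and the posterior probability cut-off of 0.99. The probability model below is an assumption made for illustration: the prior is taken to be proportional to the number of reads supporting each whitelisted candidate, and the likelihood is the sequencing-error probability implied by the Phred quality at the differing base. All function names and toy values are hypothetical.

```python
def hamming1_candidates(barcode, whitelist):
    """Yield (candidate, position) for whitelisted barcodes one mismatch away."""
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt != base:
                candidate = barcode[:i] + alt + barcode[i + 1:]
                if candidate in whitelist:
                    yield candidate, i

def correct_barcode(barcode, quals, whitelist_counts, cutoff=0.99):
    """Return the whitelisted barcode to use for this read, or None to filter it.

    barcode: observed cell barcode sequence.
    quals: Phred quality score for each base of the barcode.
    whitelist_counts: dict whitelisted barcode -> number of supporting reads.
    """
    if barcode in whitelist_counts:
        return barcode  # perfect match, nothing to correct
    weights = {}
    for candidate, pos in hamming1_candidates(barcode, whitelist_counts):
        prior = whitelist_counts[candidate] + 1      # pseudo-count avoids zero priors
        p_error = 10 ** (-quals[pos] / 10)           # Phred score -> error probability
        weights[candidate] = prior * p_error
    if not weights:
        return None  # more than one mismatch from every whitelisted barcode
    best = max(weights, key=weights.get)
    posterior = weights[best] / sum(weights.values())
    return best if posterior >= cutoff else None

# Toy whitelist with read support per barcode (hypothetical values).
counts = {"AAAA": 5000, "AAAT": 10, "CCCC": 500}
quals = [30, 30, 30, 30]
replaced = correct_barcode("AAAC", quals, counts)       # two candidates, one dominant
single = correct_barcode("ACCC", quals, counts)         # one candidate only
too_far = correct_barcode("GGGG", quals, counts)        # no candidate within distance 1
ambiguous = correct_barcode("AAAC", quals, {"AAAA": 100, "AAAT": 100})
```

Under this model a read is rescued when one candidate clearly dominates, and filtered when two similarly supported candidates compete, which mirrors the ‘Replaced’ and ‘Not significant’ categories in a.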
Summary of patients’ clinical history and pathology.
Differential gene expression analysis between mutant and wild-type cells for each progenitor cluster in each patient (see Supplementary Table 6 for sample size) was performed using the logistic regression framework57 within the Seurat package, with the number of UMIs included as a latent variable. P values from this test were combined using Fisher’s combined test with Benjamini–Hochberg adjustment. A weighted average of the log2(fold change) across the samples was calculated.
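The combination step can be sketched in a few lines. This is a minimal illustration using the Python standard library only, not the code used for the analysis; the function names are hypothetical, and the weights for the fold-change average are assumed to be cell numbers per sample, as stated in the legend of Extended Data Fig. 8e.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P values (all > 0) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def weighted_mean_log2fc(log2fcs, n_cells):
    """Average of per-sample log2 fold changes, weighted by cell number."""
    return sum(f * n for f, n in zip(log2fcs, n_cells)) / sum(n_cells)

# Toy example: one gene tested in two samples, then three genes adjusted together.
combined = fisher_combined_p([0.05, 0.05])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
avg_fc = weighted_mean_log2fc([1.0, 2.0], [100, 300])
```

Fisher's method assumes that the per-sample tests are independent, so each gene is first combined across samples and the combined P values are then adjusted across genes.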
Gene set enrichment analysis (g:Profiler). A hypergeometric test for gene set enrichment was performed using the gProfileR package (v. 0.6.7)58. See the tables for the size of the terms used for the test. Correction for multiple hypothesis testing was performed using the g:SCS algorithm developed by the authors of the gProfileR package.
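The hypergeometric test underlying this kind of enrichment analysis can be written out directly. The sketch below is a generic one-sided test and is not the gProfileR implementation; the function name and arguments are hypothetical, and the g:SCS correction is not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_term, n_query, n_overlap):
    """P(overlap >= n_overlap) when n_query genes are drawn without replacement
    from a universe of n_universe genes, of which n_term belong to the term."""
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes in the universe, 4 in the term, 3 in the query list,
# and at least 2 of the query genes annotated to the term.
p_toy = hypergeom_enrichment_p(n_universe=10, n_term=4, n_query=3, n_overlap=2)
```

Exact integer arithmetic with `math.comb` (Python 3.8 or later) keeps the tail sum accurate even for large gene universes, at the cost of speed compared with a log-space implementation.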
List of primers used in GoT and single cell colony assay.
Cell number for each patient and progenitor subsets.