Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and cooccurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.
scRNA-seq data have been deposited in the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-7167. WES data are available through the HipSci portal (www.hipsci.org). The lines used in this study have the identifiers: euts, fawm, feec, fikt, garx, gesg, heja, hipn, ieki, joxm, kuco, laey, lexy, naju, nusw, oaaz, oilg, pipw, puie, qayj, qolg, qonc, rozh, sehl, ualf, vass, vils, vuna, wahn, wetu, xugn, zoxy. Metadata, processed data and large results files are available at https://doi.org/10.5281/zenodo.1403510
The cardelino methods are implemented in an open-source, publicly available R package (github.com/single-cell-genetics/cardelino). The code used to process and analyse the data is available (github.com/davismcc/fibroblast-clonality), with a reproducible workflow implemented in Snakemake (ref. 64). Descriptions of how to reproduce the data processing and analysis workflows, with html output showing code and figures presented in this paper, are available at davismcc.github.io/fibroblast-clonality. Docker images providing the computing environment and software used for data processing (hub.docker.com/r/davismcc/fibroblast-clonality/) and data analyses in R (hub.docker.com/r/davismcc/r-singlecell-img/) are publicly available.
Burnet, F. M. Intrinsic mutagenesis: a genetic basis of ageing. Pathology 6, 1–11 (1974).
Martincorena, I. & Campbell, P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).
Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).
Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).
Huang, K.-L. et al. Pathogenic germline variants in 10,389 adult cancers. Cell 173, 355–370.e14 (2018).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
Ding, L. et al. Perspective on oncogenic processes at the end of the beginning of cancer genomics. Cell 173, 305–320.e10 (2018).
Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat. Methods 11, 396 (2014).
Deshwar, A. G. et al. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 16, 35 (2015).
Jiang, Y., Qiu, Y., Minn, A. J. & Zhang, N. R. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proc. Natl Acad. Sci. USA 113, E5528–E5537 (2016).
Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011).
Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160 (2014).
Navin, N. E. The first five years of single-cell cancer genomics and beyond. Genome Res. 25, 1499–1507 (2015).
Kim, K. I. & Simon, R. Using single cell sequencing data to model the evolutionary history of a tumor. BMC Bioinf. 15, 27 (2014).
Navin, N. E. & Chen, K. Genotyping tumor clones from single-cell data. Nat. Methods 13, 555–556 (2016).
Jahn, K., Kuipers, J. & Beerenwinkel, N. Tree inference for single-cell data. Genome Biol. 17, 86 (2016).
Kuipers, J., Jahn, K., Raphael, B. J. & Beerenwinkel, N. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res. 27, 1885–1894 (2017).
Roth, A. et al. Clonal genotype and population structure inference from single-cell tumor sequencing. Nat. Methods 13, 573–576 (2016).
Salehi, S. et al. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data. Genome Biol. 18, 44 (2017).
Malikic, S. et al. Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nat. Commun. 10, 2750 (2019).
Müller, S. et al. Single‐cell sequencing maps gene expression to mutational phylogenies in PDGF‐ and EGF‐driven gliomas. Mol. Syst. Biol. 12, 889 (2016).
Tirosh, I. et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 539, 309–313 (2016).
Fan, J. et al. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 28, 1217–1227 (2018).
Campbell, K. R. et al. clonealign: statistical integration of independent single-cell RNA and DNA sequencing data from human cancers. Genome Biol. 20, 54 (2019).
Giustacchini, A. et al. Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia. Nat. Med. 23, 692–702 (2017).
Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat. Methods 13, 833–836 (2016).
Saikia, M. et al. Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells. Nat. Methods 16, 59–62 (2019).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Williams, M. J. et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat. Genet. 50, 895–903 (2018).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 173, 1823 (2018).
Simons, B. D. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis. Proc. Natl Acad. Sci. USA 113, 128–133 (2016).
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238 (2016).
Ramaker, R. C. et al. RNA sequencing-based cell proliferation analysis across 19 cancers identifies a subset of proliferation-informative cancers with a common survival signature. Oncotarget. 8, 38668–38681 (2017).
Kowalczyk, M. S. et al. Single-cell RNA-seq reveals changes in cell cycle and differentiation programs upon aging of hematopoietic stem cells. Genome Res. 25, 1860–1872 (2015).
Tsang, J. C. H. et al. Single-cell transcriptomic reconstruction reveals cell cycle and multi-lineage differentiation defects in Bcl11a-deficient hematopoietic stem cells. Genome Biol. 16, 178 (2015).
Kolodziejczyk, A. A. et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17, 471–485 (2015).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Guo, H. et al. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23, 2126–2135 (2013).
Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Streeter, I. et al. The human-induced pluripotent stem cell initiative—data resources for cellular genetics. Nucleic Acids Res. 45, 691–697 (2016).
Church, D. M. et al. Modernizing reference genome assemblies. PLoS Biol. 9, e1001091 (2011).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv [q-bio.GN] (2013).
Li, H. et al. The sequence alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Karczewski, K. J. et al. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res. 45, D840–D845 (2017).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Fisher, R. A. On the interpretation of χ2 from contingency tables, and the calculation of P. J. R. Stat. Soc. 85, 87–94 (1922).
Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at bioRxiv https://doi.org/10.1101/372896 (2018).
Flicek, P. et al. Ensembl 2014. Nucleic Acids Res. 42, D749–D755 (2014).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
McCarthy, D. J., Campbell, K. R., Lun, A. T. L. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Hoffman, G. E. & Schadt, E. E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinf. 17, 483 (2016).
Lund, S. P., Nettleton, D., McCarthy, D. J. & Smyth, G. K. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat. Appl. Genet. Mol. Biol. 11, https://doi.org/10.1515/1544-6115.1826 (2012).
Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).
Wu, D. & Smyth, G. K. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res. 40, e133 (2012).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Ignatiadis, N., Klaus, B., Zaugg, J. B. & Huber, W. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat. Methods 13, 577–580 (2016).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Smyth, G. K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 1–25 (2004).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
We thank D. Jörg for highly constructive discussions and P. Qiao for valuable comments on the manuscript. We acknowledge the Wellcome Sanger Institute Cellular Genetics and Phenotyping teams (in particular, A. Alderton, C. Gomez, R. Boyd, S. Patel and S. Barnett) and DNA pipelines for their invaluable assistance in generating the data for this study. We thank G. Kildisiute for assisting in CNV analysis of the fibroblast lines. This project was supported by Wellcome Sanger core funding (WT206194) and the Human Induced Pluripotent Stem Cell Initiative. Research in the Stegle laboratory is supported by the BMBF, the Volkswagen Foundation, the Chan Zuckerberg Initiative and the EU (ERC project DECODE, grant agreement 732546). D.J.M. is supported by the National Health and Medical Research Council of Australia (grants APP1112681 and APP1162829), seed funding from the Baker Foundation and the Holyoake Research Fellowship at St Vincent's Institute of Medical Research and the University of Melbourne. R.R. is supported by the BBSRC Doctoral Training Programme. Y.H. is supported by the University of Cambridge and EMBL-EBI through an EBPOD postdoctoral fellowship. D.J.K. is supported by the Wellcome Trust under grants 203828/Z/16/A and 203828/Z/16/Z. T.H. is supported by a Human Frontier Science Program Fellowship, an EMBO Long-term Fellowship and an EMBO Advanced Fellowship.
The authors declare no competing interests.
Peer review information Nicole Rusk and Lin Tang were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
The clonal tree configuration matrix C is a random variable and follows a Bernoulli distribution encoded by an input tree configuration Ω that is provided to the model (for example, estimated from bulk or single-cell DNA-seq data using existing methods such as Canopy) and by an error rate ξ, which follows a beta prior distribution with hyperparameters 𝜅. The indicator matrix I defines the assignment of cells to clones; it is also unknown and is assumed to follow a multinomial prior with fixed parameter 𝜋 for each cell. The clone configuration C and cell identity I together encode the genotype c_{i,I_j} of each variant i in each cell j. If c_{i,I_j} is 1, the alternative allelic read count follows a binomial distribution with gene-specific parameter 𝜃_i; otherwise it follows a binomial distribution with error-related parameter 𝜃_0. Both 𝜃_i and 𝜃_0 have beta prior distributions, but with different parameters. Shaded nodes represent observed variables; unshaded nodes represent unknown variables; yellow circled nodes represent fixed hyperparameters.
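The generative process in this caption can be illustrated with a minimal Python sketch. It is not the cardelino implementation: the function name and arguments are hypothetical, the rates 𝜃_i and 𝜃_0 are passed in as fixed numbers rather than drawn from their beta priors, and the configuration matrix is taken as given rather than sampled from Ω and ξ.

```python
import random

def simulate_alt_counts(config, depth, pi, theta_alt, theta_err, seed=0):
    """Draw clone labels and alternative-allele read counts (illustrative only).

    config[i][k] : 1 if variant i is present in clone k (the matrix C)
    depth[i][j]  : total read count covering variant i in cell j
    pi[k]        : prior probability that a cell originates from clone k
    theta_alt[i] : binomial rate for variant i when the variant is present
    theta_err    : binomial rate when the variant is absent (error rate)
    """
    rng = random.Random(seed)
    n_variants = len(config)
    n_cells = len(depth[0])
    # I_j: clone of origin of each cell, drawn from the prior pi
    clone_of_cell = rng.choices(range(len(pi)), weights=pi, k=n_cells)
    alt = [[0] * n_cells for _ in range(n_variants)]
    for i in range(n_variants):
        for j in range(n_cells):
            genotype = config[i][clone_of_cell[j]]  # c_{i, I_j}
            rate = theta_alt[i] if genotype == 1 else theta_err
            # Binomial draw written as a sum of Bernoulli trials
            alt[i][j] = sum(rng.random() < rate for _ in range(depth[i][j]))
    return clone_of_cell, alt

# Toy example: 3 variants, 2 clones (clone 0 is the mutation-free base clone)
config = [[0, 1], [0, 1], [0, 0]]
depth = [[5, 5, 0, 2], [3, 0, 4, 1], [2, 2, 2, 2]]
clones, alt = simulate_alt_counts(config, depth, pi=[0.5, 0.5],
                                  theta_alt=[0.4, 0.5, 0.3], theta_err=0.01)
```

Inference in the model runs in the opposite direction: given the observed alternative and total read counts, it estimates the cell assignments I and the configuration C.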
Supplementary Figure 2 Distribution of key data characteristics from experimental scRNA-seq data from 32 fibroblast cell lines, used as the basis of parameter settings in simulations.
(a) Number of clones inferred from bulk exome-seq data. (b) The median number of variants per clonal branch; (c) The overall coverage of variants, namely the fraction of variants with at least one read. (d) Scatter plot between the mean number of reads per variant per cell and the overall coverage of variants in the same line. The default parameters used in simulations are highlighted with the red line.
Supplementary Figure 3 Simulation results evaluating the inferred relax (error) rate in the configuration of variants in the guide clonal tree.
(a) The estimated relax rate as a function of the simulated error rates. Errors are simulated by uniformly swapping the mutation states in the configuration matrix of the guide clonal tree, which means that a clone may contain false mutations in the guide clonal tree provided to cardelino (except in the case of the base clone which has no mutations under any simulation conditions). (b) The estimated relax rate across different fractions of variants that have wrong branch configuration. Errors are added by swapping branches for variants.
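The error simulation in panel (a) can be sketched as follows. This is one possible reading of "uniformly swapping the mutation states", in which each entry of the configuration matrix is flipped independently with probability equal to the error rate. The function name, the independent-flip scheme and the convention that the base clone occupies the first column are assumptions made for illustration.

```python
import random

def flip_config(config, error_rate, seed=0):
    """Return a copy of the clone configuration with random state flips.

    config[i][k] is 1 if variant i is present in clone k. Column 0 is
    assumed to be the base clone, which stays mutation-free and is never
    modified.
    """
    rng = random.Random(seed)
    noisy = []
    for row in config:
        new_row = [row[0]]  # base clone left untouched
        for state in row[1:]:
            flip = rng.random() < error_rate
            new_row.append(1 - state if flip else state)
        noisy.append(new_row)
    return noisy

true_config = [[0, 1, 1], [0, 0, 1], [0, 1, 0]]
guide_config = flip_config(true_config, error_rate=0.1)
```

The noisy matrix plays the role of the guide clonal tree provided to the model, and the inferred relax rate can then be compared with the `error_rate` used here.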
Supplementary Figure 4 Additional results from assessing cardelino and alternative methods using simulated data.
Assessment of cell assignment to clones across a variety of simulation settings, considering SingleCellGenotyper (SCG), Demuxlet, cardelino and its two versions: cardelino-free without any informative clone configuration prior and cardelino-fixed assuming that the clone configuration prior is correct (Methods; Supplementary Note). All methods were applied to simulated data with known ground truth, varying (a) the number of informative variants per clonal branch, (b) the fraction of informative variants covered (that is, nonzero scRNA-seq read coverage), (c) the total number of clones, (d) the precision (i.e., inverse variance) of allelic ratio across genes; lower precision means more genes with high allelic imbalance, (e) the rate of general errors of mutation states in the clone configuration matrix, (f) the fraction of wrongly clustered variants in the input clonal tree branch. Default parameter values are marked with an asterisk and are retained when varying other parameters. (g) The effects of the tree topology on the cell assignment accuracy. In the simulations there are 50 repeats for each parameter, where one of the tree topology candidates is randomly selected in each repeat. For the four-clone configurations, there are four different tree topologies (upper panel), and the performance (area under the precision-recall curve) of the five methods is shown separately for each topology (bottom panel).
Supplementary Figure 5 Estimated mutational signature exposures based upon the tri-nucleotide context of somatic SNVs called from whole-exome sequencing (WES) data for n=32 HipSci human fibroblast lines.
The x-axis shows 30 COSMIC mutational signatures, in order, and the y-axis shows estimated exposures (mutation fraction) using the sigfit package (Methods), with significant signatures highlighted in blue. Across lines, the only significant signatures are Signature 7 (UV mutagenic process) and Signature 11.
Supplementary Figure 6 Variant allele frequency (VAF) distributions for somatic variants called from whole exome sequencing data for the 32 fibroblast lines.
The grey lines indicate the minimum allele-frequency threshold (0.05) used for variants for this analysis (Methods). The blue lines indicate the model (neutral/selected) inferred by SubClonalSelection (shading 95% confidence interval). Donors with a selection probability below 0.3 are classified as ‘neutral’, above 0.7 as ‘selected’. Donors which are neither ‘selected’ nor ‘neutral’ remain ‘undetermined’. High confidence ‘selected’ lines (selection probability >0.7 and >100 somatic variants) are: joxm, wahn, garx, vass, ualf, euts, pipw, oilg, feec, fikt, qolg, and puie.
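The classification rule stated in this caption can be written down as a short Python sketch. The function name and return format are hypothetical; only the thresholds (0.3, 0.7 and more than 100 somatic variants) come from the caption.

```python
def classify_line(selection_prob, n_somatic_variants):
    """Label a line from its selection probability (thresholds from the caption)."""
    if selection_prob > 0.7:
        label = "selected"
    elif selection_prob < 0.3:
        label = "neutral"
    else:
        label = "undetermined"
    # High-confidence 'selected' lines additionally need >100 somatic variants
    high_confidence = label == "selected" and n_somatic_variants > 100
    return label, high_confidence

print(classify_line(0.85, 240))  # ('selected', True)
print(classify_line(0.85, 60))   # ('selected', False)
print(classify_line(0.50, 300))  # ('undetermined', False)
```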
Supplementary Figure 7 Comparison of five methods on simulated data matching 32 fibroblast cell lines and estimated error rate and cell assignability with cardelino from experimental data for 32 fibroblast lines.
(a) Assessment of cell assignment to clones across a variety of simulation settings, considering SingleCellGenotyper (SCG), Demuxlet, cardelino, cardelino-free and cardelino-fixed (Methods; Supplementary Note). Considered are simulated data based on empirical characteristics observed in 32 fibroblast lines. For each line, the sequence coverage, clone configuration (i.e., number of clones, variants on each branch), and allelic imbalance parameters were obtained to derive simulation parameters. For each line, 200 cells are synthesised, and the guide clonal tree contains 10% errors in the allocation of variants to clones. (b) Estimated error rate in the clonal tree configuration derived from bulk exome-seq data (based on cardelino) for each of 32 lines versus fraction of confidently assigned cells (>90% of cells assigned for 23 lines; at cardelino posterior probability P>0.5 for most-probable clone).
Supplementary Figure 8 Comparison of cell assignment between five methods on experimental data across 32 fibroblast lines.
(a) The fraction of assignable cells (i.e., highest P > thresholds) when varying the thresholds from 0.5 to 0.95. Shown are box plots depicting median and the first and third quantiles of the 32 lines. (b) The adjusted Rand index of cell assignment to clones between the five considered methods. The values are averaged across 32 fibroblast lines. (c) Scatter plot between the uncertainty of the inferred tree from cardelino-free (x-axis) and the mean absolute difference of the assignment probability between cardelino-free and cardelino (y-axis). The output posterior clonal configuration matrix from cardelino-free consists of the probability of each variant being present in each clone. A completely uninformative clonal tree would have all entries equal to 0.5. Thus, we measure the uncertainty of the output tree from cardelino-free by taking 0.5 minus the mean absolute difference between the posterior probability configuration matrix and the uninformative configuration probability matrix with all entries equal to 0.5. With this measure, a value of 0.5 indicates a posterior configuration indistinguishable from the uninformative configuration and a value of 0 indicates very high confidence from the model in the posterior configuration. (d) The comparison of cell assignment for one representative line (feec) when using different guide clonal trees sampled from Canopy’s posterior distribution as input. Each violin plot shows the adjusted Rand index of cell assignment between each of 435 tree pairs combining the 30 most probable trees from bulk exome-seq for the feec line. (e) Cell assignment similarity for each of the 32 lines when using different guide clonal trees, quantified with adjusted Rand index values between different pairs of guide clonal trees. For each line, we take the 30 most probable posterior trees from Canopy, and then each dot in the box plot denotes the average adjusted Rand index value for one line, calculated from 435 of these pairwise comparisons.
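The uncertainty measure defined for panel (c), and the count of 435 tree pairs in panels (d) and (e), can be checked with a few lines of Python. The function name is hypothetical, and the posterior configuration matrix is represented as a plain nested list for illustration.

```python
from itertools import combinations

def config_uncertainty(posterior_config):
    """0.5 minus the mean absolute deviation of all entries from 0.5."""
    entries = [p for row in posterior_config for p in row]
    mean_abs_diff = sum(abs(p - 0.5) for p in entries) / len(entries)
    return 0.5 - mean_abs_diff

# Completely uninformative posterior: every entry is 0.5
print(config_uncertainty([[0.5, 0.5], [0.5, 0.5]]))  # 0.5
# Fully confident posterior: every entry is 0 or 1
print(config_uncertainty([[0.0, 1.0], [1.0, 0.0]]))  # 0.0

# Unordered pairs among the 30 most probable guide trees
n_tree_pairs = sum(1 for _ in combinations(range(30), 2))
print(n_tree_pairs)  # 435
```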
(a) Scatter plot of the fraction of cells assigned in each cell line using cardelino (at posterior probability > 0.5) as a function of the minimum number of clone-specific variants for the corresponding line (minimum Hamming distance between clones for a given donor), for 32 fibroblast lines. Total number of cells that were considered for this analysis (QC passed) per line indicated by colour. (b) Scatter plot of recall (assignment rate) versus precision (assignment accuracy) when assigning cells using cardelino (at posterior probability > 0.5). Shown are data from 32 simulated lines, using parameters that match the observed data characteristics in the set of 32 real fibroblast lines (Methods). The average number of variants per clonal branch (i.e., #variant/(#clone - 1)) is shown by point colour (slightly different from Supplementary Fig. 4 which uses the minimum number of variants distinguishing between pairs of clones, as shown in Fig. 3a). Lines with fewer informative variants per branch tend to have lower assignment rates, but the precision remains high.
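The quantities plotted in these panels can be sketched in Python as follows. The function names and data layout are hypothetical. Recall is taken here to be the fraction of cells whose highest posterior exceeds the threshold, and precision the fraction of those assigned cells whose clone matches the simulated truth, which is our reading of "assignment rate" and "assignment accuracy".

```python
from itertools import combinations

def min_hamming_between_clones(config):
    """Smallest number of variants that differ between any pair of clones.

    config[i][k] is 1 if variant i is present in clone k.
    """
    n_clones = len(config[0])
    return min(
        sum(row[a] != row[b] for row in config)
        for a, b in combinations(range(n_clones), 2)
    )

def assignment_precision_recall(posteriors, true_clones, threshold=0.5):
    """Precision and recall of cell-to-clone assignment at a posterior threshold.

    posteriors[j][k] is the posterior probability that cell j comes from clone k.
    """
    assigned = correct = 0
    for probs, truth in zip(posteriors, true_clones):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > threshold:
            assigned += 1
            correct += best == truth
    recall = assigned / len(true_clones)
    precision = correct / assigned if assigned else float("nan")
    return precision, recall

config = [[0, 1, 1], [0, 0, 1], [0, 0, 1]]
print(min_hamming_between_clones(config))  # 1

posteriors = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5], [0.2, 0.8]]
precision, recall = assignment_precision_recall(posteriors, [0, 0, 1, 1])
print(recall)  # 0.75
```

In the toy example three of four cells exceed the threshold and two of those three are correct, so recall is 0.75 and precision is 2/3.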
Supplementary Figure 10 Clone prevalence estimates from WES data (x-axis; using Canopy) versus the fraction of single-cell transcriptomes assigned to the clone (y-axis; using cardelino), for each clone across lines.
Points are coloured by the overall fraction of single-cell transcriptomes assigned for a given line (i.e. cells with posterior P>0.5 for assignment).
Volcano plot showing negative log P values versus log2-fold change from testing differential expression for genes with a somatic mutation between cells with the mutation and cells without the mutation, faceted by VEP annotation category (Methods). Each point represents a gene, and box plots show the overall log2-fold change distribution for each annotation category. DE tests (two-sided QL F test in edgeR) are conducted within each line (donor) separately, and the results shown here are aggregated across n=32 lines. Genes are categorised by simplified functional annotations from VEP of the somatic mutation, and genes significantly DE at an FDR threshold of 20% are shown in red.
(a) Heatmap showing Spearman correlation between gene set enrichment results for the 16 most frequently enriched MSigDB Hallmark gene sets across 31 lines. Colour indicates the correlation between pairs of gene sets and is only shown if the correlation is significant (P < 0.05). (b) Heatmap showing proportion of overlap in genes between pairs of gene sets (matching those in left panel). (c) Heatmap showing the direction (first listed clone relative to second listed clone; in colour) and strength of enrichment (-log10(P) as degree of shading) for Hallmark gene sets tested with camera (Methods) for all pairwise comparisons between clones across n=31 lines. Gene sets that are significantly enriched at an FDR threshold of 5% are indicated with dots. Gene sets are shown if significant in at least one line and are ordered by number of lines in which they are significant.
(a) Number of cells assigned by cardelino to each inferred clone for five melanoma patients, stratified by cell type identified using gene expression of marker genes as in the original publication 37. (b) Gene set enrichment analysis results when comparing gene expression in clone1 cells to cells in other clones, within each patient, including cells from all cell types. Given that immune cells and cancer-associated fibroblast (CAF) cells are almost all assigned to clone1, this comparison effectively reflects expression differences between melanoma and immune cells. (c) Gene set enrichment analysis results when considering all pairwise comparisons between clones consisting of melanoma cells only. The heatmaps in (b) and (c) depict signed P-values of gene set enrichment (n=31 cell lines; two-sided test using camera) for Hallmark gene sets found to be significantly enriched (FDR<0.05) in at least one comparison. Dots denote significant enrichments. For details on the cell assignment and gene set enrichment analyses see Supplementary Note. (d) Heatmap showing correlations between gene set enrichment results when using all cells (across melanoma, immune and cancer-associated fibroblast cell types) assigned to clones across five melanoma patients and comparing expression of cells assigned to clone1 to those assigned to other clones. (e) Heatmap showing correlations between gene set enrichment results when using all melanoma cells assigned to clones across five melanoma patients and comparing expression of cells between all pairs of clones (for which the clones have sufficiently many cells assigned). For both (d) and (e), the heatmap shows Spearman correlation between gene set enrichment results for the 16 most frequently enriched MSigDB Hallmark gene sets across n=5 patients. Colour indicates the correlation between pairs of gene sets and is only shown if the correlation is significant (P < 0.05).
About this article
Cite this article
McCarthy, D.J., Rostom, R., Huang, Y. et al. Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes. Nat Methods 17, 414–421 (2020). https://doi.org/10.1038/s41592-020-0766-3