Single-cell transcriptomic analysis is widely used to study human tumors. However, it remains challenging to distinguish normal cell types in the tumor microenvironment from malignant cells and to resolve clonal substructure within the tumor. To address these challenges, we developed an integrative Bayesian segmentation approach called copy number karyotyping of aneuploid tumors (CopyKAT), which estimates genomic copy number profiles at an average genomic resolution of 5 Mb from read depth in high-throughput single-cell RNA sequencing (scRNA-seq) data. We applied CopyKAT to 46,501 single cells from 21 tumors, including triple-negative breast cancer, pancreatic ductal adenocarcinoma, anaplastic thyroid cancer, invasive ductal carcinoma and glioblastoma, and distinguished cancer cells from normal cell types with 98% accuracy. In three breast tumors, CopyKAT resolved clonal subpopulations that differed in the expression of cancer genes, such as KRAS, and of signatures including epithelial-to-mesenchymal transition, DNA repair, apoptosis and hypoxia. These data show that CopyKAT can aid in the analysis of scRNA-seq data in a variety of solid human tumors.
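The idea of inferring copy number from scRNA-seq read depth can be illustrated with a small simulation. The sketch below is not the CopyKAT implementation: it swaps the Bayesian segmentation for a plain moving average over genes ordered by genomic position, and the function names, the simulated data, the 25-gene window and the 0.4 threshold are all assumptions chosen for illustration. It stabilizes count variance with a Freeman-Tukey transform, subtracts a baseline built from cells assumed to be diploid, and flags cells whose smoothed profile deviates strongly from that baseline.

```python
import numpy as np


def freeman_tukey(counts):
    """Variance-stabilizing transform for counts: sqrt(x) + sqrt(x + 1)."""
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(counts) + np.sqrt(counts + 1.0)


def relative_profiles(counts, normal_idx, window=25):
    """Smoothed expression of each cell relative to a diploid baseline.

    counts:     cells x genes matrix, genes sorted by genomic position.
    normal_idx: indices of cells assumed to be diploid (hypothetical input).
    window:     number of adjacent genes averaged together (assumed value).
    """
    x = freeman_tukey(counts)
    x = x - x.mean(axis=1, keepdims=True)   # remove per-cell depth offset
    baseline = x[normal_idx].mean(axis=0)   # per-gene diploid reference
    rel = x - baseline
    kernel = np.ones(window) / window
    # A moving average along the genome stands in for real segmentation.
    return np.array([np.convolve(row, kernel, mode="valid") for row in rel])


def call_aneuploid(profiles, threshold):
    """Flag cells whose smoothed profile varies by more than `threshold`."""
    return profiles.std(axis=1) > threshold


# Simulated data (illustrative only): 40 diploid cells followed by 40 tumor
# cells carrying one gained and one lost region.
rng = np.random.default_rng(0)
n_genes, n_normal, n_tumor = 1000, 40, 40
base = rng.gamma(shape=2.0, scale=2.0, size=n_genes)  # mean expression per gene
ratio = np.ones(n_genes)
ratio[200:400] = 1.5   # simulated gain
ratio[600:800] = 0.5   # simulated loss
lam = np.vstack([
    np.tile(base, (n_normal, 1)),
    np.tile(base * ratio, (n_tumor, 1)),
])
counts = rng.poisson(lam)

# The first 20 diploid cells serve as the known reference.
profiles = relative_profiles(counts, normal_idx=np.arange(20), window=25)
calls = call_aneuploid(profiles, threshold=0.4)
```

In this toy setting the reference cells are given in advance and the threshold is fixed by hand; with real data both would have to be derived from the data itself, and a segmentation step would replace the moving average.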
scRNA-seq data from this study were deposited in the Gene Expression Omnibus (GEO; GSE148673).
Software is available at GitHub (https://github.com/navinlabcode/copykat).
This work was supported by grants to N.E.N. from the American Cancer Society (129098-RSG-16-092-01-TBG), the National Cancer Institute (RO1CA240526, RO1CA236864), the Emerson Collective Cancer Research Fund (20200619153514) and the CPRIT Single Cell Genomics Center (RP180684). N.E.N. is an AAAS Wachtel Scholar, AAAS Fellow, Andrew Sabin Family Fellow and Jack & Beverly Randall Innovator. This study was also supported by the MD Anderson Breast Cancer Moonshot Program, the MD Anderson Sequencing Core Facility Grant (CA016672) and a Susan Komen Postdoctoral Fellowship to R.G. (PDF17487910). Other grant support includes the Anaplastic Thyroid Cancer Research Fund (S.Y.L. and J.R.W.) and an institutional multi-investigator research program grant to S.Y.L.
The authors declare no competing interests.
Peer review information: Nature Biotechnology thanks Elana Fertig, Jan Korbel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gao, R., Bai, S., Henderson, Y.C. et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol 39, 599–608 (2021). https://doi.org/10.1038/s41587-020-00795-2