Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR–Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.
All the CCLE processed datasets are available at the CCLE portal (www.broadinstitute.org/ccle) and DepMap portal (http://www.depmap.org). Raw sequencing data are available at the Sequence Read Archive (SRA) under accession number PRJNA523380. Achilles RNAi data (DEMETER scores) were downloaded from https://portals.broadinstitute.org/achilles. The Project Achilles CRISPR Avana 18Q3 public dataset (gene effects, CERES scores) was downloaded from https://figshare.com/articles/DepMap_Achilles_18Q3_public/6931364/1. The Novartis Project DRIVE RNAi dataset (ATARiS scores) was obtained from the Project DRIVE authors. CTRP AUC scores were downloaded from the NCI website (ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset). Sanger GDSC drug sensitivity data (AUC and IC50 scores) were downloaded from the Sanger website (https://www.cancerrxgene.org/downloads).
Most of the statistical analyses were performed in R (version 3.5.2). Source code is available upon request.
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. https://doi.org/10.1038/s41591-019-0404-8 (2019).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
McDonald, E. R. III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Diouf, B. et al. Somatic deletions of genes regulating MSH2 protein stability cause DNA mismatch repair deficiency and drug resistance in human leukemia cells. Nat. Med. 17, 1298–1303 (2011).
Marra, G. et al. Mismatch repair deficiency associated with overexpression of the MSH3 gene. Proc. Natl Acad. Sci. USA 95, 8568–8573 (1998).
Esakova, O. & Krasilnikov, A. S. Of proteins and RNA: the RNase P/MRP family. RNA 16, 1725–1747 (2010).
Hands-Taylor, K. L. et al. Heterodimerization of the human RNase P/MRP subunits Rpp20 and Rpp25 is a prerequisite for interaction with the P3 arm of RNase MRP RNA. Nucleic Acids Res. 38, 4052–4066 (2010).
Doherty, J. R. & Cleveland, J. L. Targeting lactate metabolism for cancer therapeutics. J. Clin. Invest. 123, 3685–3692 (2013).
Herman, J. G. et al. Silencing of the VHL tumor-suppressor gene by DNA methylation in renal carcinoma. Proc. Natl Acad. Sci. USA 91, 9700–9704 (1994).
Jaffe, J. D. et al. Global chromatin profiling reveals NSD2 mutations in pediatric acute lymphoblastic leukemia. Nat. Genet. 45, 1386–1391 (2013).
Creech, A. L. et al. Building the Connectivity Map of epigenetics: chromatin profiling by quantitative targeted mass spectrometry. Methods 72, 57–64 (2015).
Sveen, A., Kilpinen, S., Ruusulehto, A., Lothe, R. A. & Skotheim, R. I. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene 35, 2413–2427 (2016).
Dewaele, M. et al. Antisense oligonucleotide-mediated MDM4 exon 6 skipping impairs tumor growth. J. Clin. Invest. 126, 68–84 (2016).
Rallapalli, R., Strachan, G., Cho, B., Mercer, W. E. & Hall, D. J. A novel MDMX transcript expressed in a variety of transformed cell lines encodes a truncated protein with potent p53 repressive activity. J. Biol. Chem. 274, 8299–8308 (1999).
Gembarska, A. et al. MDM4 is a key therapeutic target in cutaneous melanoma. Nat. Med. 18, 1239–1247 (2012).
Boutz, P. L., Bhutkar, A. & Sharp, P. A. Detained introns are a novel, widespread class of post-transcriptionally spliced introns. Genes Dev. 29, 63–80 (2015).
Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat. Biotechnol. 35, 951–959 (2017).
Zhang, Y. et al. Ribosomal proteins Rpl22 and Rpl22l1 control morphogenesis by regulating pre-mRNA splicing. Cell Reports 18, 545–556 (2017).
Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834–838 (2005).
Li, J. et al. Characterization of human cancer cell lines by reverse-phase protein arrays. Cancer Cell 31, 225–239 (2017).
Chen, Y. N. et al. Allosteric inhibition of SHP2 phosphatase inhibits cancers driven by receptor tyrosine kinases. Nature 535, 148–152 (2016).
Wylie, A. A. et al. The allosteric inhibitor ABL001 enables dual targeting of BCR–ABL1. Nature 543, 733–737 (2017).
Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013).
The Cancer Cell Line Encyclopedia Consortium & The Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line datasets. Nature 528, 84–87 (2015).
Haverty, P. M. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333–337 (2016).
Geeleher, P., Gamazon, E. R., Seoighe, C., Cox, N. J. & Huang, R. S. Consistency in large pharmacogenomic studies. Nature 540, E1–E2 (2016).
Bouhaddou, M. et al. Drug response consistency in CCLE and CGP. Nature 540, E9–E10 (2016).
Mpindi, J. P. et al. Consistency in drug response profiling. Nature 540, E5–E6 (2016).
Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
King, A. J. et al. Abstract 2116: Combining the power of different profiling approaches to better understand the activity of kinase inhibitor drugs. Cancer Res. 77, 2116–2116 (2017).
Fisher, S. et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 12, R1 (2011).
Johannessen, C. M. et al. A melanocyte lineage program confers resistance to MAP kinase pathway inhibition. Nature 504, 138–142 (2013).
Boyle, P. et al. Gel-free multiplexed reduced representation bisulfite sequencing for large-scale DNA methylation profiling. Genome Biol. 13, R92 (2012).
Brat, D. J. et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).
Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).
Huang, F. W. et al. TERT promoter mutations and monoallelic activation of TERT in cancer. Oncogenesis 4, e176 (2015).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.11–11.10.33 (2013).
Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Haas, B. et al. STAR-Fusion: fast and accurate fusion transcript detection from RNA-seq. Preprint at https://www.bioRxiv.org/content/10.1101/120295v1 (2017).
Kasar, S. et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat. Commun. 6, 8866 (2015).
Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).
The Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 (2017).
Haradhvala, N. J. et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018).
Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).
Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477–481 (2013).
van der Maaten, L. & Hinton, G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Szklarczyk, D. et al. STRINGv10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Smyth, G. K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 1–25 (2004).
Akbani, R. et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
We thank the Broad Genomics Platform, C. Clish, H. Bitter, A. Najafi and E. Orlando for their contribution. This work was supported by grants from Novartis and partially by NIH/NCI grants 1U01CA217842-01, 1P50CA217685-01, 5P50CA098258, 1U24CA180922-01, 1R50CA211461-01, CA16672, 1R01CA219943-01, 1U54CA224068-01, NIH U01 CA176058 and R21 DA025720. F.W.H. was supported by the Prostate Cancer Foundation. M.I. was supported by a Burroughs Wellcome Fund Career Award. G.G. was partially supported by the Paul C. Zamecnik, MD, Chair in Oncology at MGH. G.B.M. was supported by the Adelson medical research fund. Drug sensitivity results are in part based on data generated by Cancer Target Discovery and Development (CTD2) Network (https://ocg.cancer.gov/programs/ctd2/data-portal) established by the National Cancer Institute’s Office of Cancer Genomics.
Nature thanks Nevan Krogan, Christoph Plass and the other anonymous reviewer(s) for their contribution to the peer review of this work.
J.M.K., M.D.J., D.A.P., F.S., E.R.M., J.L., R.S., J.B., A.D., K.V., A.R., J.G., G.C., M.L., A.K., M.P.M. and W.R.S. are current or former Novartis employees and/or stock holders. W.R.S. is a Board or SAB member and holds equity in Peloton Therapeutics and Ideaya Biosciences and has consulted for Array, Astex, Ipsen, Sanofi and Servier. B.A.W. is a J&J employee. F.S. and G.V.K. are KSQ Therapeutics employees. N.S. is an employee of Celsius Therapeutics. A.D.C. receives research support from Bayer AG. G.G. receives research support from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect and ABSOLUTE. G.B.M. consults with AstraZeneca, ImmunoMET, Ionis, Nuevolution, PDX Bio, Signalchem Lifesciences, Symphogen and Tarveda, has stock options with Catena Pharmaceuticals, ImmunoMet, SignalChem, Spindle Top Ventures and Tarveda, has sponsored research funding from Adelson Medical Research Foundation, AstraZeneca, Breast Cancer Research Foundation, Immunomet, Komen Research Foundation, Pfizer, Nanostring, Tesaro, travel support from Chrysallis Bio and has licensed technology to Nanostring and Myriad Genetics. T.R.G. is an advisor to GlaxoSmithKline, is a co-founder of Sherlock Biosciences and was a co-founder and advisor to Foundation Medicine. J.K. is a Tesaro employee. W.C.H. is a consultant for Thermo Fisher, AjuIB, Paraxel and MPM Capital, and is a founder and consultant for KSQ Therapeutics. L.A.G. is an employee of Eli Lilly.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, The existing and new CCLE datasets as indicated are depicted. b, Distribution of cell lines by lineage and ancestry across CCLE. c, Visual representation of the number of cell lines in each dataset. New CCLE datasets are shown in red. Functional genomics datasets are shown in blue.
a, Unified pipeline integrating mutation and indel calls from different platforms was used to generate a set of high confidence genomic alterations across 1,063 cancer cell lines. Identified variants were cross-referenced with the ExAC and TCGA databases and a panel of normals (PoN) to exclude germline variants/artefacts and generate the finalized high-confidence variant call set. b–d, Comparison of variant calls between CCLE and Sanger GDSC cell lines for germline (b; n = 1,250,562), TCGA hotspot somatic (c; n = 281) and non-hotspot somatic (d; n = 82,572) variants using WES data. Pearson’s correlation coefficients are shown. e, Comparison of TCGA hotspot variant calls between CCLE Hybrid Capture (HC) data and Sanger GDSC WES data. Variants with allelic fraction >0.4 in one dataset and greater than fourfold difference in allelic fractions between the two datasets are shown as open circles (n = 980). f, g, Comparison of Pearson’s correlation coefficients between CCLE WES and Sanger GDSC WES data versus Pearson’s correlation coefficients between CCLE HC and Sanger GDSC WES data for germline (f; n = 107) and somatic (g; n = 93) variants. Cell lines with fewer than 30 variants were excluded. h, Comparison of allelic fraction Pearson’s correlations between CCLE cell lines and Sanger cell lines using CCLE HC and Sanger GDSC WES data (n = 558 common cell lines between the two datasets; Supplementary Table 3). Cell lines with low germline correlation (sample mismatch) and low somatic correlation (genetic drift) are highlighted.
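The concordance measures in panels e to h can be illustrated with a minimal Python sketch. It assumes each cell line's calls are held as a mapping from a variant identifier to an allelic fraction, and that only variants present in both datasets are compared; both are assumptions, as are the function names. The 30-variant minimum, the 0.4 allelic-fraction cut-off and the fourfold difference are taken from the caption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)

def allelic_fraction_concordance(af_a, af_b, min_variants=30):
    """Correlate allelic fractions of one cell line across two datasets.

    af_a, af_b: dict mapping variant id -> allelic fraction (hypothetical layout).
    Returns None when fewer than min_variants shared variants are available.
    """
    shared = sorted(set(af_a) & set(af_b))
    if len(shared) < min_variants:
        return None
    return pearson([af_a[v] for v in shared], [af_b[v] for v in shared])

def discordant_variants(af_a, af_b, min_af=0.4, fold=4.0):
    """Shared variants with allelic fraction > min_af in one dataset and a
    more than `fold`-fold difference between the two datasets."""
    flagged = []
    for v in sorted(set(af_a) & set(af_b)):
        high, low = max(af_a[v], af_b[v]), min(af_a[v], af_b[v])
        if high > min_af and high > fold * low:
            flagged.append(v)
    return flagged
```

A cell line whose germline correlation is low would point to a sample mismatch, whereas a low somatic correlation with a high germline correlation would be consistent with genetic drift, as the caption describes.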
a, Structural variant burden in CCLE whole genomes. Structural variants detected by SvABA in cell lines grouped by tissue type are plotted in the order of mean structural variant burden (red bar in each facet). b, Bar plot of recurrent COSMIC fusions detected in CCLE RNA-seq data coloured by cell line lineage. c, Volcano plot of Achilles RNAi gene dependencies versus CCLE fusions for cell lines (n = 478) common between CCLE and Achilles datasets. P values determined by two-sided t-test. Genes with significant adjusted P values (false discovery rate (FDR) < 0.1) are highlighted. d, e, Examples of fusions associated with gene dependency: cell lines with ESR1-CCDC170 fusion (n = 4) are sensitive to ESR1 shRNA knockdown (d), and cell lines with AFF1-KMT2A fusion (n = 3) are sensitive to AFF1 shRNA knockdown (e). The x axis shows mRNA expression, and the y axis shows Achilles RNAi gene dependency DEMETER score (ref. 5).
a, Mutational signature activity for CCLE cell lines and TCGA tumours averaged for each cancer type. For each sample, we computed the fraction of mutations attributed to each of the 30 COSMIC signatures and took the average across samples in each cancer type. Tumour types selected for representation have at least 20 samples in CCLE. b, Scatterplots for the mutational signature activities for CCLE and TCGA (n = 168). P value determined by linear regression analysis and corrected for COSMIC signature number. c, Volcano plot for comparison of COSMIC mutational signatures and CCLE or GDSC genetic drift estimates using two-sided Pearson’s correlation test (n = 3–459; Supplementary Table 6). d, Scatter plot for COSMIC6 mutational activity signatures versus CCLE or GDSC genetic drift estimates (n = 354). Colour coding as in b. P value determined by Pearson’s correlation test.
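The per-cancer-type averaging described for panel a can be sketched as follows. This is only an illustration: it assumes that the number of mutations attributed to each signature is already available per sample, and the data layout and function names are hypothetical. The 20-sample minimum comes from the caption.

```python
from collections import defaultdict

def signature_fractions(mutation_counts):
    """Convert per-signature mutation counts for one sample into fractions.

    mutation_counts: dict mapping signature name -> attributed mutation count.
    """
    total = sum(mutation_counts.values())
    if total == 0:
        return {sig: 0.0 for sig in mutation_counts}
    return {sig: n / total for sig, n in mutation_counts.items()}

def mean_activity_by_type(samples, min_samples=20):
    """Average signature fractions across the samples of each cancer type.

    samples: iterable of (cancer_type, mutation_counts) pairs.
    Cancer types with fewer than min_samples samples are skipped.
    """
    grouped = defaultdict(list)
    for cancer_type, counts in samples:
        grouped[cancer_type].append(signature_fractions(counts))
    result = {}
    for cancer_type, fractions in grouped.items():
        if len(fractions) < min_samples:
            continue
        signatures = sorted({sig for f in fractions for sig in f})
        result[cancer_type] = {
            sig: sum(f.get(sig, 0.0) for f in fractions) / len(fractions)
            for sig in signatures
        }
    return result
```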
Extended Data Fig. 5 Determination of MSI status in the CCLE and interrogation of mismatch repair genes.
a, Identification of MSI cell lines. Number of deletions in microsatellite regions is plotted versus percentage of deletions in microsatellite regions for all cell lines in CCLE HC sequencing, CCLE WGS, CCLE WES, and Sanger GDSC WES datasets (see Methods). The x axis denotes the number of short deletions in microsatellite regions, and the y axis denotes percentage microsatellite as measured by the percentage of short deletions that lie within microsatellite regions. Inferred MSI cell lines are outlined by the green rectangle. b, Heat plot of inferred MSI status and selected CCLE annotations for DNA mismatch repair genes MLH1, MSH2 and MSH6 for all cell lines (top) and the MSI subset (bottom). Highlighted red boxes show differences in mRNA and protein expression levels in MSH2 and MSH6. MLH1 hypermethylation is defined as average promoter methylation greater than 0.5. c, d, Scatterplot of CCLE cell lines comparing MSH6 mRNA expression levels (x axis) from RNA-seq versus MSH6 protein abundance (y axis) as quantified by RPPA in inferred-MSI (c) and inferred-MSS (d) cell lines. Red and blue denote cell lines containing truncating mutations or copy number loss in MSH6 and MSH2, respectively. Purple denotes cell lines containing truncating mutation or copy number loss in both MSH2 and MSH6. The black box highlights the MSH6 high mRNA low protein (HL) category. e–g, Bar plots of percentages of cell lines containing truncating mutations in MSH6 (e) or MSH2 (f), and MLH1 expression loss (g) in different MSH6 mRNA and protein categories among inferred-MSI cell lines (LL: n = 11; HL: n = 17; HH: n = 44). P = 4 × 10⁻⁴ (e), P = 1 × 10⁻³ (f) and P = 1 × 10⁻⁴ (g), two-sided Fisher test. h, MSH2 protein levels in different MSH6 mRNA and protein categories. ***P < 1 × 10⁻⁶, two-sided Wilcoxon rank-sum test. P = 8 × 10⁻¹⁴, difference between the HH and HL set; P = 1 × 10⁻⁸, difference between the HH and LL set. Box plots as defined in Fig. 4d.
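The two axes of panel a and the MLH1 hypermethylation rule of panel b can be written down as a short Python sketch. The caption does not state the bounds of the green rectangle, so the MSI cut-offs below are required arguments with no defaults; the input layouts and function names are likewise assumptions. Only the 0.5 methylation threshold is taken from the caption.

```python
def microsatellite_deletion_stats(in_microsatellite):
    """Summarise the short deletions of one cell line.

    in_microsatellite: list of booleans, one per short deletion, True when
    the deletion lies within a microsatellite region (hypothetical layout).
    Returns (number of microsatellite deletions, percentage of short
    deletions that lie within microsatellite regions).
    """
    n_total = len(in_microsatellite)
    n_ms = sum(1 for flag in in_microsatellite if flag)
    percent = 100.0 * n_ms / n_total if n_total else 0.0
    return n_ms, percent

def infer_msi(n_ms, percent_ms, min_count, min_percent):
    """Call a cell line MSI when both values reach caller-supplied cut-offs.

    min_count and min_percent are placeholders for the rectangle bounds,
    which are not given in the caption.
    """
    return n_ms >= min_count and percent_ms >= min_percent

def mlh1_hypermethylated(promoter_methylation, threshold=0.5):
    """True when the average promoter methylation exceeds the threshold."""
    if not promoter_methylation:
        raise ValueError("no methylation values supplied")
    return sum(promoter_methylation) / len(promoter_methylation) > threshold
```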
Extended Data Fig. 6 Examples of DNA methylation associated with gene expression and dependencies in cell lines.
a, t-SNE plot for DNA methylation data across all CCLE cell lines. Each dot represents a cell line coloured by cell lineage. b, Distribution of mean CpG methylation in CCLE cell lines (n = 843) grouped by cancer type. Box plots as defined in Fig. 4d. c, Correlation of promoter methylation and gene expression for all genes corrected for cancer type (n = 836 cell lines, 18,296 genes). The y axis represents the number of genes, and the x axis is the linear regression coefficient corresponding to normalized promoter DNA methylation. Cancer types were used as covariates in the linear regression analysis. A subset of genes shows significant correlation between higher promoter methylation and lower gene expression (n = 7,388; permutation test P < 0.05; Methods). Dotted line shows the empirical null distribution. d, Cell lines with higher levels of RPP25 methylation show decreased RPP25 mRNA expression (Pearson’s r = −0.79, n = 834 cell lines; P < 2.2 × 10⁻¹⁶). e, Comparison of Achilles RNAi RPP25 gene dependency scores for cell lines with and without truncating mutation or copy number loss in POP7 or RPP25L genes (n = 458 cell lines; P = 0.74, two-sided Wilcoxon rank-sum test). Box plots as defined in Fig. 4d. f, Cell lines with higher levels of LDHB methylation show decreased LDHB mRNA expression (Pearson’s r = −0.80, n = 815 cell lines; P < 2.2 × 10⁻¹⁶). g, Cell lines with higher levels of LDHA methylation show decreased LDHA expression. Two cell lines, SK-N-BE2 and U-251-MG, show markedly higher LDHA methylation and decreased LDHA expression (Pearson’s r = −0.27, n = 836; P = 5.34 × 10⁻¹⁶). h, Cell lines with high levels of LDHA methylation display sensitivity to LDHB knockout by CRISPR–Cas9 screening (Pearson’s r = −0.53, n = 371, P < 2.2 × 10⁻¹⁶). i, Promoter methylation versus mRNA expression correlations in TCGA tumour types (sample sizes shown in parentheses). *P < 0.001, Pearson’s correlation test. 
j, Scatterplot of CCLE lines comparing expression of tumour suppressor VHL (von Hippel–Lindau) mRNA versus VHL methylation (left, all cell lines) and copy number (right, kidney subset). VHL hypermethylation in three kidney cell lines is associated with marked loss of VHL expression. VHL is inactivated by DNA copy number loss, somatic mutation, and promoter hypermethylation.
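The regression in panel c, in which cancer type enters as a covariate, can be illustrated in plain Python. When the only covariates are cancer-type indicators, the methylation coefficient equals the slope obtained after centring both variables within each cancer type (the Frisch–Waugh–Lovell result). The sketch below uses that shortcut; the function names are hypothetical, and the normalization of methylation values and the permutation test mentioned in the caption are not reproduced.

```python
from collections import defaultdict

def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def adjusted_slope(methylation, expression, cancer_types):
    """Coefficient of methylation in the linear model
    expression ~ methylation + cancer_type (type as categorical covariate).
    """
    m = center_within_groups(methylation, cancer_types)
    e = center_within_groups(expression, cancer_types)
    denom = sum(a * a for a in m)
    if denom == 0:
        raise ValueError("methylation does not vary within any cancer type")
    return sum(a * b for a, b in zip(m, e)) / denom
```

A negative coefficient for a gene corresponds to the pattern highlighted in the caption, where higher promoter methylation accompanies lower expression within a cancer type.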
a, Unsupervised clustering of global chromatin profiling data for 897 CCLE cell lines. Each column corresponds to an individual cell line and each row corresponds to a specific combination of chromatin post-translational modifications (‘marks’). For each mark, the fold change relative to the median of cell lines is depicted on the heat map. EZH2, NSD2, CREBBP and EP300 status are annotated. Previously described clusters (associated with EZH2 gain of function, EZH2 loss of function, and NSD2 alterations), as well as the newly identified cluster associated with p300 and CBP gain-of-function alterations, are annotated. b, Volcano plot for truncating mutation enrichment analysis in the newly identified cluster, which is characterized by marked increases in H3K18 and H3K27 acetylation (n = 893 cell lines; adjusted P values determined by two-sided Fisher’s exact test). EP300 and CREBBP are the top two genes with truncating mutations enriched in this cluster. Only genes with at least 20 affected cell lines (n = 684 genes) were included. c, Distribution of truncating mutations affecting EP300 and CREBBP in the 10 cell lines in the newly identified p300/CBP cluster. Truncating mutations predicted to affect the TAZ2 (CH3) domain specifically are highlighted. Two other truncating mutations not specific to TAZ2 (CH3) are found in OVCAR-8 (S893*) and COLO-704 (K1469fs).
Extended Data Fig. 8 Comparison of CCLE gene expression data with primary tumour (TCGA) and normal tissue (GTEx) gene expression datasets.
a, Comparison of gene expression profiles between the CCLE cell lines (n = 1,019) and TCGA primary tumours (n = 10,535). For every gene in each dataset, expression values were averaged per cancer type and then mean centred across types. Pearson correlation values were calculated between the CCLE and TCGA cancer types using the 5,000 most highly variable genes. b, Comparison of average gene expression profiles between the CCLE cell lines (n = 1,019) and the GTEx normal tissues (n = 11,688). As in a, the expression profile for each tissue type in GTEx was correlated with the CCLE expression profiles (n = 5,000 genes). c, Gene expression comparison of eight prostate cell lines and TCGA primary tumour samples (n = 5,000 genes). d, Gene expression comparison of eight prostate cell lines and GTEx normal tissue samples (n = 5,000 genes).
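The profile comparison described in panels a and b follows three steps: average each gene per type, mean-centre across types, then correlate type profiles over the most variable genes. A minimal Python sketch is given here under stated assumptions: expression is stored as a mapping from gene to a list of per-sample values, variability is measured as variance across samples of one dataset, and all function names are hypothetical. The caption does not say in which dataset variability was ranked.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)

def top_variable_genes(expr, k):
    """Return the k genes with the largest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]

def type_profiles(expr, sample_types):
    """Average each gene per type, then mean-centre across types.

    expr: dict gene -> list of expression values, one per sample.
    sample_types: list of type labels aligned with the sample order.
    Returns dict type -> dict gene -> centred average expression.
    """
    types = sorted(set(sample_types))
    profiles = {t: {} for t in types}
    for gene, values in expr.items():
        per_type = {
            t: mean(v for v, s in zip(values, sample_types) if s == t)
            for t in types
        }
        centre = mean(per_type.values())
        for t in types:
            profiles[t][gene] = per_type[t] - centre
    return profiles

def profile_correlation(profile_a, profile_b, genes):
    """Pearson correlation between two type profiles over the given genes."""
    return pearson([profile_a[g] for g in genes],
                   [profile_b[g] for g in genes])
```

In use, `type_profiles` would be run once per dataset and `profile_correlation` applied to every pair of types across the two datasets, with `genes` set to the 5,000 most variable genes.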
a, Distribution of MDM4 exon 6 inclusion (left) and MDM4 mRNA expression (right) correlation with all gene dependencies in the Achilles RNAi dataset (n = 189–478; Supplementary Table 10). b, Correlation of MDM4 exon 6 inclusion with sensitivity to all small molecules in the CTRP AUC dataset using all cell lines. Nutlin-3a is the top drug sensitivity correlated with MDM4 exon 6 inclusion (n = 79–810; Supplementary Table 10). c, Example of nutlin-3a sensitivity versus MDM4 exon 6 inclusion in the AML cell lines (Spearman correlation ρ = −0.64, P = 3 × 10⁻⁴, n = 28). The y axis shows the AUC for nutlin-3a in the CTRP dataset. d, Scatterplot of MDM4 exon 6 inclusion versus RPL22L1 expression for all p53-mutant (left, n = 711) and p53 wild-type (right, n = 288) CCLE cell lines. P values determined by Pearson’s correlation test. e, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in TCGA. f, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in CCLE. g, Correlation of RPL22L1 mRNA expression with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots as defined in Fig. 4d. Values in parentheses denote sample size in each category. h, Correlation of MDM4 exon 6 inclusion with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots are as defined in Fig. 4d. Values in parentheses denote sample size in each category. i, Selected genomic features that correlate with sensitivity to MDM4 shRNA knockdown. mRNA expression of MDM4 and TP53 are shown for comparison.
Extended Data Fig. 10 Examples of microRNA expression associated with gene dependencies in cell lines.
a, t-SNE plot for miRNA data across all CCLE cell lines. Each dot represents a cell line. Each colour represents a different cell lineage. Colour coding is as in Fig. 1. b, Scatter plot of pairwise Pearson’s correlation of gene dependency and miRNA expression (n = 420 cell lines), normalized for each microRNA (z1, x axis) and each gene dependency (z2, y axis). Strong outlier pairs with |z1| > 6 or |z2| > 6 are highlighted. c, Distribution of Pearson’s correlations of mir-215 expression with Achilles RNAi gene dependencies for 16,871 genes (n = 162–420 cell lines; Supplementary Table 13). CTNNB1 knockdown is the top negative correlate with mir-215 expression. d, Distribution of Pearson’s correlations of CTNNB1 gene dependency with all 734 measured miRNAs (n = 420 cell lines). mir-215 is the miRNA whose expression is most negatively correlated with CTNNB1 dependency. mir-215 and mir-194-1 cluster together at 1q41, whereas mir-192 and mir-194-2 cluster at 11q13.1. mir-215 and mir-192 are close homologues. e, Scatterplot of mir-215 expression versus CTNNB1 dependency of all CCLE cell lines. Colon and stomach lineages are shown in blue and red, respectively. f, Scaled mir-215 expression in TCGA and CCLE datasets (n = 14; mean ± s.e.m.). Stomach and colorectal lineages in both datasets have high mir-215 expression. g, Single-sample gene set enrichment analysis identifies TGFB1 and WNT3A pathway gene sets correlated with mir-215 expression using CCLE RNA-seq data. The gene set ‘Labbe targets of TGFB1 and WNT3A’ of downstream targets of TGF-β and WNT ligands is negatively correlated with mir-215 expression. h, The gene set ‘Labbe targets of TGFB1 and WNT3A’ is negatively correlated with mir-215 expression in the TCGA stomach mRNA expression dataset. i, The gene set ‘Vecchi gastric advanced vs early dn’ of downregulated genes distinguishing between advanced and early gastric cancer subtypes is positively correlated with mir-215 expression in the CCLE. 
j, mir-215 expression in the stomach TCGA mRNA expression dataset is positively correlated with the ‘Vecchi gastric advanced vs early dn’ gene set.
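The double normalization in panel b, where each miRNA and gene dependency correlation is standardized once within its miRNA (z1) and once within its gene (z2), can be sketched in Python as follows. The dictionary layout, the use of the population standard deviation and the function names are assumptions; the cut-off of 6 is taken from the caption.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscores(corr, key_index):
    """Standardize correlations within groups.

    corr: dict mapping (mirna, gene) -> Pearson correlation.
    key_index: 0 to standardize within each miRNA (z1),
               1 to standardize within each gene dependency (z2).
    Groups with zero spread get a z-score of 0.0.
    """
    groups = defaultdict(list)
    for pair, r in corr.items():
        groups[pair[key_index]].append(r)
    stats = {k: (mean(v), pstdev(v)) for k, v in groups.items()}
    z = {}
    for pair, r in corr.items():
        mu, sd = stats[pair[key_index]]
        z[pair] = (r - mu) / sd if sd > 0 else 0.0
    return z

def outlier_pairs(corr, threshold=6.0):
    """Pairs with |z1| > threshold or |z2| > threshold."""
    z1 = zscores(corr, 0)
    z2 = zscores(corr, 1)
    return sorted(
        pair for pair in corr
        if abs(z1[pair]) > threshold or abs(z2[pair]) > threshold
    )
```

Standardizing along both axes means a pair is flagged when it stands out either among all genes for that miRNA or among all miRNAs for that gene, which is how a pair such as mir-215 and CTNNB1 would surface.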
a, Distribution of Pearson’s correlation coefficient between total protein levels as measured by RPPA and mRNA expression levels measured by RNA-seq (n = 890 cell lines, 154 genes). The empirical null distribution for correlation of mRNA and protein for two random genes is shown for comparison (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon rank-sum test). b, Effect of RPPA dynamic range on mRNA and protein correlation (n = 96). mRNA and protein correlation is plotted against dynamic range for each validated total protein antibody. Most antibodies with low mRNA and protein correlation tend to have low dynamic range with the exception of the gene VEGFR2, which despite high dynamic range, exhibits very low mRNA and protein correlation. P values determined by two-sided Pearson’s correlation test. c, Effect of RPPA antibody quality and target type on mRNA/protein correlation. On the left, mRNA/protein Pearson correlation is plotted for ‘validated’ (n = 96) and ‘with caution’ (n = 58) antibodies for antibodies against total proteins. On the right, mRNA and protein Pearson’s correlation is plotted for antibodies against total protein (n = 154) and antibodies against phospho-protein (n = 50). Median correlations are 0.62 (validated), 0.48 (caution), 0.54 (total protein), 0.21 (phospho-protein). P values determined by two-sided Wilcoxon rank-sum test. Box plots are as defined in Fig. 4d. d, Comparison of mRNA and protein correlations in CCLE and TCGA (n = 152). The Pearson’s correlation between mRNA and protein levels is calculated for each RPPA antibody in CCLE and TCGA separately. Each dot represents an antibody. Generally, the antibodies with low mRNA and protein correlation in CCLE also have low mRNA and protein correlation in TCGA data. P values determined by two-sided Pearson’s correlation test. 
e, Distribution of gene dependency (Achilles RNAi) correlations with RPPA pSHP2 level (left, n = 161–411, Supplementary Table 14) and PTPN11 mRNA expression (right, n = 192–478, Supplementary Table 14). PTPN11 dependency is strongly correlated with pSHP2 level, whereas there is no significant correlation with PTPN11 mRNA level. f, Comparison of pSHP2 levels in SHP099-sensitive and -resistant cell lines (n = 60). P value determined by two-sided Wilcoxon rank-sum test. SHP099 sensitivity data were obtained from a previous study26. Box plots are as defined in Fig. 4d. g, Pearson’s correlation of pSHP2 and Sanger GDSC drug sensitivity AUC dataset (n = 265 drugs and 198–588 overlapping cell lines). h, Model error for elastic net model of sensitivity to ponatinib with and without using RPPA data as predictive features. The y axis shows the cross-validation error (fivefold cross-validation) against parameter λ of elastic net (parameter α is fixed at 0.2). Data are mean ± s.d. for the five cross-validation sets. The minimum cross-validation error for models with and without using RPPA data are shown by arrows. i, Elastic net results for sensitivity to ponatinib. pSHP2 is the top feature selected by elastic net. On the left, elastic net weights (averaged over 200 bootstrapping trials) and colour-coded by the frequency each feature was selected by elastic net. The numbers in parentheses are the frequency each feature was selected. Each column is a cell line and each row is a feature. The cell lines are sorted by their sensitivity to ponatinib (shown at the bottom). j, Western blot analysis of pSHP2 and total SHP2 levels across AML and select CML cell lines. Western blots were performed twice independently with similar results. k, Validation of RPPA data for pSHP2. pSHP2 levels measured by western blot are plotted against pSHP2 levels measured by RPPA for the tested AML and control CML cell lines (n = 19). The cell lines are colour-coded by their sensitivity to ponatinib. 
P values determined by two-sided Pearson’s correlation test. l, Survival curves from an in vivo mouse xenograft experiment comparing ponatinib-treated and control mice for the low-pSHP2 primagraft DFAM-68555 (n = 7 mice in each treatment group). P values determined by log-rank (Mantel–Cox) test. m, Anti-CD45 immunohistochemistry of spleen specimens from mice treated with control or ponatinib for 5 days. Similar results were found using the other two independent sets of mice.
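The comparison in panel a can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the data are synthetic, and the function names (`pearson`, `per_gene_correlations`, `empirical_null`, `make_toy_data`) are hypothetical. It computes one Pearson correlation per gene between mRNA and protein values across cell lines, then builds an empirical null by correlating the mRNA of one gene with the protein of a different, randomly chosen gene. The Wilcoxon rank-sum test is omitted; the sketch only compares the medians of the two distributions.

```python
import math
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlations(mrna, protein):
    """One mRNA/protein correlation per gene present in both inputs.

    Each argument maps gene -> list of values, ordered by the same cell lines.
    """
    return {g: pearson(mrna[g], protein[g]) for g in mrna if g in protein}


def empirical_null(mrna, protein, n_pairs, rng):
    """Correlations for randomly drawn pairs of two different genes."""
    mrna_genes, protein_genes = sorted(mrna), sorted(protein)
    null = []
    while len(null) < n_pairs:
        g1, g2 = rng.choice(mrna_genes), rng.choice(protein_genes)
        if g1 != g2:
            null.append(pearson(mrna[g1], protein[g2]))
    return null


def make_toy_data(n_genes=20, n_lines=200, seed=0):
    """Synthetic data (an assumption): protein = mRNA + independent noise."""
    rng = random.Random(seed)
    mrna, protein = {}, {}
    for i in range(n_genes):
        gene = f"GENE{i}"
        mrna[gene] = [rng.gauss(0, 1) for _ in range(n_lines)]
        protein[gene] = [m + rng.gauss(0, 1) for m in mrna[gene]]
    return mrna, protein


mrna, protein = make_toy_data()
matched = per_gene_correlations(mrna, protein)
null = empirical_null(mrna, protein, n_pairs=500, rng=random.Random(1))
print(f"median matched r = {median(matched.values()):.2f}")
print(f"median null r    = {median(null):.2f}")
```

On real data, the dictionaries would be built from the RNA-seq and RPPA matrices after restricting both to the shared cell lines. The matched and null distributions could then be compared with a two-sided rank-sum test, as in the legend.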
Supplementary Methods, Computational Analysis and Supplementary References.
Uncropped scans with size markers for Extended Data Fig. 11j.
Cell line annotations and available datasets. Cell Line Annotations: List of the cell lines with CCLE IDs, DepMap IDs, and the available annotations. Datasets: Data used to generate Extended Data Fig. 1c. Overlapping cell lines: Number of overlapping cell lines between different datasets. Cell line name changes: List of cell lines with changed CCLE ID.
RainDance targets. List of genomic loci and primer sequences used to generate the RainDance sequencing data.
CCLE GDSC comparison. Data used to generate Extended Data Fig. 2h. r_somatic_CCLE_HC_vs_GDSC_WES: Pearson correlation of somatic variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.somatic: Number of data points used to calculate the correlation (somatic). r_germline_CCLE_HC_vs_GDSC_WES: Pearson correlation of germline variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.germline: Number of data points used to calculate the correlation (germline). comments: Cell line classification based on CCLE/GDSC concordance.
Fusion vs dependency analysis. Data corresponding to Extended Data Fig. 3c. This table includes fusions associated with dependencies based on Achilles RNAi, Achilles CRISPR, and Novartis RNAi datasets. A two-sided t-test was used. Sample size (n) is provided for each row.
TERT promoter mutations. List of cell lines and corresponding TERT promoter genotypes profiled by whole genome sequencing or targeted sequencing.
COSMIC mutational signatures analysis. Data used in mutational signature analysis (Extended Data Fig. 4). CancerTypes_analyzed: List of COSMIC signatures analyzed in each cancer type. CCLE_perCellLine: Mutational signature activity in each CCLE cell line. CCLE_perTumorType: Average mutational activity in each cancer type in CCLE. TCGA_perTumorType: Average mutational activity in each cancer type in TCGA. Signature_vs_drift_volcanoPlot: Correlation of mutational signatures with genetic drift (data used in Extended Data Fig. 4c); a two-sided Pearson correlation test was used. Sample size (n) is given for each row.
Microsatellite instability (MSI) annotation. CCLE.hc.msi_del: Number of short deletions in microsatellite regions in CCLE hybrid capture dataset. CCLE.hc.total_del: Total number of short deletions in CCLE hybrid capture dataset. CCLE.wes.msi_del: Number of short deletions in microsatellite regions in CCLE whole exome sequencing dataset. CCLE.wes.total_del: Total number of short deletions in CCLE whole exome sequencing dataset. CCLE.wgs.msi_del: Number of short deletions in microsatellite regions in CCLE whole genome sequencing dataset. CCLE.wgs.total_del: Total number of short deletions in CCLE whole genome sequencing dataset. GDSC.wes.msi_del: Number of short deletions in microsatellite regions in Sanger GDSC whole exome sequencing dataset. GDSC.wes.total_del: Total number of short deletions in Sanger GDSC whole exome sequencing dataset. GDSC.msi.call: MSI call in Sanger GDSC dataset. CCLE.MSI.call: MSI call in CCLE dataset.
DNA methylation analysis. methylation_vs_dependency: List of gene dependencies and associated promoter methylations (Fig. 2a). average_methylation: Average DNA methylation for each cell line (Extended Data Fig. 6b). mRNA_methylation_correlation: Correlation of promoter methylation and mRNA expression (Extended Data Fig. 6c).
Comparison of CCLE gene expression with TCGA and GTEx gene expression profiles. Data used to generate Extended Data Fig. 8. corr_w_GTEx_tissueTypes: Correlation (n = 5,000) between CCLE samples and average expression of GTEx tissue types; rows are individual cell lines and columns are GTEx tissue types. TCGA_CCLE_avg_cor_plot: Correlation (n = 5,000) between average expression for CCLE cancer types (columns) and average expression of TCGA cancer types (rows). TCGA_CCLE_mean_expressions: Average expression values of TCGA cancer types and CCLE cancer types (rows) across 5,000 genes (columns). GTEx_CCLE_avg_cor_plot: Correlation (n = 5,000) between average expression for CCLE cancer types (columns) and average expression of GTEx tissue types (rows). GTEx_CCLE_mean_expressions: Average expression values of GTEx tissue types and CCLE cancer types (rows) across 5,000 genes (columns).
Alternative splicing analysis. Splicing_vs_dependency: Correlation of splicing with gene dependencies (Fig. 4a). Expression_vs_MDM4_splicing: Correlation of MDM4 exon 6 splicing with mRNA expression (Fig. 4f). Splicing_vs_RPL22L1_expression: Correlation of RPL22L1 expression with mRNA splicing of different exons (Fig. 4g). MDM4_mRNA_vs_dependencies: Correlation of MDM4 mRNA expression with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_dependencies: Correlation of MDM4 exon 6 splicing with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_drugs_CTRP: Correlation of MDM4 exon 6 splicing with drug sensitivities (Extended Data Fig. 9b).
MDM4 splicing validation. Primer_sequences: List of qPCR primers used in the MDM4 splicing validation experiment. RNAseq_vs_qPCR: Data used in Fig. 4c.
TP53 status in CCLE cell lines. p53 mutation, copy number, expression and splicing status in CCLE.
miRNA expression analysis. miRNA_vs_dependency: Correlation of miRNA expression with gene dependencies (Extended Data Fig. 10b). mir215_vs_dependencies: Correlation of miR215 expression with gene dependencies (Extended Data Fig. 10c). CTNNB1_vs_miRNA: Correlation of CTNNB1 dependency with miRNA expression (Extended Data Fig. 6c).
Reverse phase protein array (RPPA) analysis. RPPA_Ab_information: List of antibodies used in RPPA analysis. Batch info: Annotates which cell lines were included in each batch of the RPPA profiling. RPPA_vs_Achilles: Correlation of protein expression with gene dependencies (Fig. 5a). RPPA_pSHP2_vs_Achilles: Correlation of pSHP2 with gene dependencies (Extended Data Fig. 11e). RPPA_PTPN11mRNA_vs_Achilles: Correlation of PTPN11 mRNA expression with gene dependencies (Extended Data Fig. 11e). RPPA_pSHP2_vs_drugs: Correlation of pSHP2 with drug sensitivities (Extended Data Fig. 11g).
Cite this article
Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3