Next-generation characterization of the Cancer Cell Line Encyclopedia

Ghandi, Mahmoud; Huang, Franklin W.; Jané-Valbuena, Judit; Kryukov, Gregory V.; Lo, Christopher C.; McDonald, E. Robert; Barretina, Jordi; Gelfand, Ellen T.; Bielski, Craig M.; Li, Haoxin; Hu, Kevin; Andreev-Drakhlin, Alexander Y.; Kim, Jaegil; Hess, Julian M.; Haas, Brian J.; Aguet, François; Weir, Barbara A.; Rothberg, Michael V.; Paolella, Brenton R.; Lawrence, Michael S.; Akbani, Rehan; Lu, Yiling; Tiv, Hong L.; Gokhale, Prafulla C.; de Weck, Antoine; Mansour, Ali Amin; Oh, Coyin; Shih, Juliann; Hadi, Kevin; Rosen, Yanay; Bistline, Jonathan; Venkatesan, Kavitha; Reddy, Anupama; Sonkin, Dmitriy; Liu, Manway; Lehar, Joseph; Korn, Joshua M.; Porter, Dale A.; Jones, Michael D.; Golji, Javad; Caponigro, Giordano; Taylor, Jordan E.; Dunning, Caitlin M.; Creech, Amanda L.; Warren, Allison C.; McFarland, James M.; Zamanighomi, Mahdi; Kauffmann, Audrey; Stransky, Nicolas; Imielinski, Marcin; Maruvka, Yosef E.; Cherniack, Andrew D.; Tsherniak, Aviad; Vazquez, Francisca; Jaffe, Jacob D.; Lane, Andrew A.; Weinstock, David M.; Johannessen, Cory M.; Morrissey, Michael P.; Stegmeier, Frank; Schlegel, Robert; Hahn, William C.; Getz, Gad; Mills, Gordon B.; Boehm, Jesse S.; Golub, Todd R.; Garraway, Levi A.; Sellers, William R.

doi:10.1038/s41586-019-1186-3

Article
Published: 08 May 2019

Nature volume 569, pages 503–508 (2019)

Abstract

Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR–Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.

**Fig. 2: DNA methylation and cancer dependence.**

**Fig. 3: Global chromatin profiling reveals activating mutations in p300 and CBP.**

**Fig. 4: *MDM4* exon 6 inclusion is associated with *MDM4* dependency and *RPL22* or *RPL22L1* status.**

**Fig. 5: High pSHP2 is a marker of SHP2 dependence and sensitivity to RTK inhibitors.**

Data availability

All the CCLE processed datasets are available at the CCLE portal (www.broadinstitute.org/ccle) and DepMap portal (http://www.depmap.org). Raw sequencing data are available at the Sequence Read Archive (SRA) under accession number PRJNA523380. Achilles RNAi data (DEMETER scores) were downloaded from https://portals.broadinstitute.org/achilles. The Project Achilles CRISPR Avana 18Q3 public dataset (gene effects, CERES scores) was downloaded from https://figshare.com/articles/DepMap_Achilles_18Q3_public/6931364/1. The Novartis Project DRIVE RNAi dataset (ATARiS scores) was obtained from the Project DRIVE authors. CTRP AUC scores were downloaded from the NCI website (ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset). Sanger GDSC drug sensitivity data (AUC and IC₅₀ scores) were downloaded from the Sanger website (https://www.cancerrxgene.org/downloads).

Code availability

Most of the statistical analyses were performed in R (version 3.5.2). Source code is available upon request.

References

1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
2. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. https://doi.org/10.1038/s41591-019-0404-8 (2019).
3. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
4. Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).
5. Tsherniak, A. et al. Defining a cancer dependency Map. Cell 170, 564–576 (2017).
6. McDonald, E. R. III et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592 (2017).
7. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
8. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
9. Diouf, B. et al. Somatic deletions of genes regulating MSH2 protein stability cause DNA mismatch repair deficiency and drug resistance in human leukemia cells. Nat. Med. 17, 1298–1303 (2011).
10. Marra, G. et al. Mismatch repair deficiency associated with overexpression of the MSH3 gene. Proc. Natl Acad. Sci. USA 95, 8568–8573 (1998).
11. Esakova, O. & Krasilnikov, A. S. Of proteins and RNA: the RNase P/MRP family. RNA 16, 1725–1747 (2010).
12. Hands-Taylor, K. L. et al. Heterodimerization of the human RNase P/MRP subunits Rpp20 and Rpp25 is a prerequisite for interaction with the P3 arm of RNase MRP RNA. Nucleic Acids Res. 38, 4052–4066 (2010).
13. Doherty, J. R. & Cleveland, J. L. Targeting lactate metabolism for cancer therapeutics. J. Clin. Invest. 123, 3685–3692 (2013).
14. Herman, J. G. et al. Silencing of the VHL tumor-suppressor gene by DNA methylation in renal carcinoma. Proc. Natl Acad. Sci. USA 91, 9700–9704 (1994).
15. Jaffe, J. D. et al. Global chromatin profiling reveals NSD2 mutations in pediatric acute lymphoblastic leukemia. Nat. Genet. 45, 1386–1391 (2013).
16. Creech, A. L. et al. Building the Connectivity Map of epigenetics: chromatin profiling by quantitative targeted mass spectrometry. Methods 72, 57–64 (2015).
17. Sveen, A., Kilpinen, S., Ruusulehto, A., Lothe, R. A. & Skotheim, R. I. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene 35, 2413–2427 (2016).
18. Dewaele, M. et al. Antisense oligonucleotide-mediated MDM4 exon 6 skipping impairs tumor growth. J. Clin. Invest. 126, 68–84 (2016).
19. Rallapalli, R., Strachan, G., Cho, B., Mercer, W. E. & Hall, D. J. A novel MDMX transcript expressed in a variety of transformed cell lines encodes a truncated protein with potent p53 repressive activity. J. Biol. Chem. 274, 8299–8308 (1999).
20. Gembarska, A. et al. MDM4 is a key therapeutic target in cutaneous melanoma. Nat. Med. 18, 1239–1247 (2012).
21. Boutz, P. L., Bhutkar, A. & Sharp, P. A. Detained introns are a novel, widespread class of post-transcriptionally spliced introns. Genes Dev. 29, 63–80 (2015).
22. Maruvka, Y. E. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat. Biotechnol. 35, 951–959 (2017).
23. Zhang, Y. et al. Ribosomal proteins Rpl22 and Rpl22l1 control morphogenesis by regulating pre-mRNA splicing. Cell Reports 18, 545–556 (2017).
24. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834–838 (2005).
25. Li, J. et al. Characterization of human cancer cell lines by reverse-phase protein arrays. Cancer Cell 31, 225–239 (2017).
26. Chen, Y. N. et al. Allosteric inhibition of SHP2 phosphatase inhibits cancers driven by receptor tyrosine kinases. Nature 535, 148–152 (2016).
27. Wylie, A. A. et al. The allosteric inhibitor ABL001 enables dual targeting of BCR–ABL1. Nature 543, 733–737 (2017).
28. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013).
29. The Cancer Cell Line Encyclopedia Consortium & The Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line datasets. Nature 528, 84–87 (2015).
30. Haverty, P. M. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333–337 (2016).
31. Geeleher, P., Gamazon, E. R., Seoighe, C., Cox, N. J. & Huang, R. S. Consistency in large pharmacogenomic studies. Nature 540, E1–E2 (2016).
32. Bouhaddou, M. et al. Drug response consistency in CCLE and CGP. Nature 540, E9–E10 (2016).
33. Mpindi, J. P. et al. Consistency in drug response profiling. Nature 540, E5–E6 (2016).
34. Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
35. King, A. J. et al. Abstract 2116: Combining the power of different profiling approaches to better understand the activity of kinase inhibitor drugs. Cancer Res. 77, 2116–2116 (2017).
36. Fisher, S. et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 12, R1 (2011).
37. Johannessen, C. M. et al. A melanocyte lineage program confers resistance to MAP kinase pathway inhibition. Nature 504, 138–142 (2013).
38. Boyle, P. et al. Gel-free multiplexed reduced representation bisulfite sequencing for large-scale DNA methylation profiling. Genome Biol. 13, R92 (2012).
39. Brat, D. J. et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).
40. Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).
41. Huang, F. W. et al. TERT promoter mutations and monoallelic activation of TERT in cancer. Oncogenesis 4, e176 (2015).
42. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
43. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
44. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.11–11.10.33 (2013).
45. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
46. Haas, B. et al. STAR-Fusion: fast and accurate fusion transcript detection from RNA-seq. Preprint at https://www.bioRxiv.org/content/10.1101/120295v1 (2017).
47. Kasar, S. et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat. Commun. 6, 8866 (2015).
48. Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).
49. The Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 (2017).
50. Haradhvala, N. J. et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018).
51. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
52. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
53. Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).
54. Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477–481 (2013).
55. van der Maaten, L. & Hinton, G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
56. Szklarczyk, D. et al. STRINGv10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
57. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
58. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
59. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
60. DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).
61. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
62. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
63. Smyth, G. K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 1–25 (2004).
64. Akbani, R. et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).
65. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

Acknowledgements

We thank the Broad Genomics Platform, C. Clish, H. Bitter, A. Najafi and E. Orlando for their contribution. This work was supported by grants from Novartis and partially by NIH/NCI grants 1U01CA217842-01, 1P50CA217685-01, 5P50CA098258, 1U24CA180922-01, 1R50CA211461-01, CA16672, 1R01CA219943-01, 1U54CA224068-01, NIH U01 CA176058 and R21 DA025720. F.W.H. was supported by the Prostate Cancer Foundation. M.I. was supported by a Burroughs Wellcome Fund Career Award. G.G. was partially supported by the Paul C. Zamecnik, MD, Chair in Oncology at MGH. G.B.M. was supported by the Adelson medical research fund. Drug sensitivity results are in part based on data generated by Cancer Target Discovery and Development (CTD2) Network (https://ocg.cancer.gov/programs/ctd2/data-portal) established by the National Cancer Institute’s Office of Cancer Genomics.

Reviewer information

Nature thanks Nevan Krogan, Christoph Plass and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Franklin W. Huang
Present address: University of California San Francisco, San Francisco, CA, USA
Dmitriy Sonkin
Present address: National Cancer Institute, Rockville, MD, USA
William R. Sellers
Present address: Broad Institute of Harvard and MIT, Cambridge, MA, USA
Jordi Barretina
Present address: Girona Biomedical Research Institute (IDIBGI), Girona, Spain
These authors contributed equally: Mahmoud Ghandi, Franklin W. Huang.
These authors jointly supervised this work: Levi A. Garraway, William R. Sellers.

Authors and Affiliations

Broad Institute of Harvard and MIT, Cambridge, MA, USA
Mahmoud Ghandi, Franklin W. Huang, Judit Jané-Valbuena, Gregory V. Kryukov, Christopher C. Lo, Ellen T. Gelfand, Craig M. Bielski, Haoxin Li, Kevin Hu, Alexander Y. Andreev-Drakhlin, Jaegil Kim, Julian M. Hess, Brian J. Haas, François Aguet, Barbara A. Weir, Michael V. Rothberg, Brenton R. Paolella, Michael S. Lawrence, Ali Amin Mansour, Coyin Oh, Juliann Shih, Yanay Rosen, Jonathan Bistline, Jordan E. Taylor, Caitlin M. Dunning, Amanda L. Creech, Allison C. Warren, James M. McFarland, Mahdi Zamanighomi, Nicolas Stransky, Yosef E. Maruvka, Andrew D. Cherniack, Aviad Tsherniak, Francisca Vazquez, Jacob D. Jaffe, Cory M. Johannessen, William C. Hahn, Gad Getz, Jesse S. Boehm, Todd R. Golub & Levi A. Garraway
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Franklin W. Huang, Judit Jané-Valbuena, Haoxin Li, Andrew D. Cherniack, Andrew A. Lane, David M. Weinstock, William C. Hahn, Todd R. Golub & Levi A. Garraway
Novartis Institutes for Biomedical Research, Cambridge, MA, USA
E. Robert McDonald III, Jordi Barretina, Kavitha Venkatesan, Anupama Reddy, Dmitriy Sonkin, Manway Liu, Joseph Lehar, Joshua M. Korn, Dale A. Porter, Michael D. Jones, Javad Golji, Giordano Caponigro, Michael P. Morrissey, Frank Stegmeier, Robert Schlegel & William R. Sellers
Massachusetts General Hospital Cancer Center, Boston, MA, USA
Michael S. Lawrence, Yosef E. Maruvka & Gad Getz
Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
Michael S. Lawrence & Gad Getz
Harvard Medical School, Boston, MA, USA
Michael S. Lawrence & Gad Getz
The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Rehan Akbani, Yiling Lu & Gordon B. Mills
Belfer Center for Applied Cancer Science, Boston, MA, USA
Hong L. Tiv & Prafulla C. Gokhale
Novartis Institutes for Biomedical Research, Basel, Switzerland
Antoine de Weck & Audrey Kauffmann
New York Genome Center, New York, NY, USA
Kevin Hadi & Marcin Imielinski
Department of Pathology and Laboratory Medicine, Englander Institute for Precision Medicine, Institute for Computational Biomedicine, and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
Kevin Hadi & Marcin Imielinski
Howard Hughes Medical Institute, Chevy Chase, MD, USA
Todd R. Golub

Contributions

For the work described herein, M.G., F.W.H., G.V.K., E.R.M., J.B., G.C., N.S., J.D.J., A.A.L., C.M.J., M.P.M., F.S., R.S., W.C.H., T.R.G., L.A.G. and W.R.S. conceived the studies; M.G., G.V.K., C.C.L., C.M.B., H.L., K. Hu, J.K., J.M.H., B.J.H., F.A., B.A.W., M.S.L., R.A., A.D., A.A.M., C.O., J.S., K. Hadi, K.V., A.R., D.S., M.L., J.L., J.M.K., M.D.J., J.G., A.C.W., J.M.M., M.Z., A.K., N.S., M.I., Y.E.M., A.D.C., A.T. and G.G. performed computational biology analysis; F.W.H., J.J.-V., E.R.M., J.B., A.Y.A.-D., M.V.R., B.R.P., Y.L., H.L.T., P.C.G., D.A.P., G.C., J.E.T., C.M.D., A.L.C., F.V., J.D.J., A.A.L., C.M.J. and F.S. performed biological analysis and interpretation; M.G., G.V.K., Y.R. and J.B. contributed to software development; M.G., F.W.H., J.J.-V., G.V.K., C.C.L., H.L., K. Hu, A.Y.A.-D., M.V.R., R.A., H.L.T. and K. Hadi prepared figures and tables for the main text and Supplementary Information; M.G., F.W.H., J.J.-V. and W.R.S. wrote the paper; E.R.M., J.B., M.V.R., B.R.P., R.A., P.C.G., A.K., A.T., A.A.L., D.M.W., R.S., W.C.H., G.B.M., J.S.B. and L.A.G. commented on and edited the manuscript; E.T.G. performed project management; D.M.W., M.P.M., R.S., W.C.H., G.G., G.B.M., J.S.B. and T.R.G. contributed project oversight and advisory roles; M.G. and G.V.K. were the lead computational biologists; F.S., L.A.G. and W.R.S. provided overall project leadership and developed the concepts and strategy for the project; L.A.G. and W.R.S. were the senior authors.

Corresponding author

Correspondence to William R. Sellers.

Ethics declarations

Competing interests

J.M.K., M.D.J., D.A.P., F.S., E.R.M., J.L., R.S., J.B., A.D., K.V., A.R., J.G., G.C., M.L., A.K., M.P.M. and W.R.S. are current or former Novartis employees and/or stock holders. W.R.S. is a Board or SAB member and holds equity in Peloton Therapeutics and Ideaya Biosciences and has consulted for Array, Astex, Ipsen, Sanofi and Servier. B.A.W. is a J&J employee. F.S. and G.V.K. are KSQ Therapeutics employees. N.S. is an employee of Celsius Therapeutics. A.D.C. receives research support from Bayer AG. G.G. receives research support from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect and ABSOLUTE. G.B.M. consults with AstraZeneca, ImmunoMET, Ionis, Nuevolution, PDX Bio, Signalchem Lifesciences, Symphogen and Tarveda, has stock options with Catena Pharmaceuticals, ImmunoMet, SignalChem, Spindle Top Ventures and Tarveda, has sponsored research funding from Adelson Medical Research Foundation, AstraZeneca, Breast Cancer Research Foundation, Immunomet, Komen Research Foundation, Pfizer, Nanostring, Tesaro, travel support from Chrysallis Bio and has licensed technology to Nanostring and Myriad Genetics. T.R.G. is an advisor to GlaxoSmithKline, is a co-founder of Sherlock Biosciences and was a co-founder and advisor to Foundation Medicine. J.K. is a Tesaro employee. W.C.H. is a consultant for Thermo Fisher, AjuIB, Paraxel and MPM Capital, and is a founder and consultant for KSQ Therapeutics. L.A.G. is an employee of Eli Lilly.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Overview of CCLE cell lines and datasets.

a, The existing and new CCLE datasets as indicated are depicted. b, Distribution of cell lines by lineage and ancestry across CCLE. c, Visual representation of the number of cell lines in each dataset. New CCLE datasets are shown in red. Functional genomics datasets are shown in blue.

Extended Data Fig. 2 CCLE variant calling pipeline and CCLE and GDSC comparison.

a, Unified pipeline integrating mutation and indel calls from different platforms was used to generate a set of high confidence genomic alterations across 1,063 cancer cell lines. Identified variants were cross-referenced with the ExAC and TCGA databases and a panel of normals (PoN) to exclude germline variants/artefacts and generate the finalized high-confidence variant call set. b–d, Comparison of variant calls between CCLE and Sanger GDSC cell lines for germline (b; n = 1,250,562), TCGA hotspot somatic (c; n = 281) and non-hotspot somatic (d; n = 82,572) variants using WES data. Pearson’s correlation coefficients are shown. e, Comparison of TCGA hotspot variant calls between CCLE Hybrid Capture (HC) data and Sanger GDSC WES data. Variants with allelic fraction >0.4 in one dataset and greater than fourfold difference in allelic fractions between the two datasets are shown as open circles (n = 980). f, g, Comparison of Pearson’s correlation coefficients between CCLE WES and Sanger GDSC WES data versus Pearson’s correlation coefficients between CCLE HC and Sanger GDSC WES data for germline (f; n = 107) and somatic (g; n = 93) variants. Cell lines with fewer than 30 variants were excluded. h, Comparison of allelic fraction Pearson’s correlations between CCLE cell lines and Sanger cell lines using CCLE HC and Sanger GDSC WES data (n = 558 common cell lines between the two datasets; Supplementary Table 3). Cell lines with low germline correlation (sample mismatch) and low somatic correlation (genetic drift) are highlighted.
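The concordance checks described in panels e to h can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were performed in R and the source is available only on request); the function names, the input layout (a dictionary mapping each cell line to paired allelic fractions) and the reading of the fourfold rule are assumptions made for illustration. Only the thresholds (30 variants, allelic fraction 0.4, fourfold difference) come from the legend.

```python
from math import sqrt

# Thresholds quoted in the legend; everything else below is hypothetical.
MIN_VARIANTS = 30      # cell lines with fewer variants are excluded
AF_THRESHOLD = 0.4     # allelic fraction that must be exceeded in one dataset
FOLD_DIFFERENCE = 4.0  # minimum ratio between the two allelic fractions


def pearson(xs, ys):
    """Pearson's correlation coefficient; None if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def per_line_correlation(calls):
    """Correlate allelic fractions per cell line.

    calls maps a cell line name to a list of (af_dataset_a, af_dataset_b)
    pairs, one pair per variant observed in both datasets (assumed layout).
    """
    result = {}
    for cell_line, pairs in calls.items():
        if len(pairs) < MIN_VARIANTS:
            continue  # too few variants for a stable estimate
        r = pearson([a for a, _ in pairs], [b for _, b in pairs])
        if r is not None:
            result[cell_line] = r
    return result


def is_discordant(af_a, af_b):
    """One reading of the legend's rule: AF > 0.4 in one dataset and more
    than a fourfold difference between the two allelic fractions."""
    high, low = max(af_a, af_b), min(af_a, af_b)
    return high > AF_THRESHOLD and high > FOLD_DIFFERENCE * low
```

Under this reading, a low germline correlation for a cell line would point to a sample mismatch and a low somatic correlation to genetic drift, as highlighted in panel h.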

Extended Data Fig. 3 Annotation of structural variants and fusions in CCLE cell lines.

a, Structural variant burden in CCLE whole genomes. Structural variants detected by SvABA in cell lines grouped by tissue type are plotted in the order of mean structural variant burden (red bar in each facet). b, Bar plot of recurrent COSMIC fusions detected in CCLE RNA-seq data coloured by cell line lineage. c, Volcano plot of Achilles RNAi gene dependencies versus CCLE fusions for cell lines (n = 478) common between CCLE and Achilles datasets. P values determined by two-sided t-test. Genes with significant adjusted P values (false discovery rate (FDR) < 0.1) are highlighted. d, e, Examples of fusions associated with gene dependency: cell lines with ESR1-CCDC170 fusion (n = 4) are sensitive to ESR1 shRNA knockdown (d), and cell lines with AFF1-KMT2A fusion (n = 3) are sensitive to AFF1 shRNA knockdown (e). The x axis shows mRNA expression, and the y axis shows Achilles RNAi gene dependency DEMETER score⁵.

Extended Data Fig. 4 Comparison of COSMIC mutational signatures in CCLE and TCGA datasets.

a, Mutational signature activity for CCLE cell lines and TCGA tumours averaged for each cancer type. For each sample, we computed the fraction of mutations attributed to each of the 30 COSMIC signatures and took the average across samples in each cancer type. Tumour types selected for representation have at least 20 samples in CCLE. b, Scatterplots for the mutational signature activities for CCLE and TCGA (n = 168). P value determined by linear regression analysis and corrected for COSMIC signature number. c, Volcano plot for comparison of COSMIC mutational signatures and CCLE or GDSC genetic drift estimates using two-sided Pearson’s correlation test (n = 3–459; Supplementary Table 6). d, Scatter plot for COSMIC6 mutational signature activity versus CCLE or GDSC genetic drift estimates (n = 354). Colour coding as in b. P value determined by Pearson’s correlation test.
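The averaging described for panel a can be illustrated with a minimal sketch. It assumes that per-sample mutation counts attributed to each signature are already available; the function name, the input layout, and the decision to skip samples with no attributed mutations are all hypothetical choices made for the example.

```python
from collections import defaultdict


def mean_signature_activity(samples, min_samples=20):
    """Average per-sample signature fractions within each cancer type.

    samples: iterable of (cancer_type, {signature: attributed mutation count}).
    Returns {cancer_type: {signature: mean fraction}} for cancer types with
    at least `min_samples` usable samples.
    """
    by_type = defaultdict(list)
    for cancer_type, counts in samples:
        total = sum(counts.values())
        if total == 0:
            continue  # assumption: samples without attributed mutations are skipped
        by_type[cancer_type].append({s: c / total for s, c in counts.items()})

    result = {}
    for cancer_type, fractions in by_type.items():
        if len(fractions) < min_samples:
            continue
        signatures = sorted({s for f in fractions for s in f})
        result[cancer_type] = {
            s: sum(f.get(s, 0.0) for f in fractions) / len(fractions)
            for s in signatures
        }
    return result
```

The default of 20 mirrors the legend's minimum number of CCLE samples per tumour type; how the attribution of mutations to signatures is obtained is outside the scope of this sketch.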

Extended Data Fig. 5 Determination of MSI status in the CCLE and interrogation of mismatch repair genes.

a, Identification of MSI cell lines. The number of deletions in microsatellite regions is plotted versus the percentage of deletions in microsatellite regions for all cell lines in the CCLE HC sequencing, CCLE WGS, CCLE WES and Sanger GDSC WES datasets (see Methods). The x axis denotes the number of short deletions in microsatellite regions, and the y axis denotes the percentage of short deletions that lie within microsatellite regions. Inferred MSI cell lines are outlined by the green rectangle. b, Heat plot of inferred MSI status and selected CCLE annotations for the DNA mismatch repair genes MLH1, MSH2 and MSH6 for all cell lines (top) and the MSI subset (bottom). Highlighted red boxes show differences in mRNA and protein expression levels in MSH2 and MSH6. MLH1 hypermethylation is defined as average promoter methylation greater than 0.5. c, d, Scatterplot of CCLE cell lines comparing MSH6 mRNA expression levels (x axis) from RNA-seq versus MSH6 protein abundance (y axis) as quantified by RPPA in inferred-MSI (c) and inferred-MSS (d) cell lines. Red and blue denote cell lines containing truncating mutations or copy number loss in MSH6 and MSH2, respectively. Purple denotes cell lines containing truncating mutations or copy number loss in both MSH2 and MSH6. The black box highlights the MSH6 high-mRNA, low-protein (HL) category. e–g, Bar plots of percentages of cell lines containing truncating mutations in MSH6 (e) or MSH2 (f), and MLH1 expression loss (g) in different MSH6 mRNA and protein categories among inferred-MSI cell lines (LL: n = 11; HL: n = 17; HH: n = 44). P = 4 × 10⁻⁴ (e), P = 1 × 10⁻³ (f) and P = 1 × 10⁻⁴ (g), two-sided Fisher test. h, MSH2 protein levels in different MSH6 mRNA and protein categories. ***P < 1 × 10⁻⁶, two-sided Wilcoxon rank-sum test. P = 8 × 10⁻¹⁴, difference between the HH and HL set; P = 1 × 10⁻⁸, difference between the HH and LL set. Box plots as defined in Fig. 4d.
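The two axes of panel a and the MLH1 hypermethylation rule in panel b reduce to simple arithmetic, sketched below. The argument names `msi_del` and `total_del` follow the column names listed for Supplementary Table 7. The function `infer_msi` is hypothetical: the legend does not state the bounds of the green rectangle, so both cut-offs must be supplied by the caller.

```python
def msi_axes(msi_del: int, total_del: int):
    """Return (x, y) for one cell line.

    x: number of short deletions in microsatellite regions.
    y: percentage of all short deletions that lie in microsatellite regions.
    """
    pct = 100.0 * msi_del / total_del if total_del else 0.0
    return msi_del, pct


def mlh1_hypermethylated(avg_promoter_methylation: float) -> bool:
    """MLH1 hypermethylation as defined in the legend: average above 0.5."""
    return avg_promoter_methylation > 0.5


def infer_msi(msi_del: int, total_del: int, min_del: float, min_pct: float) -> bool:
    """Hypothetical rectangle rule with caller-supplied cut-offs."""
    x, y = msi_axes(msi_del, total_del)
    return x >= min_del and y >= min_pct
```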

Extended Data Fig. 6 Examples of DNA methylation associated with gene expression and dependencies in cell lines.

a, t-SNE plot for DNA methylation data across all CCLE cell lines. Each dot represents a cell line coloured by cell lineage. b, Distribution of mean CpG methylation in CCLE cell lines (n = 843) grouped by cancer type. Box plots as defined in Fig. 4d. c, Correlation of promoter methylation and gene expression for all genes corrected for cancer type (n = 836 cell lines, 18,296 genes). The y axis represents the number of genes, and the x axis is the linear regression coefficient corresponding to normalized promoter DNA methylation. Cancer types were used as covariates in the linear regression analysis. A subset of genes show significant correlation between higher promoter methylation and lower gene expression (n = 7,388; permutation test P < 0.05; Methods). Dotted line shows the empirical null distribution. d, Cell lines with higher levels of RPP25 methylation show decreased RPP25 mRNA expression (Pearson’s r = −0.79, n = 834 cell lines; P < 2.2 × 10⁻¹⁶). e, Comparison of Achilles RNAi RPP25 gene dependency scores for cell lines with and without truncating mutation or copy number loss in POP7 or RPP25L genes (n = 458 cell lines; P = 0.74, two-sided Wilcoxon rank-sum test). Box plots as defined in Fig. 4d. f, Cell lines with higher levels of LDHB methylation show decreased LDHB mRNA expression (Pearson’s r = −0.80, n = 815 cell lines; P < 2.2 × 10⁻¹⁶). g, Cell lines with higher levels of LDHA methylation show decreased LDHA expression. Two cell lines, SK-N-BE2 and U-251-MG, show markedly higher LDHA methylation and decreased LDHA expression (Pearson’s r = −0.27, n = 836; P = 5.34 × 10⁻¹⁶). h, Cell lines with high levels of LDHA methylation display sensitivity to LDHB knockout by CRISPR–Cas9 screening (Pearson’s r = −0.53, n = 371, P < 2.2 × 10⁻¹⁶). i, Promoter methylation versus mRNA expression correlations in TCGA tumour types (sample sizes shown in parentheses). *P < 0.001, Pearson’s correlation test. 
j, Scatterplot of CCLE lines comparing expression of tumour suppressor VHL (von Hippel–Lindau) mRNA versus VHL methylation (left, all cell lines) and copy number (right, kidney subset). VHL hypermethylation in three kidney cell lines is associated with marked loss of VHL expression. VHL is inactivated by DNA copy number loss, somatic mutation, and promoter hypermethylation.

Extended Data Fig. 7 Global chromatin profiling dataset.

a, Unsupervised clustering of global chromatin profiling data for 897 CCLE cell lines. Each column corresponds to an individual cell line and each row corresponds to a specific combination of chromatin post-translational modifications (‘marks’). For each mark, the fold change relative to the median of cell lines is depicted on the heat map. EZH2, NSD2, CREBBP and EP300 status are annotated. Previously described clusters (associated with EZH2 gain of function, EZH2 loss of function, and NSD2 alterations), as well as the newly identified cluster associated with p300 and CBP gain-of-function alterations, are annotated. b, Volcano plot of truncating mutation enrichment analysis in the newly identified cluster, which is characterized by marked increases in H3K18 and H3K27 acetylation (n = 893 cell lines; adjusted P values determined by two-sided Fisher’s exact test). EP300 and CREBBP are the top two genes with truncating mutations enriched in this cluster. Only genes with at least 20 affected cell lines (n = 684 genes) were included. c, Distribution of truncating mutations affecting EP300 and CREBBP in the 10 cell lines in the newly identified p300/CBP cluster. Truncating mutations predicted to affect the TAZ2 (CH3) domain specifically are highlighted. Two other truncating mutations not specific to TAZ2 (CH3) are OVCAR-8 (S893*) and COLO-704 (K1469fs).
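The per-gene enrichment test in panel b can be illustrated with a small standard-library implementation of the two-sided Fisher's exact test. This is a sketch, not the authors' code: it assumes a 2×2 table of cell lines (in cluster or not, by truncating mutation present or not), and it does not cover the multiple-testing adjustment, whose method the legend does not specify.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Example layout (an assumption): rows are mutated / not mutated,
    columns are in cluster / not in cluster.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalised hypergeometric weights for every table sharing the margins.
    weight = {k: comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, hi + 1)}
    observed = weight[a]
    # Sum over tables that are no more probable than the observed one.
    extreme = sum(w for w in weight.values() if w <= observed)
    return extreme / comb(n, row1)
```

Working with integer weights until the final division avoids the floating-point tie problems that can arise when comparing table probabilities directly.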

Extended Data Fig. 8 Comparison of CCLE gene expression data with primary tumour (TCGA) and normal tissue (GTEx) gene expression datasets.

a, Comparison of gene expression profiles between the CCLE cell lines (n = 1,019) and TCGA primary tumours (n = 10,535). For every gene in each dataset, expression values were averaged per cancer type and then mean centred across types. Pearson correlation values were calculated between the CCLE and TCGA cancer types using the most highly variable genes (n = 5,000). b, Comparison of average gene expression profiles between the CCLE cell lines (n = 1,019) and the GTEx normal tissues (n = 11,688). As in a, expression profiles for each tissue type in GTEx were correlated with the CCLE expression profiles (n = 5,000 genes). c, Gene expression comparison of eight prostate cell lines and TCGA primary tumour samples (n = 5,000 genes). d, Gene expression comparison of eight prostate cell lines and GTEx normal tissue samples (n = 5,000 genes).
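The procedure in panel a (average per cancer type, mean centre across types, then correlate) can be sketched as follows. The function names and input layout are hypothetical, and the sketch assumes that both datasets have already been restricted to the same ordered set of highly variable genes, since the legend does not say how those genes were selected.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def centred_type_profiles(samples):
    """samples: list of (cancer_type, [expression value per gene]).

    Averages expression per cancer type, then subtracts each gene's mean
    across types, all within one dataset.
    """
    sums, counts = {}, {}
    for ctype, expr in samples:
        acc = sums.setdefault(ctype, [0.0] * len(expr))
        for i, value in enumerate(expr):
            acc[i] += value
        counts[ctype] = counts.get(ctype, 0) + 1
    means = {t: [v / counts[t] for v in acc] for t, acc in sums.items()}
    n_genes = len(next(iter(means.values())))
    gene_mean = [sum(m[i] for m in means.values()) / len(means) for i in range(n_genes)]
    return {t: [m[i] - gene_mean[i] for i in range(n_genes)] for t, m in means.items()}


def cross_correlation(profiles_a, profiles_b):
    """Pearson correlation for every pair of types across two datasets."""
    return {(ta, tb): pearson(pa, pb)
            for ta, pa in profiles_a.items() for tb, pb in profiles_b.items()}
```

Centring within each dataset before correlating removes gene-level offsets between platforms, so the correlation reflects which types stand out for which genes rather than overall expression level.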

Extended Data Fig. 9 MDM4 alternative splicing and association with RPL22 and RPL22L1.

a, Distribution of MDM4 exon 6 inclusion (left) and MDM4 mRNA expression (right) correlation with all gene dependencies in the Achilles RNAi dataset (n = 189–478; Supplementary Table 10). b, Correlation of MDM4 exon 6 inclusion with sensitivity to all small molecules in the CTRP AUC dataset using all cell lines. Nutlin-3a is the top drug sensitivity correlated with MDM4 exon 6 inclusion (n = 79–810; Supplementary Table 10). c, Example of nutlin-3a sensitivity versus MDM4 exon 6 inclusion in the AML cell lines (Spearman correlation ρ = −0.64, P = 3 × 10⁻⁴, n = 28). The y axis shows the AUC for nutlin-3a in the CTRP dataset. d, Scatterplot of MDM4 exon 6 inclusion versus RPL22L1 expression for all p53-mutant (left, n = 711) and p53 wild-type (right, n = 288) CCLE cell lines. P values determined by Pearson’s correlation test. e, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in TCGA. f, Frequency of RPL22 recurrent frameshift mutations (left) and copy number deletions (right) in CCLE. g, Correlation of RPL22L1 mRNA expression with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots as defined in Fig. 4d. Values in parentheses denote sample size in each category. h, Correlation of MDM4 exon 6 inclusion with RPL22 copy number loss and RPL22 frameshift deletions in TCGA. P value determined by two-sided Kruskal–Wallis rank-sum test. Box plots are as defined in Fig. 4d. Values in parentheses denote sample size in each category. i, Selected genomic features that correlate with sensitivity to MDM4 shRNA knockdown. mRNA expression of MDM4 and TP53 are shown for comparison.

Extended Data Fig. 10 Examples of microRNA expression associated with gene dependencies in cell lines.

a, t-SNE plot for miRNA data across all CCLE cell lines. Each dot represents a cell line. Each colour represents a different cell lineage. Colour coding is as in Fig. 1. b, Scatter plot of pairwise Pearson’s correlation of gene dependency and miRNA expression (n = 420 cell lines), normalized for each microRNA (z₁, x axis) and each gene dependency (z₂, y axis). Strong outlier pairs with |z₁| > 6 or |z₂| > 6 are highlighted. c, Distribution of Pearson’s correlations of mir-215 expression with Achilles RNAi gene dependencies for 16,871 genes (n = 162–420 cell lines; Supplementary Table 13). CTNNB1 knockdown is the top negative correlate with mir-215 expression. d, Distribution of Pearson’s correlations of CTNNB1 gene dependency with all 734 measured miRNAs (n = 420 cell lines). The expression of mir-215 is the top miRNA negatively correlated with CTNNB1 dependency. mir-215 and mir-194-1 cluster together at 1q41, whereas mir-192 and mir-194-2 cluster at 11q13.1. mir-215 and mir-192 are close homologues. e, Scatterplot of mir-215 expression versus CTNNB1 dependency of all CCLE cell lines. Colon and stomach lineages are shown in blue and red, respectively. f, Scaled mir-215 expression in TCGA and CCLE datasets (n = 14; mean ± s.e.m.). Stomach and colorectal lineages in both datasets have high mir-215 expression. g, Single-sample gene set enrichment analysis identifies TGFB1 and WNT3A pathway gene sets correlated with mir-215 expression using CCLE RNA-seq data. The gene set ‘Labbe targets of TGFB1 and WNT3A’ of downstream targets of TGF-β and WNT ligands is negatively correlated with mir-215 expression. h, The gene set ‘Labbe targets of TGFB1 and WNT3A’ is negatively correlated with mir-215 expression in the TCGA stomach mRNA expression dataset. i, The gene set ‘Vecchi gastric advanced vs early dn’ of downregulated genes distinguishing between advanced and early gastric cancer subtypes is positively correlated with mir-215 expression in the CCLE. 
j, mir-215 expression in the stomach TCGA mRNA expression dataset is positively correlated with the ‘Vecchi gastric advanced vs early dn’ gene set.
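The double normalization in panel b can be read as z-scoring a miRNA-by-gene correlation matrix along each axis. The sketch below follows that reading, which is an assumption: z₁ standardizes each correlation against all genes for the same miRNA, and z₂ against all miRNAs for the same gene dependency. The function names and the use of the population standard deviation are likewise illustrative choices.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a sequence; raises ZeroDivisionError if all values are equal."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def outlier_pairs(corr, threshold=6.0):
    """corr[i][j]: correlation of miRNA i expression with dependency on gene j.

    Returns (i, j, z1, z2) for every pair with |z1| or |z2| above `threshold`.
    """
    z1 = [zscore(row) for row in corr]              # normalized per miRNA
    z2_cols = [zscore(col) for col in zip(*corr)]   # normalized per gene dependency
    hits = []
    for i, row in enumerate(corr):
        for j in range(len(row)):
            a, b = z1[i][j], z2_cols[j][i]
            if abs(a) > threshold or abs(b) > threshold:
                hits.append((i, j, a, b))
    return hits
```

A pair that is extreme on both axes stands out relative to that miRNA's other gene associations and to that gene's other miRNA associations, which is what makes it a candidate for a specific relationship such as the mir-215 and CTNNB1 pair in panels c and d.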

Extended Data Fig. 11 RPPA analysis.

a, Distribution of Pearson’s correlation coefficient between total protein levels as measured by RPPA and mRNA expression levels measured by RNA-seq (n = 890 cell lines, 154 genes). The empirical null distribution for correlation of mRNA and protein for two random genes is shown for comparison (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon rank-sum test). b, Effect of RPPA dynamic range on mRNA and protein correlation (n = 96). mRNA and protein correlation is plotted against dynamic range for each validated total protein antibody. Most antibodies with low mRNA and protein correlation tend to have low dynamic range with the exception of the gene VEGFR2, which despite high dynamic range, exhibits very low mRNA and protein correlation. P values determined by two-sided Pearson’s correlation test. c, Effect of RPPA antibody quality and target type on mRNA/protein correlation. On the left, mRNA/protein Pearson correlation is plotted for ‘validated’ (n = 96) and ‘with caution’ (n = 58) antibodies for antibodies against total proteins. On the right, mRNA and protein Pearson’s correlation is plotted for antibodies against total protein (n = 154) and antibodies against phospho-protein (n = 50). Median correlations are 0.62 (validated), 0.48 (caution), 0.54 (total protein), 0.21 (phospho-protein). P values determined by two-sided Wilcoxon rank-sum test. Box plots are as defined in Fig. 4d. d, Comparison of mRNA and protein correlations in CCLE and TCGA (n = 152). The Pearson’s correlation between mRNA and protein levels is calculated for each RPPA antibody in CCLE and TCGA separately. Each dot represents an antibody. Generally, the antibodies with low mRNA and protein correlation in CCLE also have low mRNA and protein correlation in TCGA data. P values determined by two-sided Pearson’s correlation test. 
e, Distribution of gene dependency (Achilles RNAi) correlations with RPPA pSHP2 level (left, n = 161–411, Supplementary Table 14) and PTPN11 mRNA expression (right, n = 192–478, Supplementary Table 14). PTPN11 dependency is strongly correlated with pSHP2 level, whereas there is no significant correlation with PTPN11 mRNA level. f, Comparison of pSHP2 levels in SHP099-sensitive and -resistant cell lines (n = 60). P value determined by two-sided Wilcoxon rank-sum test. SHP099 sensitivity data were obtained from a previous study²⁶. Box plots are as defined in Fig. 4d. g, Pearson’s correlation of pSHP2 and Sanger GDSC drug sensitivity AUC dataset (n = 265 drugs and 198–588 overlapping cell lines). h, Model error for elastic net model of sensitivity to ponatinib with and without using RPPA data as predictive features. The y axis shows the cross-validation error (fivefold cross-validation) against parameter λ of elastic net (parameter α is fixed at 0.2). Data are mean ± s.d. for the five cross-validation sets. The minimum cross-validation error for models with and without using RPPA data are shown by arrows. i, Elastic net results for sensitivity to ponatinib. pSHP2 is the top feature selected by elastic net. On the left, elastic net weights (averaged over 200 bootstrapping trials) and colour-coded by the frequency each feature was selected by elastic net. The numbers in parentheses are the frequency each feature was selected. Each column is a cell line and each row is a feature. The cell lines are sorted by their sensitivity to ponatinib (shown at the bottom). j, Western blot analysis of pSHP2 and total SHP2 levels across AML and select CML cell lines. Western blots were performed twice independently with similar results. k, Validation of RPPA data for pSHP2. pSHP2 levels measured by western blot are plotted against pSHP2 levels measured by RPPA for the tested AML and control CML cell lines (n = 19). The cell lines are colour-coded by their sensitivity to ponatinib. 
P values determined by two-sided Pearson’s correlation test. l, In vivo mouse xenograft experiment survival curves of ponatinib-treated and control mice for the low pSHP2 primagraft DFAM-68555 (n = 7 mice in each treatment group). P values determined by log-rank (Mantel–Cox) test. m, Immunohistochemistry of spleen specimens from mice treated with control or ponatinib for 5 days using anti-CD45. Similar results were found using the other two independent sets of mice.

Supplementary information

Supplementary Information

Supplementary Methods, Computational Analysis and Supplementary References.

Reporting Summary

Supplementary Figure 1

Uncropped scans with size markers for Extended Data Fig. 11j.

Supplementary Table 1

Cell lines annotations and available datasets Cell Line Annotations: List of the cell lines with CCLE IDs, DepMap IDs, and the available annotations. Datasets: Data used to generate Extended Data Fig. 1c. Overlapping cell lines: Number of overlapping cell lines between different datasets. Cell line name changes: List of cell lines with changed CCLE ID.

Supplementary Table 2

RainDance targets List of genomic loci and primer sequences used in RainDance sequencing data.

Supplementary Table 3

CCLE GDSC comparison Data used to generate Extended Data Fig. 2h. r_somatic_CCLE_HC_vs_GDSC_WES: Pearson correlation of somatic variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.somatic: Number of data points used to calculate the correlation (somatic). r_germline_CCLE_HC_vs_GDSC_WES: Pearson correlation of germline variants' allelic fractions between CCLE hybrid capture and Sanger GDSC whole exome sequencing. n_datapoints.germline: Number of data points used to calculate the correlation (germline). comments: Cell lines classification based on CCLE/GDSC concordance.

Supplementary Table 4

Fusion vs dependency analysis Data corresponding to the Extended Data Fig. 3c. This table includes fusions associated with dependencies based on Achilles RNAi, Achilles CRISPR, and Novartis RNAi datasets. Two-sided t-test was used. Sample size (n) is provided for each row.

Supplementary Table 5

TERT promoter mutations List of cell lines and corresponding TERT promoter genotype profiled by whole genome sequencing or targeted sequencing.

Supplementary Table 6

COSMIC mutational signatures analysis Data used in mutational signature analysis (Extended Data Fig. 4). CancerTypes_analyzed: List of COSMIC signatures analyzed in each cancer type. CCLE_perCellLine: Mutational signature activity in each CCLE cell line. CCLE_perTumorType: Average mutational activity in each cancer type in CCLE. TCGA_perTumorType: Average mutational activity in each cancer type in TCGA. Signature_vs_drift_volcanoPlot: Correlation of mutational signatures with genetic drift (data used in Extended Data Fig. 4c). A two-sided Pearson correlation test was used. Sample size (n) is given for each row.

Supplementary Table 7

Microsatellite instability (MSI) annotation CCLE.hc.msi_del: Number of short deletions in microsatellite regions in CCLE hybrid capture dataset. CCLE.hc.total_del: Total number of short deletions in CCLE hybrid capture dataset. CCLE.wes.msi_del: Number of short deletions in microsatellite regions in CCLE whole exome sequencing dataset. CCLE.wes.total_del: Total number of short deletions in CCLE whole exome sequencing dataset. CCLE.wgs.msi_del: Number of short deletions in microsatellite regions in CCLE whole genome sequencing dataset. CCLE.wgs.total_del: Total number of short deletions in CCLE whole genome sequencing dataset. GDSC.wes.msi_del: Number of short deletions in microsatellite regions in Sanger GDSC whole exome sequencing dataset. GDSC.wes.total_del: Total number of short deletions in Sanger GDSC whole exome sequencing dataset. GDSC.msi.call: MSI call in Sanger GDSC dataset. CCLE.MSI.call: MSI call in CCLE dataset.

Supplementary Table 8

DNA methylation analysis methylation_vs_dependency: List of gene dependencies and associated promoter methylations (Fig. 2a). average_methylation: Average DNA methylation for each cell line (Extended Data Fig. 6b). mRNA_methylation_correlation: Correlation of promoter methylation and mRNA expression (Extended Data Fig. 6c).

Supplementary Table 9

Comparison of CCLE gene expression with TCGA and GTEx gene expression profiles Data used to generate Extended Data Fig. 8. corr_w_GTEx_tissueTypes: Correlation (n=5000) between CCLE samples and average expression of GTEx tissue types; Rows are individual cell lines and columns are GTEx tissue types. TCGA_CCLE_avg_cor_plot: Correlation (n=5000) between average expression for CCLE cancer types (columns) and average expression of TCGA cancer types (rows). TCGA_CCLE_mean_expressions: Average expression values of TCGA cancer types and CCLE cancer types (rows) across 5000 genes (columns). GTEx_CCLE_avg_cor_plot: Correlation (n=5000) between average expression for CCLE cancer types (columns) and average expression of GTEx tissue types (rows). GTEx_CCLE_mean_expressions: Average expression values of GTEx tissue types and CCLE cancer types (rows) across 5000 genes (columns).

Supplementary Table 10

Alternative splicing analysis Splicing_vs_dependency: Correlation of splicing with gene dependencies (Fig. 4a). Expression_vs_MDM4_splicing: Correlation of MDM4 exon 6 splicing with mRNA expressions (Fig. 4f). Splicing_vs_RPL22L1_expression: Correlation of RPL22L1 expression with mRNA splicing of different exons (Fig. 4g). MDM4_mRNA_vs_dependencies: Correlation of MDM4 mRNA expression with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_dependencies: Correlation of MDM4 exon 6 splicing with gene dependencies (Extended Data Fig. 9a). MDM4splicing_vs_drugs_CTRP: Correlation of MDM4 exon 6 splicing with drug sensitivities (Extended Data Fig. 9b).

Supplementary Table 11

MDM4 splicing validation Primer_sequences: list of qPCR primers used in MDM4 splicing validation experiment. RNAseq_vs_qPCR: Data used in Fig. 4c.

Supplementary Table 12

TP53 status in CCLE cell lines p53 mutation, copy number, expression and splicing status in CCLE.

Supplementary Table 13

miRNA expression analysis miRNA_vs_dependency: Correlation of miRNA expression with gene dependencies (Extended Data Fig. 10b). mir215_vs_dependencies: Correlation of mir-215 expression with gene dependencies (Extended Data Fig. 10c). CTNNB1_vs_miRNA: Correlation of CTNNB1 dependency with miRNA expression (Extended Data Fig. 10d).

Supplementary Table 14

Reverse phase protein array (RPPA) analysis RPPA_Ab_information: List of antibodies used in RPPA analysis. Batch info: Annotates which cell lines were included in each batch of the RPPA profiling. RPPA_vs_Achilles: Correlation of protein expressions with gene dependencies (Fig. 5a). RPPA_pSHP2_vs_Achilles: Correlation of pSHP2 with gene dependencies (Extended Data Fig. 11e). RPPA_PTPN11mRNA_vs_Achilles: Correlation of PTPN11 mRNA expression with gene dependencies (Extended Data Fig. 11e). RPPA_pSHP2_vs_drugs: Correlation of pSHP2 with drug sensitivities (Extended Data Fig. 11g).

Source data

Source Data Fig. 5

About this article

Cite this article

Ghandi, M., Huang, F.W., Jané-Valbuena, J. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019). https://doi.org/10.1038/s41586-019-1186-3

Received: 02 June 2018
Accepted: 09 April 2019
Published: 08 May 2019
Issue Date: 23 May 2019
DOI: https://doi.org/10.1038/s41586-019-1186-3
