Cell line misidentification, contamination and poor annotation affect scientific reproducibility. Here we outline simple measures to detect or avoid cross-contamination, present a framework for cell line annotation linked to short tandem repeat and single nucleotide polymorphism profiles, and provide a catalogue of synonymous cell lines. This resource will enable our community to eradicate the use of misidentified lines and generate credible cell-based data.
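A catalogue of synonymous cell lines amounts to grouping lines whose genotype profiles are effectively identical. The sketch below assumes pairwise identity scores have already been computed and a cutoff has been chosen. The hypothetical `synonym_groups` helper then merges lines into synonym sets using a union-find structure. It is an illustration under those assumptions, not the authors' pipeline, and the line names and scores in the example are made up.

```python
from collections import defaultdict


def synonym_groups(lines, pair_scores, threshold):
    """Group cell lines whose pairwise identity score meets a cutoff.

    lines: iterable of cell line names (every name used in pair_scores must appear here).
    pair_scores: dict mapping (name_a, name_b) -> identity score in [0, 1].
    threshold: hypothetical cutoff above which two lines are treated as synonymous.
    """
    parent = {name: name for name in lines}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for name in parent:
        groups[find(name)].append(name)
    # Only groups with two or more members are synonym sets.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical names and scores, for illustration only.
names = ["line1", "line2", "line3", "line4"]
scores = {("line1", "line2"): 0.95, ("line2", "line3"): 0.92, ("line3", "line4"): 0.40}
print(synonym_groups(names, scores, 0.8))  # [['line1', 'line2', 'line3']]
```

Union-find merges groups transitively. If line1 matches line2 and line2 matches line3, all three end up in one set even when line1 and line3 score below the cutoff against each other. That is usually what is wanted for lines derived from the same donor, but such chained groups deserve a manual check.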
References
American Type Culture Collection Standards Development Organization Workgroup ASN-0002. Cell line misidentification: the beginning of the end. Nature Rev. Cancer 10, 441–448 (2010)
Editorial. Identity crisis. Nature 457, 935–936 (2009)
Capes-Davis, A. et al. Match criteria for human cell line authentication: where do we draw the line? Int. J. Cancer 132, 2510–2519 (2013)
Dirks, W. G. & Drexler, H. G. STR DNA typing of human cell lines: detection of intra- and interspecies cross-contamination. Methods Mol. Biol. 946, 27–38 (2013)
Editorial. Announcement: Reducing our irreproducibility. Nature 496, 398 (2013)
Lorsch, J. R., Collins, F. S. & Lippincott-Schwartz, J. Fixing problems with cell lines. Science 346, 1452–1453 (2014)
Lacroix, M. Persistent use of “false” cell lines. Int. J. Cancer 122, 1–4 (2008)
ICLAC. Naming a Cell Line http://iclac.org/resources/cell-line-names/ (2014)
Sarntivijai, S., Ade, A. S., Athey, B. D. & States, D. J. A bioinformatics analysis of the cell line nomenclature. Bioinformatics 24, 2760–2766 (2008)
Hunter, L. & Cohen, K. B. Biomedical language processing: what's beyond PubMed? Mol. Cell 21, 589–594 (2006)
Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011)
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012)
Romano, P. et al. Cell Line Data Base: structure and recent improvements towards molecular authentication of human cell lines. Nucleic Acids Res. 37, D925–D932 (2009)
Buehring, G. C., Eby, E. A. & Eby, M. J. Cell line cross-contamination: how aware are mammalian cell culturists of the problem and how to monitor it? In Vitro Cell. Dev. Biol. Anim. 40, 211–215 (2004)
Barallon, R. et al. Recommendation of short tandem repeat profiling for authenticating human cell lines, stem cells, and tissues. In Vitro Cell. Dev. Biol. Anim. 46, 727–732 (2010)
Parson, W. et al. Cancer cell line identification by short tandem repeat profiling: power and limitations. FASEB J. 19, 434–436 (2005)
Santos, F. R., Pandya, A. & Tyler-Smith, C. Reliability of DNA-based sex tests. Nature Genet. 18, 103 (1998)
Tanabe, H. et al. Cell line individualization by STR multiplex system in the cell bank found cross-contamination between ECV304 and EJ-1/T24. Tiss. Cult. Res. Commun. 18, 329–338 (1999)
Masters, J. R. et al. Short tandem repeat profiling provides an international reference standard for human cell lines. Proc. Natl Acad. Sci. USA 98, 8012–8017 (2001)
Castro, F. et al. High-throughput SNP-based authentication of human cell lines. Int. J. Cancer 132, 308–314 (2013)
Much, M., Buza, N. & Hui, P. Tissue identity testing of cancer by short tandem repeat polymorphism: pitfalls of interpretation in the presence of microsatellite instability. Hum. Pathol. 45, 549–555 (2014)
Didion, J. P. et al. SNP array profiling of mouse cell lines identifies their strains of origin and reveals cross-contamination and widespread aneuploidy. BMC Genomics 15, 847 (2014)
Capes-Davis, A. et al. Check your cultures! A list of cross-contaminated or misidentified cell lines. Int. J. Cancer 127, 1–8 (2010)
Cooper, J. K. et al. Species identification in cell culture: a two-pronged molecular approach. In Vitro Cell. Dev. Biol. Anim. 43, 344–351 (2007)
Masters, J. R. & Stacey, G. N. Changing medium and passaging cell lines. Nature Protocols 2, 2276–2284 (2007)
Zhang, J. et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346, 256–259 (2014)
Masters, J. R. Cell-line authentication: end the scandal of false cell lines. Nature 492, 186 (2012)
Nardone, R. M. Eradication of cross-contaminated cell lines: a call for action. Cell Biol. Toxicol. 23, 367–372 (2007)
Wellcome Trust Sanger Institute. The Cell Lines Project http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/about (2015)
Centers for Disease Control and Prevention. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) (2011)
ICLAC. Database of Cross-contaminated or Misidentified Cell Lines http://iclac.org/databases/cross-contaminations/ (version 7.2, released 10 October 2014)
Wang, J. et al. High-throughput single nucleotide polymorphism genotyping using nanofluidic Dynamic Arrays. BMC Genomics 10, 561 (2009)
Parodi, B. et al. Species identification and confirmation of human and animal cell lines: a PCR-based method. Biotechniques 32, 432–434, 436, 438–440 (2002)
Steube, K. G., Meyer, C., Uphoff, C. C. & Drexler, H. G. A simple method using beta-globin polymerase chain reaction for the species identification of animal cell lines – a progress report. In Vitro Cell. Dev. Biol. Anim. 39, 468–475 (2003)
Hebert, P. D., Cywinska, A., Ball, S. L. & deWaard, J. R. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B 270, 313–321 (2003)
Acknowledgements
We thank S. Ghosh for bioinformatics support; E. Hall and Y. Reid (ATCC) for their intellectual input and expertise in genetic testing; M. Kline for supplying STR profiles; and J. Settleman and D. Stokoe for discussions.
Competing interests
The majority of authors are employees of Genentech Inc. and/or hold stock in Roche.
Extended data figures and tables
a, Comparison of STR and SNP frequency distributions of pairwise identity scores for 836 lines. Identity scores were computed using the Tanabe algorithm for both 8-locus STR and 48-locus SNP genotypes (compare with Fig. 2a). In total, 349,030 pairs were compared (348,953 non-synonymous and 77 synonymous pairs of cell lines); for plotting purposes, a random subset of 25,000 non-synonymous pairs is displayed. Because fewer STR loci were used, the standard deviation of non-synonymous STR scores increased from 0.083 to 0.113, and more truly synonymous pairs now fall below the mean-plus-4-s.d. cutoff. b, Univariate distribution of SNP Tanabe identity scores for the data shown in Fig. 2. Results for 2,862 replicate pairs are shown as black dots. (Synonymous pairs are included in the density computation but are so rare compared with non-synonymous pairs that they make no visible change to the plotted curve.) The vertical scale is normalized so that the total area under the curve is 1. Reference lines were computed using non-synonymous pairs only. c, As for b, but showing 16-locus STR identity scores. True replicate pairs are shown in black; pairwise identity scores for a set of seven HeLa-derived lines, which are closely related genetically but are not true replicates, are shown in red. A mean ± 4 s.d. reference line, corresponding to a P value of 3.2 × 10⁻⁵, is shown in both graphs. Note that the reference line is better separated from true replicate results for STR data than for SNP data.
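The legend above rests on two calculations. The first is the Tanabe identity score for each pair of profiles. The second is a cutoff placed 4 s.d. above the mean of the non-synonymous scores.

The minimal sketch below makes three assumptions:
- Each profile maps a locus to the set of observed alleles, so a homozygous locus contributes one allele.
- The Tanabe score is 2 × shared alleles / (alleles in A + alleles in B), summed over loci typed in both profiles.
- The helper names and example alleles are hypothetical.

It also checks two numbers quoted in the legend. First, 836 lines give 836 × 835 / 2 = 349,030 pairs. Second, the one-sided normal tail beyond 4 s.d. is about 3.2 × 10⁻⁵.

```python
import math
import statistics

# Hypothetical profile type: locus name -> set of observed alleles.
Profile = dict[str, frozenset[str]]


def tanabe_score(a: Profile, b: Profile) -> float:
    """Tanabe identity: 2 x shared alleles / (alleles in a + alleles in b)."""
    shared = 0
    total = 0
    # Only loci typed in both profiles contribute to the score.
    for locus in a.keys() & b.keys():
        shared += len(a[locus] & b[locus])
        total += len(a[locus]) + len(b[locus])
    return 2 * shared / total if total else 0.0


def upper_cutoff(null_scores: list[float], k: float = 4.0) -> float:
    """Mean + k sample standard deviations of non-synonymous pair scores."""
    return statistics.mean(null_scores) + k * statistics.stdev(null_scores)


def one_sided_p(k: float) -> float:
    """Upper-tail probability of a standard normal beyond k s.d."""
    return 0.5 * math.erfc(k / math.sqrt(2))


# Hypothetical two-locus profiles; allele values are illustrative only.
p1 = {"TH01": frozenset({"6", "9.3"}), "TPOX": frozenset({"8"})}
p2 = {"TH01": frozenset({"6", "9.3"}), "TPOX": frozenset({"8", "11"})}

print(round(tanabe_score(p1, p2), 3))  # 0.857 (= 2 * 3 / 7)
print(math.comb(836, 2))               # 349030 pairwise comparisons
print(f"{one_sided_p(4):.1e}")         # 3.2e-05
```

In practice, `upper_cutoff` would be fed only the non-synonymous scores, matching the legend's note that reference lines were computed from those pairs.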
Extended Data Figure 2 Impact of changing the confidence threshold on detecting cell line contamination by SNP profiling.
a, SNP genotyping using the Fluidigm system was performed on DNA extracted from mixtures of AU565 and Panc 08.13 cells at differing ratios. The raw data were analysed using confidence thresholds of 65 (Th65), 85 (Th85), 90 (Th90) and 95 (Th95); examples are shown for Th65 and Th95. For each SNP, XX, XY and YY allele calls are shown in green, blue and red, respectively, and no calls are shown in grey. b, Table showing percent identity when SNP calls were compared with the SNP database. As the confidence threshold increased, lower levels of contamination could be detected, as evidenced by decreased correlation values. Ratios depict the relative abundance of AU565:Panc 08.13 cells (for example, 99:2 = 99% AU565 mixed with 2% Panc 08.13). Data are representative of at least two independent experiments.
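The sketch below illustrates the thresholding step. Calls whose confidence falls below the threshold become no calls, and percent identity is then scored against a reference profile. Several details are assumptions for illustration and are not taken from the Fluidigm software or the authors' Methods:
- the data layout, in which each SNP ID maps to a genotype and a 0–100 confidence score;
- the choice to count no calls as mismatches;
- the example calls themselves, which are hypothetical.

```python
from typing import Optional

# Assumed layout: SNP ID -> (genotype such as "XX", "XY" or "YY", confidence 0-100).
Call = tuple[str, float]


def apply_threshold(calls: dict[str, Call], threshold: float) -> dict[str, Optional[str]]:
    """Replace calls below the confidence threshold with None (a no call)."""
    return {snp: (gt if conf >= threshold else None) for snp, (gt, conf) in calls.items()}


def percent_identity(query: dict[str, Optional[str]], reference: dict[str, str],
                     no_call_is_mismatch: bool = True) -> float:
    """Percent of shared SNPs whose query genotype equals the reference genotype."""
    loci = [snp for snp in reference if snp in query]
    if not no_call_is_mismatch:
        # Alternative convention: drop no calls from the denominator.
        loci = [snp for snp in loci if query[snp] is not None]
    if not loci:
        return 0.0
    matches = sum(1 for snp in loci if query[snp] == reference[snp])
    return 100.0 * matches / len(loci)


# Hypothetical calls from a mixed sample and a hypothetical reference profile.
calls = {"snp1": ("XX", 99.0), "snp2": ("XY", 88.0), "snp3": ("YY", 70.0), "snp4": ("XY", 97.0)}
ref = {"snp1": "XX", "snp2": "XY", "snp3": "YY", "snp4": "XX"}

for th in (65, 95):
    print(th, percent_identity(apply_threshold(calls, th), ref))
# 65 75.0
# 95 25.0
```

The `no_call_is_mismatch` switch matters. Whether no calls lower the identity score or are simply excluded changes how a stricter threshold shows up in the resulting table.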
Extended Data Figure 3 Electropherograms and table of results for STR profiling of DNA extracted from differing ratios of AU565:Panc 08.13 cells.
STR profiles were determined (see Methods) for DNA extracted from mixtures of AU565 and Panc 08.13 cells at differing ratios. a, Example electropherograms are shown for five of the 16 STR markers (D3S1358, TH01, D21S11, D18S51 and Penta E). Ratios depict the relative abundance of AU565:Panc 08.13 cells (for example, 99:2 = 99% AU565 mixed with 2% Panc 08.13). Data are representative of at least two independent experiments. b, Table showing STR calls for all STR loci and the top matches when compared with the database of STR calls (Supplementary Table 3).
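One simple cue in STR electropherograms of mixed DNA is more than two alleles at a locus. The hypothetical helpers below flag such loci and report a suspected mixture when at least `min_loci` loci are affected. Both the rule and the default of two loci are assumptions, and the allele values are illustrative. Extra alleles can also arise from aneuploidy or microsatellite instability in cancer lines (Much et al., 2014), so a flag is a prompt for follow-up rather than a diagnosis.

```python
def loci_with_extra_alleles(profile: dict[str, list[str]], max_alleles: int = 2) -> list[str]:
    """Return loci (sorted) showing more distinct alleles than a single diploid source allows."""
    return sorted(
        locus for locus, alleles in profile.items() if len(set(alleles)) > max_alleles
    )


def mixture_suspected(profile: dict[str, list[str]], min_loci: int = 2) -> bool:
    """Hypothetical rule: suspect a mixture if enough loci carry extra alleles."""
    return len(loci_with_extra_alleles(profile)) >= min_loci


# Hypothetical STR calls for a possibly mixed sample.
sample = {
    "D3S1358": ["15", "16", "17"],
    "TH01": ["6", "9.3"],
    "D21S11": ["28", "30", "31.2"],
    "Penta E": ["7"],
}
print(loci_with_extra_alleles(sample))  # ['D21S11', 'D3S1358']
print(mixture_suspected(sample))        # True
```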
a, Images of early-passage (p4) and later-passage (p8) CoCM-1 cells in culture, showing a subpopulation of small, round, loosely attached cells that overwhelms the culture over time. b, c, PCR-based detection of human (left panel) and mouse (right panel) cytochrome c oxidase subunit I (COX1) in cell lines (b) and in titrated mixtures of human (MOLT4) and mouse (STV2) cell lines (c) to determine the limit of detection. 18S, PCR loading control. d, Flow cytometric analysis of mouse and human CD29 staining in the contaminated CoCM-1 cell line. Data are representative of at least two independent experiments.
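The titration in panel c estimates a limit of detection: the smallest contaminant fraction that still gives a positive species-specific PCR signal. The minimal sketch below assumes results are recorded as (mouse fraction, detected) pairs. It returns the lowest fraction for which that fraction and every higher one were detected, so an isolated positive below a negative step is ignored. The helper name and the example fractions are hypothetical, not the fractions used in the figure.

```python
from typing import Optional


def limit_of_detection(results: list[tuple[float, bool]]) -> Optional[float]:
    """Lowest fraction detected with an unbroken run of detections above it.

    results: (contaminant fraction, detected?) pairs from a titration series.
    Returns None if even the highest fraction was not detected.
    """
    lod = None
    # Walk from the highest fraction down and stop at the first failure.
    for fraction, detected in sorted(results, reverse=True):
        if not detected:
            break
        lod = fraction
    return lod


# Hypothetical titration: fraction of mouse cells in a human background.
series = [(0.5, True), (0.1, True), (0.01, True), (0.001, False), (0.0001, True)]
print(limit_of_detection(series))  # 0.01
```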
About this article
Cite this article
Yu, M., Selvaraj, S., Liang-Chu, M. et al. A resource for cell line authentication, annotation and quality control. Nature 520, 307–311 (2015). https://doi.org/10.1038/nature14397