Reproducibility in research can be compromised by both biological and technical variation, but most of the focus is on removing the latter. Here we investigate the effects of biological variation in HeLa cell lines using a systems-wide approach. We determine the degree of molecular and phenotypic variability across 14 stock HeLa samples from 13 international laboratories. We cultured cells in uniform conditions and profiled genome-wide copy numbers, mRNAs, proteins and protein turnover rates in each cell line. We discovered substantial heterogeneity between HeLa variants, especially between lines of the CCL2 and Kyoto varieties, and observed progressive divergence within a specific cell line over 50 successive passages. Genomic variability has a complex, nonlinear effect on transcriptome, proteome and protein turnover profiles, and proteotype patterns explain the varying phenotypic response of different cell lines to Salmonella infection. These findings have implications for the interpretation and reproducibility of research results obtained from human cultured cells.

Data availability

RNA-seq data are available on GEO (GSE111485). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (ref. 70) with the dataset identifier PXD009273. The full dataset is available at https://HelaProt.shinyapps.io/Crosslab/.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Capes-Davis, A. et al. Check your cultures! A list of cross-contaminated or misidentified cell lines. Int. J. Cancer 127, 1–8 (2010).
  2. Zhao, M. et al. Assembly and initial characterization of a panel of 85 genomically validated cell lines from diverse head and neck tumor sites. Clin. Cancer Res. 17, 7248–7264 (2011).
  3. Lorsch, J. R., Collins, F. S. & Lippincott-Schwartz, J. Fixing problems with cell lines. Science 346, 1452–1453 (2014).
  4. Yu, M. et al. A resource for cell line authentication, annotation and quality control. Nature 520, 307–311 (2015).
  5. Almeida, J. L., Cole, K. D. & Plant, A. L. Standards for cell line authentication and beyond. PLoS Biol. 14, e1002476 (2016).
  6. Muff, R. et al. Genomic instability of osteosarcoma cell lines in culture: impact on the prediction of metastasis relevant genes. PLoS One 10, e0125611 (2015).
  7. Frattini, A. et al. High variability of genomic instability and gene expression profiling in different HeLa clones. Sci. Rep. 5, 15377 (2015).
  8. Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).
  9. Bottomley, R. H., Trainer, A. L. & Griffin, M. J. Enzymatic and chromosomal characterization of HeLa variants. J. Cell Biol. 41, 806–815 (1969).
  10. Nelson-Rees, W. A., Hunter, L., Darlington, G. J. & O’Brien, S. J. Characteristics of HeLa strains: permanent vs. variable features. Cytogenet. Cell Genet. 27, 216–231 (1980).
  11. Macville, M. et al. Comprehensive and definitive molecular cytogenetic characterization of HeLa cells by spectral karyotyping. Cancer Res. 59, 141–150 (1999).
  12. Rutledge, S. What HeLa cells are you using? The Winnower https://doi.org/10.15200/winn.143896.65158 (2014).
  13. Landry, J. J. et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213–1224 (2013).
  14. Adey, A. et al. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500, 207–211 (2013).
  15. Williams, E. G. et al. Systems proteomics of liver mitochondria function. Science 352, aad0189 (2016).
  16. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
  17. Rosenberger, G. et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat. Methods 14, 921–927 (2017).
  18. Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).
  19. Röst, H. L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).
  20. Röst, H. L. et al. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics. Nat. Methods 13, 777–783 (2016).
  21. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
  22. Jovanovic, M. et al. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015).
  23. Liu, Y. et al. Systematic proteome and proteostasis profiling in human trisomy 21 fibroblast cells. Nat. Commun. 8, 1212 (2017).
  24. Fasterius, E. et al. A novel RNA sequencing data analysis method for cell line authentication. PLoS One 12, e0171435 (2017).
  25. Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811 (2015).
  26. Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
  27. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
  28. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
  29. Fortelny, N., Overall, C. M., Pavlidis, P. & Freue, G. V. C. Can we predict protein from mRNA levels? Nature 547, E19–E20 (2017).
  30. Lundberg, E. et al. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 6, 450 (2010).
  31. Claydon, A. J. & Beynon, R. Proteome dynamics: revisiting turnover with a global perspective. Mol. Cell. Proteomics 11, 1551–1565 (2012).
  32. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 38, D497–D501 (2010).
  33. Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol. Syst. Biol. 8, 608 (2012).
  34. Dephoure, N. et al. Quantitative proteomic analysis reveals posttranslational responses to aneuploidy in yeast. eLife 3, e03023 (2014).
  35. Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
  36. Ambros, V. The functions of animal microRNAs. Nature 431, 350–355 (2004).
  37. Roush, S. & Slack, F. J. The let-7 family of microRNAs. Trends Cell Biol. 18, 505–516 (2008).
  38. Schulte, L. N., Eulalio, A., Mollenkopf, H. J., Reinhardt, R. & Vogel, J. Analysis of the host microRNA response to Salmonella uncovers the control of major cytokines by the let-7 family. EMBO J. 30, 1977–1989 (2011).
  39. Agarwal, V., Bell, G. W., Nam, J. W. & Bartel, D. P. Predicting effective microRNA target sites in mammalian mRNAs. eLife 4, e05005 (2015).
  40. Misselwitz, B. et al. RNAi screen of Salmonella invasion shows role of COPI in membrane targeting of cholesterol and Cdc42. Mol. Syst. Biol. 7, 474 (2011).
  41. Kreibich, S. et al. Autophagy proteins promote repair of endosomal membranes damaged by the Salmonella type three secretion system 1. Cell Host Microbe 18, 527–537 (2015).
  42. Criss, A. K. & Casanova, J. E. Coordinate regulation of Salmonella enterica serovar Typhimurium invasion of epithelial cells by the Arp2/3 complex and Rho GTPases. Infect. Immun. 71, 2885–2891 (2003).
  43. Cossart, P. & Helenius, A. Endocytosis of viruses and bacteria. Cold Spring Harb. Perspect. Biol. 6, a016972 (2014).
  44. Misselwitz, B. et al. Near surface swimming of Salmonella Typhimurium explains target-site selection and cooperative invasion. PLoS Pathog. 8, e1002810 (2012).
  45. Kleensang, A. et al. Genetic variability in a frozen batch of MCF-7 cells invisible in routine authentication affecting cell function. Sci. Rep. 6, 28994 (2016).
  46. Leung, E., Kim, J. E., Askarian-Amiri, M., Finlay, G. J. & Baguley, B. C. Evidence for the existence of triple-negative variants in the MCF-7 breast cancer cell population. Biomed. Res. Int. 2014, 836769 (2014).
  47. Lin, Y. C. et al. Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations. Nat. Commun. 5, 4767 (2014).
  48. Geraghty, R. J. et al. Guidelines for the use of cell lines in biomedical research. Br. J. Cancer 111, 1021–1046 (2014).
  49. Pamies, D. & Hartung, T. 21st century cell culture for 21st century toxicology. Chem. Res. Toxicol. 30, 43–52 (2017).
  50. Lancaster, M. A. & Knoblich, J. A. Organogenesis in a dish: modeling development and disease using organoid technologies. Science 345, 1247125 (2014).
  51. Drubin, D. G. & Hyman, A. A. Stem cells: the new “model organism”. Mol. Biol. Cell 28, 1409–1411 (2017).
  52. Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
  53. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
  54. Leinonen, R., Sugawara, H. & Shumway, M. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).
  55. Andrews, S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2018).
  56. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  57. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
  58. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
  59. Cirulli, E. T. et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 11, R57 (2010).
  60. Liu, Y. et al. Quantitative variability of 342 plasma proteins in a human twin population. Mol. Syst. Biol. 11, 786 (2015).
  61. Collins, B. C. et al. Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 10, 1246–1253 (2013).
  62. Ludwig, C., Claassen, M., Schmidt, A. & Aebersold, R. Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry. Mol. Cell. Proteomics 11, M111.013987 (2012).
  63. Kunszt, P. et al. iPortal: the Swiss grid proteomics portal: requirements and new features based on experience and usability considerations. Concurr. Comput. 27, 433–445 (2015).
  64. Shteynberg, D. et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 10, M111.007690 (2011).
  65. Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
  66. Pratt, J. M. et al. Dynamics of protein turnover, a missing dimension in proteomics. Mol. Cell. Proteomics 1, 579–591 (2002).
  67. Boisvert, F. M. et al. A quantitative spatial proteomics analysis of proteome turnover in human cells. Mol. Cell. Proteomics 11, M111.011429 (2012).
  68. Zeiler, M., Straube, W. L., Lundberg, E., Uhlen, M. & Mann, M. A protein epitope signature tag (PrEST) library allows SILAC-based absolute quantification and multiplexed determination of protein copy numbers in cell lines. Mol. Cell. Proteomics 11, O111.009613 (2012).
  69. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).
  70. Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 44, D447–D456 (2016).


Acknowledgements

We thank G. Rosenberger, A. Beyer, B. Collins and S. Nikolaev for discussions. We thank L. Reiter, R. Bruderer and O. Rinner from Biognosys AG for sharing their thoughts about cell line proteome analysis from a commercial perspective. We thank H. Zhang and J. Chen from Johns Hopkins University, D. Pflieger and O. Filhol-Cochet from CEA Grenoble, M. Riwanto from University Hospital Zurich, U. Greber and M. Suomalainen from the University of Zurich, C. Arrieumerlou from the University of Basel (through InfectX), M. Beck and M.-T. Mackmull from the European Molecular Biology Laboratory, C. Jorgensen and J. Worboys from the Cancer Research UK Manchester Institute, M. Peter and C. Barnes from ETH Zurich, and A. Venkitaraman and C. Williams from the University of Cambridge for providing us with their HeLa cells.

The work was supported by the SystemsX.ch project PhosphoNetX PPM (to R.A.), TargetInfectX (to C.D.), the Swiss National Science Foundation (grant 3100A0-688 107679 to R.A.), the European Research Council (ERC-20140AdG 670821 to R.A.), the JRC for Computational Biomedicine (which was partially funded by Bayer AG, to J.S.-R.), the Swiss National Science Foundation (grant 163180 to S.E.A.), the European Research Council (grants AdG 249968 to S.E.A. and 616441-DISEASEAVATARS to G.T.), the Umberto Veronesi Foundation (fellowship to P.-L.G.), the ERA-NET Neuron Program (P.-L.G.), Regione Lombardia (Ricerca Indipendente 2012 to G.T.) and the Italian Ministry of Health (Ricerca Corrente to G.T.). E.G.W. was supported by an NIH F32 Ruth Kirschstein Fellowship (F32GM119190).

Author information

Author notes

  1. These authors contributed equally: Yang Mi, Torsten Mueller, Saskia Kreibich.


Affiliations

  1. Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA

    • Yansheng Liu
  2. Yale Cancer Biology Institute, Yale University, West Haven, CT, USA

    • Yansheng Liu
  3. Heidelberg University, Faculty of Biosciences, Heidelberg, Germany

    • Yang Mi
  4. Joint Research Center for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen, Germany

    • Yang Mi & Julio Saez-Rodriguez
  5. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland

    • Torsten Mueller, Evan G. Williams, Audrey Van Drogen, Max Frank, Isabell Bludau, Martin Mehnert & Ruedi Aebersold
  6. Institute of Microbiology, ETH Zurich, Zurich, Switzerland

    • Saskia Kreibich & Wolf-Dietrich Hardt
  7. Department of Genetic Medicine and Development, University of Geneva Medical School, and University Hospitals of Geneva, Geneva, Switzerland

    • Christelle Borel, Fedor Bezrukov & Stylianos E. Antonarakis
  8. IEO, European Institute of Oncology IRCCS, Milan, Italy

    • Pierre-Luc Germain & Giuseppe Testa
  9. Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany

    • Michael Seifert
  10. National Center for Tumor Diseases, Dresden, Germany

    • Michael Seifert
  11. Biozentrum, University of Basel, Basel, Switzerland

    • Mario Emmenlauer, Isabel Sorg & Christoph Dehio
  12. Service of Genetic Medicine, University Hospitals of Geneva, Geneva, Switzerland

    • Frederique Sloan Bena & Stylianos E. Antonarakis
  13. Department of Analytical Chemistry and CAS Key Laboratory of Receptor Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China

    • Hu Zhou
  14. Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy

    • Giuseppe Testa
  15. Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine, Bioquant Heidelberg, Germany

    • Julio Saez-Rodriguez
  16. iGE3 Institute of Genetics and Genomics of Geneva, Geneva, Switzerland

    • Stylianos E. Antonarakis
  17. Faculty of Science, University of Zurich, Zurich, Switzerland

    • Ruedi Aebersold


Contributions

Y.L. and R.A. designed and supervised the whole project. Y.L., Y.M., E.G.W., P.-L.G., M.F., I.B., M.S., M.E. and F.B. analyzed the data and performed the bioinformatics analysis. Y.M. developed the HeLa Proteome website. T.M. performed the pSILAC experiment. S.K. and Y.L. performed the let-7 experiment. S.K. performed the Salmonella Typhimurium (S.Tm) infection experiment. A.V.D., C.B., I.S., C.D. and H.Z. established and cultured the cell lines. Y.L. and M.M. performed the mass spectrometry experiments. I.B. performed the pyProphet analysis. F.S.B. generated the CNV data. M.S. processed the CNV data. C.B. generated the RNA-seq data. M.F. performed the sequence variation analysis. F.B. and P.-L.G. analyzed the RNA-seq data. M.E. analyzed the microscopy phenotypic data. G.T. and J.S.-R. supervised data interpretation. S.E.A. supervised the genomics data generation. W.-D.H. supervised all the microbiology experiments and provided critical input. Y.L., E.G.W. and R.A. wrote the paper.

Competing interests

R.A. holds shares of Biognosys AG, which operates in the field covered by the article.

Corresponding authors

Correspondence to Yansheng Liu or Ruedi Aebersold.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–19, Supplementary Table 1 and Supplementary Notes 1–6

  2. Reporting Summary
