Abstract
Reproducibility in research can be compromised by both biological and technical variation, but most of the focus is on removing the latter. Here we investigate the effects of biological variation in HeLa cell lines using a systems-wide approach. We determine the degree of molecular and phenotypic variability across 14 stock HeLa samples from 13 international laboratories. We cultured cells in uniform conditions and profiled genome-wide copy numbers, mRNAs, proteins and protein turnover rates in each cell line. We discovered substantial heterogeneity between HeLa variants, especially between lines of the CCL2 and Kyoto varieties, and observed progressive divergence within a specific cell line over 50 successive passages. Genomic variability has a complex, nonlinear effect on transcriptome, proteome and protein turnover profiles, and proteotype patterns explain the varying phenotypic response of different cell lines to Salmonella infection. These findings have implications for the interpretation and reproducibility of research results obtained from human cultured cells.
Data availability
RNA-seq data are available on GEO (GSE111485). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (ref. 70) with the dataset identifier PXD009273. The full dataset is available at https://HelaProt.shinyapps.io/Crosslab/.
References
Capes-Davis, A. et al. Check your cultures! A list of cross-contaminated or misidentified cell lines. Int. J. Cancer 127, 1–8 (2010).
Zhao, M. et al. Assembly and initial characterization of a panel of 85 genomically validated cell lines from diverse head and neck tumor sites. Clin. Cancer Res. 17, 7248–7264 (2011).
Lorsch, J. R., Collins, F. S. & Lippincott-Schwartz, J. Fixing problems with cell lines. Science 346, 1452–1453 (2014).
Yu, M. et al. A resource for cell line authentication, annotation and quality control. Nature 520, 307–311 (2015).
Almeida, J. L., Cole, K. D. & Plant, A. L. Standards for cell line authentication and beyond. PLoS Biol. 14, e1002476 (2016).
Muff, R. et al. Genomic instability of osteosarcoma cell lines in culture: impact on the prediction of metastasis relevant genes. PLoS One 10, e0125611 (2015).
Frattini, A. et al. High variability of genomic instability and gene expression profiling in different HeLa clones. Sci. Rep. 5, 15377 (2015).
Ben-David, U. et al. Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330 (2018).
Bottomley, R. H., Trainer, A. L. & Griffin, M. J. Enzymatic and chromosomal characterization of HeLa variants. J. Cell Biol. 41, 806–815 (1969).
Nelson-Rees, W. A., Hunter, L., Darlington, G. J. & O’Brien, S. J. Characteristics of HeLa strains: permanent vs. variable features. Cytogenet. Cell Genet. 27, 216–231 (1980).
Macville, M. et al. Comprehensive and definitive molecular cytogenetic characterization of HeLa cells by spectral karyotyping. Cancer Res. 59, 141–150 (1999).
Rutledge, S. What HeLa cells are you using? The Winnower https://doi.org/10.15200/winn.143896.65158 (2014).
Landry, J. J. et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213–1224 (2013).
Adey, A. et al. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500, 207–211 (2013).
Williams, E. G. et al. Systems proteomics of liver mitochondria function. Science 352, aad0189 (2016).
Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
Rosenberger, G. et al. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat. Methods 14, 921–927 (2017).
Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).
Röst, H. L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).
Röst, H. L. et al. TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics. Nat. Methods 13, 777–783 (2016).
Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
Jovanovic, M. et al. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015).
Liu, Y. et al. Systematic proteome and proteostasis profiling in human trisomy 21 fibroblast cells. Nat. Commun. 8, 1212 (2017).
Fasterius, E. et al. A novel RNA sequencing data analysis method for cell line authentication. PLoS One 12, e0171435 (2017).
Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811 (2015).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
Fortelny, N., Overall, C. M., Pavlidis, P. & Freue, G. V. C. Can we predict protein from mRNA levels? Nature 547, E19–E20 (2017).
Lundberg, E. et al. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 6, 450 (2010).
Claydon, A. J. & Beynon, R. Proteome dynamics: revisiting turnover with a global perspective. Mol. Cell. Proteomics 11, 1551–1565 (2012).
Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes—2009. Nucleic Acids Res. 38, D497–D501 (2010).
Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol. Syst. Biol. 8, 608 (2012).
Dephoure, N. et al. Quantitative proteomic analysis reveals posttranslational responses to aneuploidy in yeast. eLife 3, e03023 (2014).
Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
Ambros, V. The functions of animal microRNAs. Nature 431, 350–355 (2004).
Roush, S. & Slack, F. J. The let-7 family of microRNAs. Trends Cell Biol. 18, 505–516 (2008).
Schulte, L. N., Eulalio, A., Mollenkopf, H. J., Reinhardt, R. & Vogel, J. Analysis of the host microRNA response to Salmonella uncovers the control of major cytokines by the let-7 family. EMBO J. 30, 1977–1989 (2011).
Agarwal, V., Bell, G. W., Nam, J. W. & Bartel, D. P. Predicting effective microRNA target sites in mammalian mRNAs. eLife 4, e05005 (2015).
Misselwitz, B. et al. RNAi screen of Salmonella invasion shows role of COPI in membrane targeting of cholesterol and Cdc42. Mol. Syst. Biol. 7, 474 (2011).
Kreibich, S. et al. Autophagy proteins promote repair of endosomal membranes damaged by the Salmonella type three secretion system 1. Cell Host Microbe 18, 527–537 (2015).
Criss, A. K. & Casanova, J. E. Coordinate regulation of Salmonella enterica serovar Typhimurium invasion of epithelial cells by the Arp2/3 complex and Rho GTPases. Infect. Immun. 71, 2885–2891 (2003).
Cossart, P. & Helenius, A. Endocytosis of viruses and bacteria. Cold Spring Harb. Perspect. Biol. 6, a016972 (2014).
Misselwitz, B. et al. Near surface swimming of Salmonella Typhimurium explains target-site selection and cooperative invasion. PLoS Pathog. 8, e1002810 (2012).
Kleensang, A. et al. Genetic variability in a frozen batch of MCF-7 cells invisible in routine authentication affecting cell function. Sci. Rep. 6, 28994 (2016).
Leung, E., Kim, J. E., Askarian-Amiri, M., Finlay, G. J. & Baguley, B. C. Evidence for the existence of triple-negative variants in the MCF-7 breast cancer cell population. Biomed. Res. Int. 2014, 836769 (2014).
Lin, Y. C. et al. Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations. Nat. Commun. 5, 4767 (2014).
Geraghty, R. J. et al. Guidelines for the use of cell lines in biomedical research. Br. J. Cancer 111, 1021–1046 (2014).
Pamies, D. & Hartung, T. 21st century cell culture for 21st century toxicology. Chem. Res. Toxicol. 30, 43–52 (2017).
Lancaster, M. A. & Knoblich, J. A. Organogenesis in a dish: modeling development and disease using organoid technologies. Science 345, 1247125 (2014).
Drubin, D. G. & Hyman, A. A. Stem cells: the new “model organism”. Mol. Biol. Cell 28, 1409–1411 (2017).
Venkatraman, E. S. & Olshen, A. B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23, 657–663 (2007).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
Leinonen, R., Sugawara, H. & Shumway, M. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Cirulli, E. T. et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 11, R57 (2010).
Liu, Y. et al. Quantitative variability of 342 plasma proteins in a human twin population. Mol. Syst. Biol. 11, 786 (2015).
Collins, B. C. et al. Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 10, 1246–1253 (2013).
Ludwig, C., Claassen, M., Schmidt, A. & Aebersold, R. Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry. Mol. Cell. Proteomics 11, M111.013987 (2012).
Kunszt, P. et al. iPortal: the Swiss grid proteomics portal: requirements and new features based on experience and usability considerations. Concurr. Comput. 27, 433–445 (2015).
Shteynberg, D. et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 10, M111.007690 (2011).
Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).
Pratt, J. M. et al. Dynamics of protein turnover, a missing dimension in proteomics. Mol. Cell. Proteomics 1, 579–591 (2002).
Boisvert, F. M. et al. A quantitative spatial proteomics analysis of proteome turnover in human cells. Mol. Cell. Proteomics 11, M111.011429 (2012).
Zeiler, M., Straube, W. L., Lundberg, E., Uhlen, M. & Mann, M. A protein epitope signature tag (PrEST) library allows SILAC-based absolute quantification and multiplexed determination of protein copy numbers in cell lines. Mol. Cell. Proteomics 11, O111.009613 (2012).
Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).
Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 44, D447–D456 (2016).
Acknowledgements
We thank G. Rosenberger, A. Beyer, B. Collins and S. Nikolaev for discussions. We thank L. Reiter, R. Bruderer and O. Rinner from Biognosys AG for sharing their thoughts about cell line proteome analysis from a commercial perspective. We thank H. Zhang and J. Chen from Johns Hopkins University, D. Pflieger and O. Filhol-Cochet from CEA Grenoble, M. Riwanto from University Hospital Zurich, U. Greber and M. Suomalainen from the University of Zurich, C. Arrieumerlou from the University of Basel (through InfectX), M. Beck and M.-T. Mackmull from the European Molecular Biology Laboratory, C. Jorgensen and J. Worboys from the Cancer Research UK Manchester Institute, M. Peter and C. Barnes from ETH Zurich, and A. Venkitaraman and C. Williams from the University of Cambridge for providing us their HeLa cells.
The work was supported by the SystemsX.ch project PhosphoNetX PPM (to R.A.), TargetInfectX (to C.D.), the Swiss National Science Foundation (grant 3100A0-688 107679 to R.A.), the European Research Council (ERC-20140AdG 670821 to R.A.), the JRC for Computational Biomedicine (which was partially funded by Bayer AG, to J.S.-R.), the Swiss National Science Foundation (grant 163180 to S.E.A.), the European Research Council (grants AdG 249968 to S.E.A. and 616441-DISEASEAVATARS to G.T.), the Umberto Veronesi Foundation (fellowship to P.-L.G.), the ERA-NET Neuron Program (P.-L.G.), Regione Lombardia (Ricerca Indipendente 2012 to G.T.) and the Italian Ministry of Health (Ricerca Corrente to G.T.). E.G.W. was supported by an NIH F32 Ruth Kirschstein Fellowship (F32GM119190).
Author information
Contributions
Y.L. and R.A. designed and supervised the whole project. Y.L., Y.M., E.G.W., P.-L.G., M.F., I.B., M.S., M.E. and F.B. analyzed the data and performed the bioinformatics analysis. Y.M. developed the HeLa Proteome website. T.M. performed the pSILAC experiment. S.K. and Y.L. performed the Let7 experiment. S.K. performed the S.Tm infection experiment. A.V.D., C.B., I.S., C.D. and H.Z. established and cultured the cell lines. Y.L. and M.M. performed the mass spectrometry experiments. I.B. performed pyProphet analysis. F.S.B. generated CNV data. M.S. processed the CNV data. C.B. generated RNA-seq data. M.F. performed sequence variation analysis. F.B. and P.-L.G. analyzed RNA-seq data. M.E. analyzed the microscopy phenotypic data. G.T. and J.S.-R. supervised data interpretation. S.E.A. supervised the genomics data generation. W.-D.H. supervised all the microbiology experiments and provided critical inputs. Y.L., E.G.W. and R.A. wrote the paper.
Ethics declarations
Competing interests
R.A. holds shares of Biognosys AG, which operates in the field covered by the article.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–19, Supplementary Table 1 and Supplementary Notes 1–6
About this article
Cite this article
Liu, Y., Mi, Y., Mueller, T. et al. Multi-omic measurements of heterogeneity in HeLa cells across laboratories. Nat Biotechnol 37, 314–322 (2019). https://doi.org/10.1038/s41587-019-0037-y
DOI: https://doi.org/10.1038/s41587-019-0037-y