High variability of genomic instability and gene expression profiling in different HeLa clones

The HeLa cell line is one of the most popular cell lines in biomedical research, despite its well-known chromosomal instability. We compared the genomic and transcriptomic profiles of 4 different HeLa batches and showed that the gain and loss of genomic material varies widely between batches, drastically affecting basal gene expression. Moreover, different pathways were activated in response to a hypoxic stimulus. Our study emphasizes the large genomic and transcriptomic variability among different batches, to the point that the same experiment performed with different batches can lead to distinct conclusions and irreproducible results. The HeLa cell line is thought to be a unique cell line but it is clear that substantial differences between the primary tumour and the human genome exist and that an indeterminate number of HeLa cell lines may exist, each with a unique genomic profile.

Scientific RepoRts | 5:15377 | DOi: 10.1038/srep15377 To our knowledge, all of the reports regarding HeLa CIN have been conducted on a single cell clone or on sub-clones 20,24 . Recently, the HeLa genome and transcriptome was exhaustively characterized by deep RNA and DNA sequencing performed on a single clone from CLS Cell Lines Service GmbH (the HeLa Kyoto cell line) 25 . To investigate the reported instability of the HeLa cell lines and to verify the possibility that erroneous conclusions could be generated by the use of HeLa cells obtained from different laboratories, we compared different "batches" obtained from four Italian laboratories (HeLaSR, HeLaV, HeLaP, and HeLaH). HeLaP and HeLaH were derived from the same batch but had been cultured in two different laboratories for approximately 8 years; similarly, HeLaV was derived from the same batch as HeLaSR but had been cultured in two different laboratories for approximately 12 years.

Results
HeLa karyotyping and a-CGH show genomic instability. We evaluated the genomic stability of these "batches" by karyotyping, array-comparative genomic hybridization (a-CGH) and fluorescent in situ hybridization (FISH described in the next section).
Cytogenetic analysis was performed on 20 metaphase spreads stained using the quinacrine-based Q-banding technique (QFQ). The HeLa cells were near-triploid, containing a range of 60-66 chromosomes in HeLaSR,  in HeLaV, 71-79 in HeLaP, and 63-79 in HeLaH. In addition, approximately 10% of the metaphase spreads of clone HeLaSR exhibited chromosome numbers ranging from 101 to 148, whereas HeLaV (derived from the HeLaSR batch) exhibited a range of 124-198 chromosomes in approximately 30% of its metaphase spreads. We excluded the coexistence of two different sub-clones by monoclonal sub-culturing.
Ten HeLaV clones, derived from a monoclonal sub-culture, showed the presence of both metaphase populations (with a range of 49-81 and a range of 92-240 chromosomes), suggesting that a single cell could carry the inherent defect, possibly generated by new mutation events. For our FISH and transcriptomic analysis, we used the batch HeLaSR because it showed fewer mitoses with double the modal chromosome number compared to HeLaV.
To characterize all of the genomic imbalances of the four "batches", we performed a-CGH. As showed in the Circos plot 26 (Fig. 1A), a comparison between the HeLa DNA content and the diploid human genome evinced a substantial difference among the four lines analysed. The comparison of the 4 HeLa clones underscored that although the amount of gain and loss of genomic material compared to the human diploid genome is highly variant among the 4 clones, there is similarity within the pairs with a common origin, i.e., HeLaH/HeLaP vs HeLaSR/HeLaV (Fig. 1B).
To express the similarities among the HeLa clones, we calculated the percentage of the genomes with the same annotation (diploid, deleted or amplified). The results, depicted in Fig. 1C, show high similarity within the two pairs HeLaSR/HeLaV and HeLaH/HeLaP. We emphasize that even recently split-out batches (HeLaSR vs HeLaV and HeLaH vs HeLaP) present specific gains or losses and that these new events are acquired and stably maintained. A possible hypothesis to explain the different chromosome imbalances between the two pairs could be ancestral mutational events that conferred an evolutionary advantage in DNA losses or gains.

FISH analysis highlights new specific markers.
To identify chromosomal rearrangements and to verify the presence of HeLa-specific markers observed by karyotyping 17,20 , we performed a FISH analysis to paint chromosomes 1, 3, 5, 9, 13 and 19 in HeLaSR and HeLaP because they appeared to be the two most different clones by a-CGH. As shown in Table 1, HeLaP displays almost all of the HeLa-specific markers previously identified 12,20,25 , whereas HeLaSR has lost most of them. In addition, we found in each clone the presence of new specific markers ( Supplementary Fig. S1), confirming an independent evolution of each batches.
Transcriptomic analysis reveals that different batches exposed to hypoxic stimulus display differences in gene expression. Our genomic analysis uncovered deep variability between thebatches , and because genomic gains or losses or whole-chromosome aneuploidy can have drastic consequences on gene expression 27 , we performed a whole-transcriptome analysis to verify the transcriptional effects of this differential chromosome instability. The microarray expression analysis was performed on HeLaSR and HeLaP, the batches most different in terms of genomic content. These cell lines were exposed to hypoxic conditions, chosen because the induced hypoxic pathway is well characterized 28,29 .
A principal-component analysis (PCA) was performed to determine the expression trends within the dataset. PCA is a useful technique to reduce the dimensionality of large data sets and to visually assess similarities and differences between samples 30 . PCA was used to identify trends in the regulation of genes induced by hypoxic exposure and to map the entire dataset on a two-axis graph (principal components, PC1 and PC2) where the distances account for similarity ( Fig. 2A); the closer the distance between samples, the more they are similar. The main divergence was due to two different clones, HeLaSR vs HeLaP (triangles and circles, respectively), as demonstrated by the first principal component (PC1). The second component (PC2), summarizing the effects of hypoxia, highlighted strong changes in HeLaSR and mild changes in HeLaP. Indeed, untreated HeLaP and HeLaP after hypoxia are closer to each other than untreated HeLaSR is to HeLaSR after hypoxia, meaning that hypoxia has a stronger effect in HeLaSR than in HeLaP ( Fig. 2A). Furthermore, we identified the genes regulated by hypoxic treatment in these These three gene lists were analysed for gene ontology (GO) enrichment (Fig. 2C). The more significant GO classes those that are differentially expressed between HeLaSR and HeLaP, demonstrating the large differences in the expression profiles of the two cell lines in response to hypoxia.
The classes related to hypoxia were present in both, but they were not the most significant (Fig. 2C). On the contrary, when we analysed the 89 mis-regulated genes common to both cell lines, the most enriched classes are related to hypoxia (Fig. 2B,C). These results suggest that the different genomic contents of the two clones have a remarkable influence on basal gene expression and, consequently, on the response to a hypoxic stimulus. The cells activate different pathways, but interestingly, the regulation of hypoxia-related genes is preserved.
Real-time RT-PCR confirmed the microarray data. A subset of regulated genes identified for their relevance in the hypoxic response were validated by real-time RT-PCR (qPCR) (Supplementary Table  S2). The trend of the array data was confirmed by qRT-PCR (Supplementary Table S3).
Copy number variation influences gene expression. Because significant chromosome copy number alterations are frequently associated with gene expression changes in the affected regions 31 , we evaluated the possible effects of copy number variation on gene expression 32 in our samples. Based on the data obtained with a-CGH and the expression array performed on HeLaP and HeLaSR samples as previously described, a general lack of dosage compensation was observed in HeLaP (Fig. 3A) and in HeLaSR (Fig. 3B), meaning that gene dosage is correlated to gene expression.

Discussion
Genetic aberrations, such as gene amplification, deletions, and loss of heterozygosity, are hallmarks of cancer and are major contributors to the neoplastic process through the accumulation of mutations in Chromosome markers previously reported [17][18][19][20] HeLaSR HeLaP  Supplementary Fig. S1). Ten metaphases for each HeLa clone were analysed.
Scientific RepoRts | 5:15377 | DOi: 10.1038/srep15377 specific genes. The mechanisms through which these mutations are generated are the subject of continued debate [33][34][35][36][37] , and several cancer-predisposing mutations affect genes that are responsible for maintaining the integrity and number of chromosomes during cell division.
The main factors that control chromosome stability are telomere maintenance, cell division mechanisms, and the mitotic checkpoint that ensures correct chromosome segregation [38][39][40] .
Consequently, the archetypical transformation of tumour cells results in CIN 41,42 . Established cell lines have been traditionally used to characterize the biological significance of specific genomic aberrations identified in primary tumours 43 and to test the therapeutic efficacy of anticancer agents 44 .
HeLa was the first cultured cancer line 45 . Although its CIN has already been extensively investigated 19 (see Supplementary Table S1), demonstrating the low degree of similarity with the initial tumour and with the human diploid genome, this cell line is largely used in biomedical research. The HeLa cell line is among the most frequently utilized models in research to validate drugs for cancer therapy 46 and for genetic/epigenetic manipulation using demethylation agents 47 , siRNA and expression vectors to study gene function [48][49][50] .
Here, we report the high level of CIN in HeLa clones obtained from different laboratories. We demonstrated that each clone accumulates genomic variability in a time-dependent manner. We speculate that some chromosomal rearrangements or point mutations that randomly arose in each clone increased survival, leading to a kind of "genetic drift" in clonal variants.
This conclusion is evident in the differing DNA content of the clones HeLaSR and HeLaV, which showed a loss of genomic DNA, whereas HeLaH and HeLaP exhibited a gain of genomic DNA in comparison with the diploid human genome. We speculate that even slightly different culture conditions may produce this divergence. The number of passages that a cell line undergoes can certainly lead to extensive modifications in growth rate, morphology, aneuploidy, chromosome alterations, gene expression and drug sensitivity, depending on the culture environment [51][52][53][54][55] . In addition, we demonstrated that the genomic gains or losses or whole-chromosome aneuploidy is specific to each analysed clone (Fig. 1A) and that these differences in genomic content have a remarkable influence on basal gene expression ( Fig. 2A). We showed that in response to a hypoxic stimulus, each cell line activates different pathways, although the regulation of genes related to hypoxia is conserved ( Fig. 2A,C).
These results suggest that different HeLa batches used for the same experiment may yield different results. The large differences in the basal gene expression profiles suggest that the use of uncharacterized clones may lead to faulty conclusions and to irreproducible results in studies of gene function and pathway analysis. With respect to genomic content, the use of uncharacterized batches of HeLa cells 56 may result in different behaviours between clones in response to specific stimuli, such as cancer drugs, used in biomedical research.
Our results suggest that not a single HeLa cell line, or even a set of similar HeLa cell lines, exists. Rather, an indeterminate number of clones exist, each carrying large genomic differences that lead to different expression profiles. The HeLa cell recalls the main character of the novel by Pirandello (the Italian poet awarded the 1934 Nobel Prize in Literature): "One, no one, one hundred thousand" 57 : the HeLa cell line is thought to be a unique cell line ("one"), but it is clear that large differences exist between the tumour from which it was derived and the human genome ("no one") and that an indeterminate number of HeLa cell lines are scattered in laboratories worldwide ("one hundred thousand" or more), each with a unique genomic profile.

Methods
Cell lines. The cell lines HeLaP, HeLaH, HeLaSR and HeLaV were obtained from four Italian laboratories. HeLaP and HeLaH were derived from the same cell-line batch but were cultured in two different laboratories for approximately 8 years. Similarly, HeLaSR and HeLaV were derived from the same cell-line batch but were cultured in two different laboratories for approximately 12 years. In our lab, the mycoplasma-free cell lines were cultured in DMEM-F12 (Euroclone SpA, Milan, Italy) with 10% heat-inactivated foetal bovine serum (Euroclone), 1 mM glutamine, 100 U/ml penicillin and 100 μ g/ml streptomycin and were incubated at 37 °C and 5% CO 2 . The cells were cultured to reach a confluence of 80-90%.
Cytogenetic analysis. Metaphase spreads were obtained from the four cell lines. After culturing, the cells were treated with 0.04 μ g/ml colcemid for 2 hours. The cells were harvested by treatment with a 1× trypsin/EDTA solution for approximately 5 min. A hypotonic treatment was performed with 0.96% Na citrate for 15 min at 37 °C, and the cells were subsequently fixed in a fresh methanol/acetic acid (3/1, v/v) solution. Metaphase spreads were stained overnight with 0.005% quinacrine mustard solution in McIlvaine buffer (QFQ-banding), and standard cytogenetic analyses were performed with a Leica D5000B fluorescent microscope.

Monoclonal subculturing by limiting dilution. To exclude the possibility that the HeLaV and
HeLaSR cell lines arose from 2 different HeLa subclones, we obtained a cell culture derived from a single cell by limiting-dilution culture. We prepared serial dilutions to obtain a suspension containing 1 cell/μ l. A single μ l of the final cell suspension was seeded into each well of a 96-well plate containing growth medium. The presence of a single cell was evaluated by microscopy, and the cells were incubated at 37 °C and 5% CO 2 overnight. After one day, we used microscopy to confirm the presence of a single cluster of cells (2 or more) per well, implying a single-cell origin, and we excluded any wells with double or multiple clusters of cells. The monoclonal cell lines were trypsinised and seeded in T25 flasks to obtain sufficient cells for the metaphase spread analysis.

a-CGH analysis. a-CGH was performed on an Agilent microarray platform (Agilent Technologies
Inc., Santa Clara, CA USA) with an Agilent's human 4 × 180 K CGH slide. Sample preparation, labelling, and microarray hybridization were performed according to the Agilent CGH Enzymatic Protocol version 7.3. Slides were scanned using the Agilent G2565CA scanner and analysed using the Agilent Feature Extraction 10.7.3.1 software. The a-CGH profile was extrapolated using the Agilent Genomic Workbench 6.5.0.18 software with the following parameters: ADM-2 threshold 6, Fuzzy Zero ON, and Centralization OFF. Coverage plots were generated using the Circos visualization tool compared with the 2.2 × 10 9 genome covered by the Agilent probes.
Hypoxic treatment. Before administering the hypoxia treatment, the HeLaP and HeLaSR cells were maintained at 70-80% confluence. The medium was refreshed before the hypoxia treatment. Cells (two flasks per cell line) were maintained under hypoxic (1% O 2 , 5% CO 2 , 94% N 2 ) conditions or normoxic (95% air and 5% CO 2 ) conditions for 24 h. RNA extraction. Total RNA was purified from the HeLa cells using the RNeasy Plus kit (Qiagen).
RNA was quantified using an ND-1000 UV-Vis spectrophotometer (Thermo Scientific, Wilmington, DE, USA), and the integrity of the RNA was assessed with the Agilent 2100 Bioanalyzer (Agilent Technologies Inc.) according to the manufacturer's instructions. All of the RNA samples used in this study exhibited a 260/280 ratio above 1.9 and an RNA Integrity Number (RIN) above 9.0.
Microarray expression profiling. The microarray experiment included two biological replicates per treatment. All sample-labelling, hybridization, washing, and scanning steps were conducted according to the manufacturer's specifications. Briefly, Cy3-labelled cRNA was generated from 50 ng input total RNA using the One Color Quick Amp Labelling Kit (Agilent Technologies Inc.). For every sample, 600 ng cRNA from each labelling reaction (with a specific activity above 9.0) was hybridized using the Gene Expression Hybridization Kit (Agilent Technologies Inc.) to the Agilent Whole Human Genome Oligo Microarray (Agilent Technologies Inc.), which is in a 4 × 44k 60-mer slide format, where each of the 4 arrays represents approximately 41,000 unique genes and transcripts. After hybridization, the slides were washed and then scanned with the Agilent G2565BA Microarray Scanner (Agilent Technologies Inc.). The fluorescence intensities of the scanned images were extracted and pre-processed using the Agilent Feature Extraction Software (10.7.3.1).
Gene expression data analysis. Quality control and array normalization were performed in the R statistical environment using the Agi4 × 44PreProcess (v 1.18.0) package, which was downloaded from the Bioconductor web site. The normalization and filtering steps were based on those described in the Agi4 × 44PreProcess reference manual. Briefly, the Agi4 × 44PreProcess options were set to use the Mean Signal and the BG Median Signal as foreground and background signals, respectively. The data were normalized between arrays using the quantile method. Genes with a fold change greater than 1 log 2 were designated as modulated. All of the above computations were conducted using the R statistical programming environment. Principal-component analysis (PCA) was performed on all genes under investigation to determine their expression trends within the dataset. PCA is a useful technique to reduce the dimensionality of large data sets. The expression analysis systematic explorer (EASE) biological theme analysis of the regulated genes was conducted online using DAVID.

Quantitative real-time PCR validation of microarray data. A real-time quantitative PCR
(qRT-PCR) analysis was performed, in triplicate, on the same RNA samples that were used for the microarray hybridization to validate the microarray results. One μ g of total RNA was retro-transcribed using SuperScript II (Life Technologies, Carlsbad, CA, USA). qRT-PCR was performed using the iQ SYBR ® Green supermix (Bio-Rad, Hercules, CA, USA,) on the ABI Prism 7000 platform (Applied Biosystems) with the following thermal cycling protocol: denaturation for 1 min at 95 °C followed by 40 cycles of 15 sec at 95 °C and 1 min at 60 °C. The primers are listed in Supplementary Table S3. A relative quantitative analysis was performed using the 2 −ΔΔ Ct method.