Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines

Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours’ biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina Infinium HumanMethylation450 BeadChip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2'-deoxycytidine and reverse transcription–quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2'-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.


INTRODUCTION
Promoter hypermethylation of many different tumour suppressor genes is seen in a wide range of cancers. 1,2 This has been assumed, though only occasionally demonstrated, to silence the expression of those genes. The term 'methylator phenotype' or CpG island methylator phenotype has been coined to describe subgroups of cancers, such as some colon tumours and gliomas, that exhibit particularly high levels of methylation of a consistent subset of genes, usually in and around their CpG islands. 3-7
Testicular germ cell tumours (TGCTs) are the most common malignancy of young men. Despite high cure rates in response to platinum-based chemotherapy, they still represent a fatal disease in a minority of patients presenting with disseminated disease 8,9 and the prognosis in children is much worse than in adults. 10 GCTs are an exceptional group of tumours in many respects. They are the only class of cancer that arises from a pluripotent progenitor cell (the germ cell progenitor, PGC) and that cell exhibits profoundly different DNA methylation characteristics to all somatic cell types. They present as several remarkably varied histological phenotypes classified as seminomatous or non-seminomatous. Seminomatous tumours (called seminomas in the testes, dysgerminomas in the ovary and germinomas in extragonadal sites) exhibit a relatively uniform histology with a similarity to germ cell progenitors. Non-seminomatous tumours, such as yolk sac tumours (YSTs) and embryonal carcinomas (EC), tend to be more aggressive and resistant to therapy than seminomatous tumours, 8,9,11 especially in intracranial cases seen in children. 10 Despite frequently having already metastasised at presentation, most TGCTs are exceptionally chemosensitive. Their progression from Intratubular Germ Cell Neoplasia, Unspecified (ICGNU) gives rise to seminoma or to the various non-seminomas.
The more aggressive and chemoresistant non-seminomas can arise from seminoma, even within the same tumour 12 or as a recurrence after treatment. 13 There is some evidence that progression to non-seminomas involves a dramatic increase in DNA methylation. 14,15 Since all forms of GCT are believed to progress from ICGNU, which, like germ cell progenitors, is hypomethylated, methylation must be an event associated with their progression rather than tumour initiation. 16 Two recent studies of the global methylation of paediatric GCTs demonstrated the hypermethylation of many candidate tumour suppressor genes. 14,15 Although these showed a dramatic difference in methylation between GCT subtypes, with seminomas showing much less methylation than non-seminomas, they could not identify, in an unbiased manner, those genes that were silenced by methylation. A critical question, therefore, is the extent to which methylation is linked to gene silencing and how the position of that methylation within the genes relates to this. 1 In this study, we set out to analyse the relationship between DNA methylation of different genes and gene elements and consequent gene silencing. For this purpose, we needed to rely on cell lines because they provide a more homogeneous system (as compared with the heterogeneity of primary tumour samples) in which the causative role of DNA methylation in gene silencing can be tested. Two recent studies have been published that analysed global DNA methylation in GCT cell lines. Rijlaarsdam et al. 17 analysed methylation in cell lines derived from multiple types of GCT but they did not determine the relationship of this methylation to gene expression, while van der Zwan et al. 18 analysed both methylation and gene expression, but only in seminoma versus EC cell lines. Here, we used the Illumina Infinium HumanMethylation450 BeadChip system, which surveys over 99% of RefSeq genes with an average of 17 CpG sites per gene.
To gain a comprehensive view of the correlation between methylation and gene expression, these same cell lines were analysed using Affymetrix expression arrays. Finally, key genes identified were tested to determine if they were activated by demethylation, confirming that DNA methylation was indeed playing a role in their reduced expression. These data were also compared with gene expression in a cohort of primary GCT samples using the same array platform. These data confirm that the cell lines derived from different histological subtypes of non-seminoma exhibit much greater gene-associated methylation than the seminoma cells and identify a group of pluripotency-associated genes, which are silenced in the YST as compared with the seminoma cell line.

RESULTS
Relationships between genes methylated in different GCT subtypes
DNA methylation was analysed in four adult GCT-derived cell lines (TCam-2, GCT44, GCT27 and NT2D1, subsequently referred to as seminoma, YST, EC and teratoma cell lines, respectively) on a genome-wide scale using the Infinium HumanMethylation450 array (University of London, London, UK; see Supplementary Datasets 1 and 2). In all cell lines, the lowest level of methylation was concentrated in CpG islands (Figure 1).
All three non-seminoma cell lines showed higher numbers of methylated island CpGs (β-value ⩾ 0.6) than the seminoma cell line. For the EC and teratoma cell lines, a similar degree of difference to seminoma was also seen in all other regions (shores, shelves and 'open sea'), whereas for the YST cells this difference decreased in the shores and the number of methylated CpGs in shelves and open sea was lower than that in the seminoma cells where many more were unmethylated (β-value < 0.3) (Figure 1).
We next set out to identify the specific genes where CpG islands were differentially methylated between seminoma and nonseminoma cell lines. We chose to initially select those genes that were methylated across CpG islands near to the TSS (CpG islands exhibiting an average methylation β-value ⩾ 0.6 were recorded as methylated). This analysis showed that the EC cell line had the highest number of genes with TSS-associated methylated CpG islands, followed by YST, teratoma and seminoma cells (Figure 2a).
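The thresholds used above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each gene has already been mapped to the β-values of the probes in its TSS-associated CpG island, and the function names and input layout are hypothetical. The 0.6 cut-off is the one stated in the text; the 0.3 cut-off for unmethylated CpGs is read from the description of Figure 1.

```python
from statistics import mean

METHYLATED_MIN = 0.6    # beta >= 0.6 is recorded as methylated (threshold from the text)
UNMETHYLATED_MAX = 0.3  # beta < 0.3 is treated as unmethylated (threshold from the text)


def classify_beta(beta):
    """Label a single CpG beta-value (0 = unmethylated, 1 = fully methylated)."""
    if beta >= METHYLATED_MIN:
        return "methylated"
    if beta < UNMETHYLATED_MAX:
        return "unmethylated"
    return "intermediate"


def island_is_methylated(betas):
    """An island counts as methylated when its average beta-value is >= 0.6."""
    return mean(betas) >= METHYLATED_MIN


def methylated_genes(island_betas_by_gene):
    """Return the genes whose TSS-associated island is methylated.

    island_betas_by_gene is a hypothetical mapping: gene name -> list of
    beta-values for the probes in that gene's TSS-associated CpG island.
    """
    return {
        gene
        for gene, betas in island_betas_by_gene.items()
        if betas and island_is_methylated(betas)
    }
```

Applying `methylated_genes` to each cell line in turn would give the per-line gene lists whose sizes and overlaps are compared in Figure 2.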
To establish similarities and differences between the cell lines, the overlap in the lists of genes methylated in each cell type was identified (Figure 2b). Genes methylated in seminoma cells are largely a subset of those methylated in EC and teratoma cells. Indeed, 94% of genes (337/358) methylated in the seminoma cell line are also methylated in EC and/or teratoma cells with 62% of these genes being methylated in all three non-seminoma subtypes (Figure 2b). The population of genes methylated in the YST cell line appears to be more strikingly different. The YST cell line exhibited the highest number of uniquely methylated genes relative to the total number of methylated genes (270/806, Figure 2b). By comparison, only 16 genes were methylated uniquely in the seminoma cell line.

Figure 1. Percentage of CpGs methylated (β-value ⩾ 0.6) described relative to CpG islands. The 450 K arrays provide a quantitative reading (β-value) from 0 (unmethylated) to 1 (completely methylated) for individual CpG sites, each described in relation to the closest gene and the nearest CpG island. These are annotated as island (a region of at least 500 bp, with >55% GC and an observed-to-expected CpG ratio >0.65), shore (regions 2 kb either side of an island), shelf (regions 2 kb outside of the shores) or 'other', also referred to as 'open sea'.

Transcriptome analysis reveals that over 50% of differentially methylated genes show a corresponding difference in gene expression
The best-documented mechanism for a gene methylation event to contribute to the biology of a cell is through altering that gene's expression. We therefore set out to determine to what extent the gene methylation events described above were reflected in silencing of these genes.
RNA was isolated from each of the cell lines and subjected to gene expression analysis using Affymetrix U133 Plus 2.0 chips (the University of Nottingham, Nottingham, UK; see Supplementary Dataset 3). The data were analysed to assess the relationship between CpG methylation and gene expression. In particular, we asked if high methylation correlated with low expression (see Supplementary Dataset 4). The degree of correlation between the level of differential methylation of the various gene elements (regarding islands, shores or shelves) and inverse gene expression was analysed pairwise between cell lines. We divided gene elements into categories of differential methylation at average Δβ intervals of 0.05 across those elements (see Table 1). The lowest category (Δβ = 0 to 0.05) corresponds to a large group of genes, which showed similar levels of methylation in seminoma and non-seminoma lines, while the highest category (Δβ = 0.9 to 0.95) corresponds to genes that showed the greatest increase in methylation in the non-seminoma lines relative to the seminoma line. For each category, we calculated the expected frequency of genes showing more than twofold differential expression under the null hypothesis that lower gene expression does not correlate with methylation. For each category of methylation, we then applied a Pearson's χ² test to determine whether the observed frequency of differentially expressed genes was greater than expected by chance. The aim of this approach was to provide us with an objective basis for the selection of genes where a correlation between differential methylation and differential expression was likely to be of biological significance. The resulting data (for CpG islands comparing the non-seminoma and seminoma cell lines) are shown in Table 1. These data were then used to generate graphs showing the percentage of genes with more than twofold differential expression for each Δβ-value category (Figure 3, Supplementary Figure S1).
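The binning and per-category test described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: Table 1 distinguishes three outcomes (correlating, anti-correlating and no difference), whereas this sketch collapses them into two (silenced by more than twofold, or not), which gives a χ² statistic with one degree of freedom. The null frequency is taken here as the overall fraction of silenced genes across all categories; the function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.05  # width of each delta-beta category, as in the text


def delta_beta_bin(delta_beta):
    """Index of the 0.05-wide category holding delta_beta (0 -> 0 to 0.05, ...)."""
    # The small epsilon guards against floating-point error at bin edges.
    return int(delta_beta / BIN_WIDTH + 1e-9)


def chi_square_1df(obs_hit, n, p_null):
    """Pearson chi-square (1 d.f.) for obs_hit silenced genes out of n.

    p_null is the fraction of silenced genes expected if expression were
    unrelated to methylation. Returns (chi2, p_value).
    """
    if not 0 < p_null < 1:
        raise ValueError("p_null must lie strictly between 0 and 1")
    exp_hit = n * p_null
    exp_miss = n * (1 - p_null)
    obs_miss = n - obs_hit
    chi2 = (obs_hit - exp_hit) ** 2 / exp_hit + (obs_miss - exp_miss) ** 2 / exp_miss
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def chi_square_by_bin(genes):
    """genes: iterable of (delta_beta, silenced) pairs, silenced being a bool.

    Returns {bin_index: (n_genes, n_silenced, chi2, p_value)}.
    """
    genes = list(genes)
    p_null = sum(1 for _, silenced in genes if silenced) / len(genes)
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_genes, n_silenced]
    for delta_beta, silenced in genes:
        entry = counts[delta_beta_bin(delta_beta)]
        entry[0] += 1
        entry[1] += int(silenced)
    return {
        b: (n, k) + chi_square_1df(k, n, p_null)
        for b, (n, k) in sorted(counts.items())
    }
```

As a check against the example given in the legend to Figure 3, `chi_square_1df(40, 152, 0.18)` takes 152 genes, about 26% of them silenced (40 is a rounded count and therefore an assumption), against an expected 18%, and returns a P value below 0.01. The χ² approximation becomes unreliable when expected counts are small, which is relevant for the sparsely populated high-Δβ categories.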
Comparing the YST and seminoma cell lines, there was a substantial (greater than two times the value expected at random) and significant correlation between lower expression in YST cells and a difference in methylation of >0.65 Δβ-value (Figure 3a). For this reason, all further analyses excluded genes that were differentially expressed but where differential methylation was <0.65, since for any given gene a correlation with a lower level of differential methylation is more likely to simply reflect a random association. Similar comparison of EC and teratoma cell lines to the seminoma cell line found that a significant and substantial association between methylation and silencing of expression was reached at an average Δβ-value of over 0.7 (Figures 3b,c). For all non-seminoma cell lines, islands showed a stronger correlation with reduced expression than methylation of shores or any other regions (Figure 3, Supplementary Figure S1).
Based on the Δβ-value thresholds established above, we identified those genes expressed in seminoma but not in the non-seminoma cell lines for which reciprocal differential methylation of a CpG island near the TSS was implicated in their silencing in non-seminoma cells (Figures 3 and 4a). Among genes differentially methylated at a TSS-associated CpG island between non-seminoma and seminoma cell lines (for which expression data were available) about half showed a correlating decreased expression in the various non-seminoma cell lines. It was notable that the genes identified in this way feature high among the most differentially expressed genes between seminoma and the various non-seminoma cell lines. Of the top 10 most differentially expressed genes (for which we have methylation data) in the non-seminomas, 4 are differentially methylated in EC cells, 3 in YSTs and 2 in teratoma cells. Thus it seems that differential methylation could play a substantial role in the differential gene expression between seminoma and non-seminoma cells.
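The selection step can be summarised as a simple filter. The sketch below is illustrative only: the record layout (`gene`, `delta_beta`, `fold_change`) and the function name are hypothetical, the Δβ cut-offs are those given in the text for each cell line (applied as "at least", following the legend to Figure 4), and `fold_change` is assumed to be seminoma expression divided by non-seminoma expression.

```python
# Delta-beta cut-offs per non-seminoma cell line, as stated in the text.
DELTA_BETA_MIN = {"YST": 0.65, "EC": 0.70, "teratoma": 0.70}
FOLD_CHANGE_MIN = 2.0  # at least twofold higher expression in seminoma


def silenced_candidates(records, cell_line):
    """Genes more methylated AND less expressed in a non-seminoma line.

    records: iterable of dicts with hypothetical keys
      'gene'        - gene symbol
      'delta_beta'  - island mean beta (non-seminoma) minus island mean beta (seminoma)
      'fold_change' - seminoma expression / non-seminoma expression
    """
    cutoff = DELTA_BETA_MIN[cell_line]
    return sorted(
        r["gene"]
        for r in records
        if r["delta_beta"] >= cutoff and r["fold_change"] >= FOLD_CHANGE_MIN
    )
```

For example, a gene with Δβ = 0.68 and a threefold difference in expression would pass the YST filter but not the stricter EC or teratoma filter.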
Strong correlation between methylation of islands in gene bodies and gene silencing
In previous studies, methylation in gene bodies has been associated with active genes. 19,20 However, using the same cutoff of 0.65 Δβ-value for differential methylation and a twofold difference in gene expression described above, we found that increased methylation in body CpG islands was more strongly associated with gene silencing than activation (Figure 4b). In the YST cell line, 45 out of 128 genes exhibiting increased methylation of body CpG islands compared with the seminoma cells showed a correlating twofold or greater decrease in expression and were only rarely associated with gene activation (Figure 4b). A similar relationship was seen in the EC and teratoma cells (Figure 4b). Although many of these genes also exhibited methylation of a CpG island in the region of the promoter, even for genes with a TSS-associated CpG island that was not differentially methylated, the body CpG island methylation was still more strongly associated with silent rather than active genes (Figure 4b). In total, 34 genes for which body CpG island methylation correlated with silencing of expression either lacked a promoter-associated island or these other islands were not differentially methylated. These genes were therefore included in subsequent analysis (asterisk in Tables in Figure 4a).

Validation of genes silenced by methylation
The expression of a subset of the above genes was assessed using reverse transcription-PCR (RT-PCR). This confirmed the results of the Affymetrix expression arrays for all 17 genes analysed ( Figure 5 and see Supplementary Figure S2). The positions of all the CpGs analysed within each gene were also characterized with reference to the gene structure and all CpGs within the gene (some of which were not included on the methylation array). This verified that in these 17 genes, the differences in average methylation across CpGs annotated as islands did reflect multiple CpGs and that these differences were quite consistent across the whole or large parts of each of those islands.
A particular reason for using cell lines in this study was that it allowed us to confirm the role of DNA methylation in regulating expression of the genes identified. We therefore tested whether these genes would be re-expressed if demethylated. YST cells were treated for 2 days with 5-aza-2'-deoxycytidine and then expression of five of the same 17 genes was re-examined by RT-PCR and RT-quantitative PCR (RT-qPCR). This showed that all five genes were activated by 5-aza-2'-deoxycytidine (Figure 6).

Aberrant gene methylation most likely represents gain of methylation in non-seminoma cells
To determine which methylation events in the cell lines were likely to be aberrant cancer-related events, methylation of the key genes identified above (Figure 4) was compared with a series of control sets of Infinium HumanMethylation450 array methylation data from normal tissues (see Supplementary Dataset 5).
Almost all of the genes identified as methylated in non-seminoma cell lines but unmethylated in seminoma cells were also unmethylated in all control samples. Two striking exceptions to this were the genes DDX43 and TDRD12. DDX43 was heavily methylated in all samples other than seminoma while TDRD12 was methylated in all samples except seminoma and teratoma. Hence, the heavy methylation of these two genes in all control samples implies the difference between GCTs is due to unusual hypomethylation in seminoma (and teratoma for TDRD12).

Table 1. Contingency tables of observed and expected number of genes differentially expressed (correlating and anti-correlating), and genes with no difference in expression for ranges of differential methylation between non-seminoma and seminoma cell lines.

We reanalysed the data of van der Zwan et al. 18 using the same pipeline described above (see Supplementary Datasets 6 and 7; Supplementary Tables S3 and S4). Strikingly, 63% of the genes that were differentially methylated in the study of van der Zwan et al. 18 were also differentially methylated in our study; 59% of the differentially expressed genes were shared, and of the 7 genes that were both differentially methylated and differentially expressed (with the methylation correlating with gene silencing) 3 of these (HSPA2, PON3 and TACSTD2) were among the 43 genes in this category in our study. These correlations are highly significant (binomial test, P < 1 × 10⁻⁷). These data show that the gene methylation and expression events are remarkably consistent between independent EC cell lines.
Identification of genes for which methylation is most likely to be of biological significance
To determine if the differences in expression between seminoma and non-seminoma cell lines might be a more general feature of GCTs, we made use of the Affymetrix expression data of Korkola et al. 21 from a cohort of adult TGCTs and Palmer et al. 22 from a cohort of paediatric seminomas and YSTs from many different anatomical locations. Although such banks of tumour samples would be expected to differ substantially from the gene expression seen in the clonal cell lines used here, we reasoned that genes identified by this comparison, where the role of these genes might be conserved in many GCT samples analysed, would be those events of greatest importance generally in GCT biology. Comparing seminoma to YST samples, the overlap in genes differentially expressed in these data sets was highly significant (Supplementary Figure S5). Among the genes represented in all three studies, any two studies shared ~500-700 genes that were differentially expressed and of these 339 genes were differentially expressed in all three data sets. Of the 72 genes shown to be both differentially methylated and differentially expressed between the YST and seminoma cell lines in this study (Figure 4), 23 were expressed at higher levels in primary seminomatous tumours than in YSTs (>1.5-fold) in either the Palmer et al. 22 or Korkola et al. 21 data sets with 11 being differentially expressed in both studies (Table 2A). Such a high proportion of differentially expressed genes common to the cell lines and the primary tumours is highly significant (binomial test: P < 10⁻³⁰), strongly suggesting that the cell lines provide a good surrogate system for studying gene expression in this tumour type. It is also striking that several of these genes encode proteins that are involved in pluripotency and the inhibition of differentiation (see Discussion below).
Comparison of the differentially expressed genes between our data and those of Korkola et al. 21 for EC and teratoma cell lines showed much weaker overlap, although this was still significant for the EC cell data. However, it is noteworthy that three of the five genes methylated and differentially expressed in all three non-seminoma cell lines (DDX43, PON3 and RBMXL2) were also differentially expressed in all three tumour types in the data of Korkola et al. 21 (Tables 2B,C).
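A one-sided binomial test of this kind can be written with the Python standard library alone. The sketch below is an assumption about the form of the test rather than a record of how it was run: the text does not state the null proportion, so `p_null` (the chance that any one gene is differentially expressed in the primary tumour data by coincidence) is left as a parameter, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p_null):
    """P(X >= k) for X ~ Binomial(n, p_null).

    k      - number of genes shared between the two lists
    n      - number of genes tested (for example, the 72 cell-line genes)
    p_null - assumed per-gene probability of overlap by chance (not given
             in the text; must be estimated from the tumour data sets)
    """
    return sum(
        comb(n, i) * p_null ** i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )
```

With the figures quoted above, the call would take the form `binomial_upper_tail(23, 72, p_null)`, where `p_null` would be the fraction of all assayed genes that are differentially expressed in the primary tumour data.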

DISCUSSION
This study describes the first comprehensive analysis of global methylation and its relationship to gene expression in cell lines of the major subtypes of GCTs. Of the 7,244 genes that produced reliable signals in the expression array dataset, ~2% (147) of the genes showed correlating differential methylation and expression in our data, of which we were able to identify 23 genes with similar differential expression in cohorts of primary tumour samples.
Global differences in methylation between GCT cell lines
This study confirms the suggestion from analysis of fewer genes that non-seminoma cells exhibit very high levels of gene methylation as compared with seminoma cells, which exhibit a strikingly low level of gene methylation. 14,15 As found in other classes of cancer, the hypermethylation we see in the YST as compared with seminoma cells is restricted to the CpG islands of a relatively small proportion of genes, a so-called CpG island methylator phenotype. Analysis of EC and teratoma cells revealed a much less localised difference in methylation. In these cells, the extent to which methylation of CpGs was higher than in seminoma cells was almost uniform across CpGs in all regions. This hypermethylation does not, therefore, represent a CpG island methylator phenotype. This implies a much less biologically regulated process than the targeted methylation of CpG islands seen in YST cells. However, despite the fact that the methylation did not appear to be targeted to CpG islands in EC/teratoma cells, there was still a strong inverse correlation between CpG island methylation and the level of gene expression. Hence, regardless of the mechanism that results in CpG island hypermethylation, such methylation shows a strong correlation with gene silencing.

CpG island methylation shows the strongest correlation with reduced gene expression
Methylation of CpG islands was more strongly correlated with low gene expression than was methylation in other regions. With respect to gene structure itself, most methylated islands were in the region of gene promoters. However, some were in the gene body where they similarly correlated with gene silencing. Methylation of CpG islands in gene bodies has been reported to be associated with active genes. 23,24 Therefore, the situation in GCT cell lines reveals a distinctly different relationship.
Recent studies have implicated a variety of differentially methylated regions as most influential in the regulation of gene expression. Kulis et al. 25 found that in differentiated B-cells and leukaemic cells differences in methylation in the gene bodies and promoter-associated CpGs correlated with differences in gene expression, but the correlation was stronger for gene bodies. 25 In their study, as has been found before, increased methylation in gene bodies correlated more often with increased expression rather than decreased expression, although numerous genes did exhibit the latter. Irizarry et al. 26 found that differences in methylation of CpG shores correlated best with differences in gene expression in both tissue-specific and cancer-specific differentially methylated regions. 26 More recently a whole genome bisulfite sequencing study in medulloblastomas identified regions ~2 kb downstream of the transcription start site as showing the strongest correlation with gene expression. 27 Hence, it seems that the role of methylation of different gene regions in controlling gene expression varies in different tissue types and different cancers. It will be interesting in the future to find whether the situation seen in GCT cell lines is an unusual feature of their germ cell lineage, or a more general phenomenon in a range of cancers.

Figure 3. Correlation between differential expression of over twofold and different degrees of differential CpG island methylation between seminoma and non-seminomas. Histograms showing the observed (blue) and expected (red) percentage correlation for each Δβ-value range for TCAM2 (seminoma) versus GCT44 (YST) (a), GCT27 (EC) (b) and NT2D1 (teratoma) (c). A Δβ-value of >0.65 or 0.7 consistently correlated significantly with a difference in expression of greater than twofold. While some lower Δβ-value categories also show significant association, statistical significance was reached at much smaller percentage levels of association due to a larger number of genes in those differential methylation ranges (see Figure 2 for details) (for example, comparing GCT44 to TCAM2, 152 genes exhibit Δβ-values between 0.2 and 0.25, of which only 26% show a correlation with decreased expression, over a random expected association of 18%, but this achieves a P value < 0.01). On the other hand, due to the small numbers of genes exhibiting some of the highest differential methylation values, these were not significantly associated with differences in expression, often despite large percentage levels of association (for example, comparing GCT44 to TCAM2, seven genes exhibit Δβ-values between 0.85 and 0.9, of which three (43%) show a correlation with decreased expression, but this does not achieve a significant P value). Significance of the χ² tests of association is shown (*P < 0.05, **P < 0.01, ***P < 0.001) and the total number of genes in each category is displayed above the bar.

DNA methylation and cancer progression in GCTs
Our finding that the non-seminoma cell lines exhibit very high levels of gene methylation as compared with seminoma cells is consistent with earlier, less comprehensive studies, which showed that non-seminomas exhibit much higher levels of gene methylation than seminomas. 14,15 In previous studies, using immunostaining, DNA methylation was undetectable in ICGNU and most seminomas while strong methylation was seen in non-seminomas. 16,28 Even in mixed tumours, seminoma elements were unmethylated and non-seminoma components methylated. 16 This led Netto et al. 16 to propose that ICGNUs are derived from PGCs at an embryonic stage when methylation has been erased, and seminomas subsequently arise from ICGNU. Given that the seminoma cell line, TCAM2, can give rise to tumours closely resembling EC in an experimental in vivo context 29 and tumours containing a mixture of seminomatous and non-seminomatous components are not uncommon, it seems possible that hypermethylation could represent a progression event involving de novo methylation of such seminomatous components. It therefore seems plausible that pure non-seminomas could also arise by de novo methylation from a non-methylated ICGNU or seminoma precursor lesion that existed prior to diagnosis. This would be consistent with the well-documented phenomenon of non-seminomas arising as recurrences of seminomas. 13

Figure 4. Genes differentially methylated and differentially expressed between seminoma and non-seminoma cell lines. (a) Tables show genes expressed at higher levels in seminoma than in non-seminoma cell lines (⩾2-fold difference in expression from microarray data), that are also significantly more methylated (Δβ ⩾ 0.7 for EC and teratoma, ⩾ 0.65 for YST) in the non-seminoma lines. Differential methylation was associated with CpG islands near to a TSS, except for those genes marked with an asterisk where methylation was across CpG islands in the body of the gene. These same genes are shown as numbers in the various overlapping cell types (colours match tables) in the central Venn diagram. (b) Graph showing numbers of genes showing differential expression between seminoma and various non-seminoma cell lines grouped according to exhibiting differential methylation in body and TSS regions (all genes included exhibit greater methylation (Δβ ⩾ 0.65) in a non-seminoma versus the seminoma cell line across the island). Numbers above each bar indicate the number of genes that were differentially methylated but showed less than a twofold difference in expression between the cell lines.

Cell lines as a useful system for methylation studies
As shown here, the correlation between substantial methylation of CpG islands and gene silencing is well below 100%. It is therefore critical that DNA methylation is not simply assumed to be the cause of gene silencing. In this study, in addition to determining the level of differential methylation that best correlated with gene silencing, we were able to utilise these cell lines to verify the role of methylation using the DNA-demethylating agent 5-aza-2'-deoxycytidine. To determine whether silencing of these genes was a more general phenomenon in primary tumours, we showed that there was a highly significant overlap in the genes silenced by methylation in the cell lines in this study and those genes that differed in their expression in a cohort of paediatric GCT samples.
Although this does not determine if differences in expression in the tumour samples are due to their methylation, it is the difference in expression that is the biologically important outcome of the methylation. Analysis of primary tumour samples using lower coverage analyses has already shown the same general differences in the level of methylation between seminoma and non-seminoma tumours that we found in the corresponding cell lines, 14,15 supporting the hypothesis that the methylation seen in the cell lines is a reflection of that seen in primary tumours.
Concerns have been raised over using cell lines for methylation studies. 30 Smiraglia et al. 30 concluded that cancer cell lines develop aberrant methylation profiles as an artifact of the culture process. 30 Although this possibility cannot be ruled out, this is not, in our view, the only or indeed the most likely interpretation of the data in their study. Smiraglia et al. 30 showed, using restriction landmark genomic scanning, that the high level of methylation seen in some cancer cell lines was not reflected in primary tumours of the same type (although not the actual tumour samples from which those cell lines were derived). Their conclusion is based on the assumption that the average methylation state of all the cells in most of a given class of tumour should be similar to that found in a cell line derived from that tumour. However, an alternative explanation for their observations, for which there is clear precedent, is that the cell line methylation status reflects the minority population of tumour cells from which the cell lines originate. These cells are a proliferating subset of cells within the tumours, which could represent the cancer stem cell/tumour initiating cell. Such a difference in DNA methylation clearly exists in normal tissues of a single individual in vivo that, like tumours, are genetically clonal. This is true of different cell populations in the same tissue 31,32 and differentiated cells versus proliferating stem cells in whole organisms. 33 In such examples the gene methylation profiles of genetically identical cells can be dramatically different in vivo, just as the profile of a cancer stem cell or cancer-derived cell line might differ from less primitive proliferating cells or their more differentiated progeny in the primary tumour.
Indeed, using a more sensitive analysis of cell lines and the pancreatic cancers from which they were derived, Ueki et al. 34 (using methylation-specific PCR) concluded that 'most of the DNA methylation of tumour suppressor genes observed in cancer cell lines is present in the primary carcinoma from which they were derived. 34 Given our observations that the seminoma cell line exhibits no apparent increase in methylation, that the hypermethylation we identified in one EC cell line was very similar to that seen in a different cell line in another study van der Zwan et al. 18 and that the overall pattern of methylation we found is consistent with earlier studies of primary tumour samples, it seems likely that the majority of methylation events we see reflect genuine differences between these tumour types.
Several pluripotency regulators are silenced in non-seminoma cell lines
Of the 108 genes silenced by methylation in the YST cell line, 21 were also similarly differentially expressed in primary tumour samples (Table 2). Among these 21 genes, several have previously been associated with germ cell progenitors and/or pluripotency.
KLF4 is one of the four 'Yamanaka' factors that together can endow somatic cells with the potential to adopt a pluripotent embryonic stem cell-like state 35 and is highly expressed in the pluripotent progenitors of germ cells, the PGCs. 36,37 Its suppression by methylation in non-seminomas might therefore play a key role in their more differentiated and more aggressive state. It is also noteworthy that Klf4 expression is affected by the protein encoded by another silenced gene, Prdm14 38 (discussed below) and that silencing Tet1 in mice (which results in hypermethylation of DNA) downregulates the expression of Klf4 and Prdm14. 39 Therefore, silencing of these two genes may be intrinsically linked.

The 'fold change' column is the fold change in expression seen between the seminoma and non-seminoma cell lines in our study. The 'Korkola fold change' and 'Palmer fold change' are the difference in expression seen in primary tumours in the studies by Korkola et al. 21 and Palmer et al. 22 Only genes for which data were available in all data sets are shown. Genes where expression in seminoma cells is at least 1.5-fold greater than in the non-seminoma cells are in bold.

Among the other genes identified, several have particular roles during male gamete production. DDX43 and TDRD12 were specifically hypomethylated in seminoma cells. DDX43 encodes a 'cancer testis antigen' (also called HAGE), which is an RNA-dependent helicase with expression largely restricted to testis and a variety of cancer types. 40 TDRD12 encodes a tudor domain-containing protein (also capable of functioning as an RNA-dependent helicase) found almost exclusively in testes. 41 It is important in the biogenesis of piRNAs, which are also testis specific. 41 Hence the heavy methylation of these genes in all control samples is consistent with their very restricted expression patterns in normal tissues. This implies that the difference in expression of these two genes between GCT subtypes is due to hypomethylation in seminoma (and teratoma for TDRD12). MNS1 and RBMXL2, on the other hand, were hypermethylated in YST cells. MNS1 (meiosis-specific nuclear structural 1) encodes a coiled-coil protein of unknown function, which is essential for spermiogenesis. 42 RBMXL2 (also called hnRNP G-T) codes for an hnRNP expressed almost exclusively in testes and GCTs, and is believed to function during meiotic prophase or to act as a germ cell-specific splicing regulator. 43 Hence, the silencing of these genes in non-seminomas may also play a role in their differentiation towards somatic cell lineages.

PRDM14
PRDM14, which is differentially methylated and expressed between the seminoma and YST cell lines (methylated and silenced in YST), merits special attention. It encodes a multiple zinc finger transcription factor almost exclusively expressed in PGCs, GCTs and blastocyst stage embryos (NCBI EST profile http://www.ncbi.nlm.nih.gov/UniGene/ESTProfileViewer.cgi?uglist=Hs.287532), which can function as both an activator and repressor of transcription and may possess DNA methyltransferase activity. Alongside PRDM1 (BLIMP1), PRDM14 activates TFAP2c (which encodes AP2γ) and together these three 'key regulators of PGC specification' [44][45][46] repress somatic gene expression, activate PGC genes and initiate the demethylation of the genome. Together these factors can convert ES cells to PGCs. [44][45][46] PRDM14 actively promotes DNA demethylation by directly repressing the DNA methyltransferases DNMT3a, DNMT3b and DNMT3l, as well as UHRF1 (a DNMT1 cofactor), 38,47,48 and by directly activating DNA demethylation through increased activity of the TET enzymes. 48 In mice, Prdm14 can also enhance reprogramming of somatic cells to iPS cells by Sox2, Oct4 and Klf4. 49 It regulates Oct4 49 and can even induce pluripotency when overexpressed alongside Blimp1 and Prmt5. 50 Increased copy number of the PRDM14 gene has been reported in GCTs 51 and a recent GWAS identified PRDM14 as a susceptibility locus in testicular cancer. 52 Studies in normal ES cells (where PRDM14 is also expressed but at much lower levels than in PGCs) provide evidence for a role in the progression of GCTs towards the YST phenotype. In one study, knockdown of PRDM14 led to differentiation towards extraembryonic endoderm, 47 a tissue type similar to that seen in YSTs. However, this remains somewhat contentious since, in a similar study by others, knockdown of PRDM14 caused cells to differentiate into embryonic cell types. 53 Together, these data suggest that PRDM14 could play a central role in GCT progression, in which it is initially expressed in seminomas, hence helping to retain their germ cell-like phenotype. It is then silenced by methylation, triggering more widespread methylation of the genome and promoting the cells' differentiation into the extraembryonic cell types that typify YSTs.

Conclusions
This study revealed a very different methylator phenotype in non-seminoma cell lines as compared with other types of cancer. Several new potentially biologically important genes were identified, most particularly a group of genes associated with the germ cell state and/or pluripotency: PRDM14, TDRD12, DDX43, MNS1, RBMXL2 and KLF4. Silencing of these factors, which normally suppress somatic differentiation, could be a mandatory step in the progression from seminoma to non-seminoma. Both the silenced genes and gene methylation generally represent new potential therapeutic targets for the more chemoresistant GCTs.

Methylation analysis
Bisulfite conversion of the DNA (1 μg) was performed using the EZ DNA methylation kit (Zymo Research Corporation, Irvine, CA, USA), with hybridisation to the Infinium HumanMethylome450 arrays (Illumina, San Diego, CA, USA) and scanning performed by the Queen Mary University of London Genome Centre. Quality control of the dataset was performed by analysing the bisulfite conversion using Genome Studio software (Illumina). The ratio of unmethylated probe to methylated probe was calculated, with all samples showing good conversion ratios of <0.2. Low signal intensity was also controlled for by removing CpG probes with an averaged detection P value of >0.05 across samples. In addition, averaged signal intensity values for each CpG, for both the red and green signals, across the samples were log2 transformed, and values <11.1 were removed. For the analysis shown here, probes that bound multiple sites (chr-MULTI) were excluded.
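The probe-level filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Genome Studio workflow used in the study: the record layout, the field names and the `"MULTI"` chromosome label are hypothetical, and because the text does not state how the red and green channels were combined, the sketch requires both channel means to reach the cut-off.

```python
import math
from statistics import mean

DETECTION_P_MAX = 0.05     # mean detection P above this removes the probe
LOG2_INTENSITY_MIN = 11.1  # log2 of the mean intensity below this removes the probe


def passes_qc(probe):
    """Return True if a probe record survives the three filters in the text."""
    if probe["chrom"] == "MULTI":  # hypothetical label for multi-site probes
        return False
    if mean(probe["detection_p"]) > DETECTION_P_MAX:
        return False
    # Assumption: both colour channels must reach the intensity cut-off.
    for channel in ("red", "green"):
        if math.log2(mean(probe[channel])) < LOG2_INTENSITY_MIN:
            return False
    return True


# Hypothetical probe records; each list holds one value per sample.
probes = {
    "cg_a": {"chrom": "1", "detection_p": [0.001, 0.002],
             "red": [4000, 5000], "green": [3000, 3500]},
    "cg_b": {"chrom": "1", "detection_p": [0.20, 0.10],
             "red": [4000, 5000], "green": [3000, 3500]},
    "cg_c": {"chrom": "MULTI", "detection_p": [0.001, 0.001],
             "red": [4000, 5000], "green": [3000, 3500]},
    "cg_d": {"chrom": "2", "detection_p": [0.001, 0.001],
             "red": [500, 600], "green": [3000, 3500]},
}

kept = sorted(name for name, record in probes.items() if passes_qc(record))
```

In this toy input only `cg_a` is retained: `cg_b` fails the detection P filter, `cg_c` is a multi-site probe and `cg_d` has a weak red signal.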
CpGs were annotated by the chip manufacturers according to their position relative to CpG islands: 2,000 bp either side of an island as north 'shore' (5′ relative to the associated gene) or south 'shore' (3′ relative to the associated gene), north and south 'shelves' for the 2,000 bp flanking the shores, and 'other' or 'open sea' for CpGs more distant from an island. Quantitative measurements of DNA methylation across all known genes and CpGs were represented as β-values (0 < β < 1; 0 represents unmethylated sites and 1 indicates the site is fully methylated). Δβ-values of differential methylation were calculated as the difference between the β-values of each cell line relative to the seminoma cell line. The Excel tool PivotTable (Microsoft Office 2010) was used to assign average methylation and differential methylation measurements to each gene according to various combinations of locations with respect to CpG islands and gene regions. Probes labelled by multiple gene names were excluded from the analysis.
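The β and Δβ calculations above, together with the per-gene averaging done with PivotTable, can be illustrated with a short Python sketch. The β formula shown is the conventional Illumina definition with an offset of 100, which is assumed here because the text does not state it; the gene names, probe names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def beta_value(methylated, unmethylated, offset=100):
    """Conventional Illumina beta: M / (M + U + offset). Assumed, not from the text."""
    return methylated / (methylated + unmethylated + offset)


def gene_delta_beta(probe_betas, line, reference="seminoma"):
    """Average per-probe delta-beta (line minus reference) for each gene."""
    per_gene = defaultdict(list)
    for gene, betas in probe_betas.values():
        per_gene[gene].append(betas[line] - betas[reference])
    return {gene: mean(deltas) for gene, deltas in per_gene.items()}


# Hypothetical probe-level beta values: probe -> (gene, {cell line: beta}).
probe_betas = {
    "cg_1": ("GENE_A", {"seminoma": 0.10, "YST": 0.90}),
    "cg_2": ("GENE_A", {"seminoma": 0.20, "YST": 0.80}),
    "cg_3": ("GENE_B", {"seminoma": 0.50, "YST": 0.40}),
}

delta = gene_delta_beta(probe_betas, "YST")
```

Here `GENE_A` has a Δβ of about +0.7 (hypermethylated in the YST line relative to seminoma) and `GENE_B` about -0.1. Restricting the input to probes in a given location class (island, shore, shelf) would reproduce the location-specific averages described in the text.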
Expression array analysis
RNA extraction was performed using the RNeasy extraction kit (Qiagen, Manchester, UK) according to the manufacturer's protocol. RNA was eluted with RNAse-free water (Qiagen). The quality of TGCT RNA samples was determined using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) as suggested by the manufacturer. Measurements were calculated using 2100 expert software version B.02.07 (Agilent Technologies) and displayed as RNA concentration, the ribosomal ratio and the RNA integrity number (RIN). For the Affymetrix gene expression arrays, samples with RIN >9.0 were selected, as this implies a high-quality RNA sample. Arrays were run at the Nottingham Arabidopsis Stock Centre (University of Nottingham, Sutton Bonington Campus) using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays.
Data were first preprocessed using the statistical software R with the 'Affy' package provided by www.bioconductor.org. Data were normalised using the RMA method 60 and filtered such that probes which gave expression outputs below control background probes (recorded in the GeneChip) for all four cell lines were excluded. Fold changes in expression for each probe in each cell line relative to seminoma were calculated, and annotation packages were used to assign gene information to each probe set. The data were exported as a .txt file to be read and analysed in Excel. The Excel tool PivotTable was used to assign average expression intensity values to each gene.
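The background filter and fold-change step can be sketched in Python as below. RMA returns log2-scale expression values, so a linear fold change is obtained by exponentiating the difference. The direction of the ratio (cell line over seminoma), the background value and the probe names are assumptions made for illustration only.

```python
def expressed_in_any(log2_values, background):
    """Keep a probe set unless every cell line falls below background."""
    return any(value >= background for value in log2_values.values())


def fold_change(log2_line, log2_reference):
    """Linear fold change of a cell line over the reference, from log2 values."""
    return 2 ** (log2_line - log2_reference)


BACKGROUND = 5.0  # hypothetical log2 background level

# Hypothetical RMA output: probe set -> log2 expression per cell line.
rma = {
    "probe_1": {"seminoma": 10.0, "YST": 7.0},
    "probe_2": {"seminoma": 8.0, "YST": 9.0},
    "probe_3": {"seminoma": 3.0, "YST": 4.0},
}

fold_changes = {
    probe: fold_change(values["YST"], values["seminoma"])
    for probe, values in rma.items()
    if expressed_in_any(values, BACKGROUND)
}
```

With these inputs `probe_3` is dropped as below background in every line, `probe_1` is eight-fold lower in YST than in seminoma (0.125) and `probe_2` is two-fold higher (2.0).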
CEL files from the studies by Palmer et al. 22 and van der Zwan et al. 18 were processed using the statistical package R. Data were normalised using the RMA method and filtered to exclude outputs that fell below background levels. These data were then processed in Excel and a one-tailed (right tail) Welch's t-test was performed for each gene comparing seminoma samples with non-seminoma samples (a P value of <0.05 indicated significantly higher expression in seminoma samples versus YST samples).
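For readers who wish to reproduce the per-gene test outside Excel, a one-tailed Welch's t-test can be sketched in standard-library Python as follows. The t statistic and the Welch-Satterthwaite degrees of freedom follow the usual definitions; the right-tail P value is obtained here by Simpson integration of the t density, which is an implementation choice of this sketch rather than the method used in the study. The sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance test of mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def right_tail_p(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on the density between 0 and |t|."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - central if t >= 0 else 0.5 + central


# Hypothetical log2 expression of one gene in two tumour groups.
seminoma = [10.0, 11.0, 12.0]
non_seminoma = [7.0, 8.0, 9.0]

t_stat, dof = welch_t(seminoma, non_seminoma)
p_value = right_tail_p(t_stat, dof)
```

A P value below 0.05 from `right_tail_p` corresponds to the criterion in the text for significantly higher expression in the first group.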
For RT-qPCR, a master mix for each primer pair was prepared by mixing 2X Brilliant SYBR Green QPCR Master Mix (Agilent), 100 nM of forward and reverse primer, and water up to 25 μl/well. About 25 μl of master mix was transferred into each designated well of a 96-well plate before adding 50 ng complementary DNA into each well. Samples were then subjected to PCR using a Bio-Rad C1000 Thermal Cycler (Bio Rad, Heatfords, UK). The PCR cycling conditions were as follows: 95°C for 3 min, followed by 35 cycles of 95°C for 1 min, 58°C for 1 min and 72°C for 30 s, then 72°C for 5 min. All data were then analysed using Bio-Rad CFX Manager 3.0 software (Bio Rad).
The threshold cycle (Ct) value of each sample was defined by a standard threshold, and expression relative to the housekeeping gene, ACTB, was calculated using the Pfaffl equation. 61

No ethical approval was required for this study.
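The Pfaffl calculation used for the RT-qPCR data can be written as a short Python function. In this sketch the amplification efficiencies, the choice of untreated cells as the control condition and the Ct values are all hypothetical; only the form of the equation, in which each gene's efficiency is raised to its control-minus-sample Ct difference, is taken from the Pfaffl method itself.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Relative expression ratio of a target gene, normalised to a reference gene.

    e_target and e_ref are amplification efficiencies per cycle
    (2.0 corresponds to perfect doubling).
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    reference_term = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_term / reference_term


# Hypothetical example: a silenced gene before (control) and after (sample)
# 5-aza-2-deoxycytidine treatment, with ACTB as the reference gene.
ratio = pfaffl_ratio(
    e_target=2.0, ct_target_control=30.0, ct_target_sample=26.0,
    e_ref=2.0, ct_ref_control=18.0, ct_ref_sample=17.0,
)
```

With these values the target gains four cycles (16-fold) while ACTB gains one (2-fold), giving a ratio of 8.0, that is, an eight-fold increase in expression after treatment.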

DATA DEPOSIT
The array data presented in this paper have been deposited in the GEO database.