Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value

Epigenetic alterations in the cancer methylome are common in breast cancer and provide novel options for tumour stratification. Here, we perform whole-genome methylation capture sequencing on small amounts of DNA isolated from formalin-fixed, paraffin-embedded tissue from triple-negative breast cancer (TNBC) and matched normal samples. We identify differentially methylated regions (DMRs) enriched with promoters associated with transcription factor binding sites and DNA hypersensitive sites. Importantly, we stratify TNBCs into three distinct methylation clusters associated with better or worse prognosis and identify 17 DMRs that show a strong association with overall survival, including DMRs located in the Wilms tumour 1 (WT1) gene, bi-directional-promoter and antisense WT1-AS. Our data reveal that coordinated hypermethylation can occur in oestrogen receptor-negative disease, and that characterizing the epigenetic framework provides a potential signature to stratify TNBCs. Together, our findings demonstrate the feasibility of profiling the cancer methylome with limited archival tissue to identify regulatory regions associated with cancer. Triple-negative breast cancers (TNBCs) are a heterogeneous group of cancers with varying prognoses. Here, the authors carry out whole-genome methylation capture sequencing from TNBC samples and matched normal samples, and identify differentially methylated regions that define a potentially novel TNBC signature.

Epigenetic alterations in the cancer methylome are common in breast cancer and provide novel options for tumour stratification. Here, we perform whole-genome methylation capture sequencing on small amounts of DNA isolated from formalin-fixed, paraffin-embedded tissue from triple-negative breast cancer (TNBC) and matched normal samples. We identify differentially methylated regions (DMRs) enriched with promoters associated with transcription factor binding sites and DNA hypersensitive sites. Importantly, we stratify TNBCs into three distinct methylation clusters associated with better or worse prognosis and identify 17 DMRs that show a strong association with overall survival, including DMRs located in the Wilms tumour 1 (WT1) gene, bi-directional-promoter and antisense WT1-AS. Our data reveal that coordinated hypermethylation can occur in oestrogen receptor-negative disease, and that characterizing the epigenetic framework provides a potential signature to stratify TNBCs. Together, our findings demonstrate the feasibility of profiling the cancer methylome with limited archival tissue to identify regulatory regions associated with cancer.
T riple-negative breast cancers (TNBCs) comprise a heterogeneous group of cancers with varying prognoses, presenting a challenge for effective clinical management. TNBC is clinically defined by the absence of oestrogen receptor (ER) and progesterone receptor expression, and neither overexpression nor amplification of human epidermal growth factor receptor 2 (HER2) 1,2 . TNBC represents B15-20% of all newly diagnosed breast cancer cases and is generally associated with high risk of disease recurrence and shorter overall survival compared with non-TNBC 3 . Broadly, TNBC patients can be categorized into two distinct groups; those that succumb to their disease within three to five years regardless of treatment and those that remain disease-free to the extent that their overall survival exceeds that of non-TNBC patients (that is, approximately 48 to 10 years post diagnosis) 4,5 . Currently, methods by which TNBC patients are stratified into high-and low-risk subgroups remain limited to staging by clinicopathological factors such as tumour size, level of invasiveness and lymph node infiltration. However, unlike other breast cancer subtypes, TNBC outcome is less closely related to stage 6 . Thus, there is a clear need to identify a robust method by which TNBC patients can be stratified by prognosis, to enable more informed disease management.
Current efforts to stratify early breast cancer prognosis have primarily focused on multi-gene expression signatures and all have received varying degrees of acceptance 7 . In addition to multi-gene expression assays, DNA methylation signatures are being assessed as potential molecular biomarkers of cancer 8 . A number of studies have documented aberrant methylation events in breast carcinogenesis and identified specific DNA methylation biomarkers that have significant diagnostic and prognostic potential [9][10][11][12] . Several studies have also identified DNA methylation signatures that can distinguish between breast cancer subtypes [13][14][15][16] , and others that may be predictive of treatment response [17][18][19] .
Despite growing interest in the prognostic significance of DNA methylation in breast cancer, there have been no studies specifically investigating the DNA methylation profile of human TNBC and its association with disease outcome. Here we carry out genome-wide DNA methylation profiling of formalin-fixed paraffin-embedded (FFPE) triple-negative clinical DNA samples, using affinity capture of methylated DNA with recombinant methyl-CpG binding domain of MBD2 protein, followed by next generation sequencing (MBDCap-Seq) 20,21 . This high-resolution technique allows for genome-wide methylation analysis of CpG rich DNA 22,23 . Using MBDCap-Seq, we identify regional methylation profiles specific to TNBC, which we validate using methylation data extracted from TCGA breast cancer cohort 13 . Importantly, we also report the first potential prognostic methylation signature of survival, specific for TNBC that now warrants further study in larger cohorts.

Results
Genome coverage of MBDCap-Seq. To delineate regions assayable with MBDCap-Seq, we first profiled fully methylated (CpG methyltransferase SssI-treated blood sample) DNA. Computational analysis of SssI MBDCap-Seq revealed that MBDCap-Seq can robustly assess the methylation status of 230,655 regions spanning a total of 116 Mbp, comprising 5,012,633 CpG dinucleotides, or B18% of the total number of CpG sites in the human genome (see Methods; Supplementary  Fig. 1a). The assayed CpG sites span 91% of all CpG islands; 84% CpG island shores; 72% RefSeq promoters; 38% introns and 31% exons. We next compared coverage of MBDCap-Seq with the Illumina HumanMethylation450K (HM450K) array ( Supplementary Fig. 1b) and found that MBDCap-Seq interrogates additional 4,740,327 CpG sites as compared with the high-density HM450K array.
A major advantage of the MBDCap-Seq method is the ability to interrogate regional blocks of hypermethylation, that is, methylation spanning consecutive CpG sites, which commonly occurs in cancer. We compared regional MBDCap-Seq coverage to coverage of HM450K arrays ( Supplementary Fig. 1a) and found that while MBDCap-Seq and HM450K arrays have similar regional coverage of CpG islands (91 versus 81%) and RefSeq promoters (71 versus 83%), MBDCap-Seq regional coverage of shores (77 versus 28%), enhancers (12 versus 2%) and insulators (11 versus 1%) is much greater, highlighting the potential advantage of MBDCap-Seq in screening novel functional regions of the cancer methylome.
To determine if MBDCap-Seq can also provide accurate methylation analysis from FFPET DNA, we compared DNA methylation profiles from DNA isolated from fresh frozen (FF) and FFPET of matching tumour and lymph node samples. We show that MBDCap-Seq from FFPET provides equivalent methylation to FF DNA (Pearson Correlation Coefficient of 0.95 and 0.86, respectively) ( Supplementary Fig. 2a) and that MBDCap-Seq and HM450K array performed on the same FFPET tumour and lymph node DNA show high concordance (0.79 and 0.77, respectively) (Supplementary Fig. 2b-d). We also show that there are regions uniquely covered by MBDCap-Seq, for example, at enhancers and insulator elements ( Supplementary  Fig. 2e,f).

Differentially methylated regions in TNBCs.
To identify differentially methylated regions (DMRs) in TNBCs, we first profiled FFPET DNA using MBDCap-Seq from a discovery cohort of 19 Grade 3 TNBC tumours and six matched normal samples (Supplementary Table 1) and analysed the data with a novel computational pipeline for comparative statistical analysis of MBDCap-Seq samples (see Methods; Supplementary Fig. 3). We identified 822 hypermethylated and 43 hypomethylated statistically significant DMRs (FDR o0.05), harbouring 64,005 and 623 CpG sites, respectively, compared with matched normal samples (see Fig. 1a,b; Supplementary Data 1) and validated sample-specific differential methylation using Sequenom DNA methylation analysis ( Supplementary Fig. 4). Next, we determined the genomic location of the DMRs and found that CpG islands, CpG island shores and promoters are significantly overrepresented (hypergeometric test; P value o0.0001) in the 822 hypermethylated regions and underrepresented in the 43 regions of hypomethylation ( Fig. 1c; see Methods). Notably, ChromHMMannotated HMEC promoters 24 and polycomb repressed regions were also significantly enriched (hypergeometric test; P value o o0.001) for gain of methylation in the breast cancer samples. Finally, we validated example DMRs in an independent cohort of 31 TNBCs and 15 normal breast samples and a panel of cell lines (Supplementary Table 2). We performed Sequenom methylation analysis on five of the 822 hypermethylated regions spanning the CpG island promoters of NPY, FERD3L, HMX2, SATB2 and C9orf125 ( Supplementary Fig. 5). The levels of methylation detected in the normal samples were uniformly low, whereas the five DMRs showed striking hypermethylation in the TNBCs (Fig. 1d) and 24 breast cancer cell lines (Fig. 1e).
Functional characterization of hypermethylated genes. To predict the potential functional significance of the 822 DMRs identified in the TNBC, we first determined which regions overlapped with promoters and genes and found that the DMRs were associated with 513 RefSeq promoters, which corresponded to 308 genes (Supplementary Data 2). We used the DAVID functional annotation tool 25 to annotate this set of genes. Visualization of statistically significantly (FDR o0.05) overrepresented gene sets revealed two largely non-overlapping groups of genes 26   and are depleted in upregulated genes (28 out of 245 genes are upregulated; hypergeometric test; FC 0.53; P value o o0.001) (Fig. 1f).
Differentially methylated regions specific to TNBCs. We next asked if any of the 822 DMRs were also found in ER À or ER þ breast cancer. We used TCGA breast cancer methylation cohort, which comprises HM450K data for 354 ER þ and 105 ER À breast tumours (73 of which are TNBCs) and 83 normal breast samples (see Methods for the analysis of TCGA methylation data). Of the 822 DMRs regions identified in the MBDCap-seq methylation discovery set, 770 DMRs are interrogated by a total of 4,987 HM450K probes from the TCGA data set. We found that while the majority of these probes are not methylated in breast normal tissue, they were hypermethylated to various degrees in both ER þ and ER À breast cancers (Fig. 2a). Both ER þ and ER À subtypes also contained samples with minimal methylation across all probes, as well as those that displayed extensive methylation more representative of a CpG island methylator phenotype (CIMP) 29 .
Next, we asked whether any of the DMRs were TNBC specific. Out of 4,987 HM450K probes, we found that 5% (282/4,987) were significantly hypermethylated in TNBCs (t-test; mean differential (diff) methylation 410%; P value o0.05) compared with the ER þ tumours and the rest of the ER À ve tumours. Using methylation values of 282 TNBC-specific probes, we were able to classify tumour samples in the TCGA HM450K cohort into TNBCs and non-TNBCs with sensitivity of 0.72 sensitivity, specificity of 0.94 and AUC 0f 0.90 (Fig. 2b). From the 282 TNBC-specific probes, we identified 36 TNBC-specific regions (harbouring at least three or more 450K TNBC-specific probes) that primarily overlap promoters and/or gene bodies (Supplementary Table 4; Supplementary Fig. 7). The regions predominantly overlap genes-encoding zinc fingers and transcription factors and intergenic regions that are commonly marked by polycomb in HMECs. An example of two such TNBCspecific regions are located in the promoters of genes-encoding zinc finger proteins ZNF154 and ZNF671 on chromosome 19 (Fig. 2c). Both promoters have low methylation levels in normal breast and increased levels of methylation in TNBC samples as compared with ER þ cancer. The distribution of expression values mirrors the methylation status, with normal samples showing the highest levels of expression and TNBC tumours showing the lowest levels of expression (Fig. 2d), suggesting silencing by methylation of both ZNF154 and ZNF671 in TNBC tumours.
DNA methylation profile can stratify TNBCS. To identify DMRs that potentially stratify TNBCs, we used unsupervised cluster analysis on methylation of the 4,987 HM450K probes and identified three distinct groups of TNBC tumours from the TCGA data sets ( Fig. 3a; see Methods). Survival analysis revealed that the largely hypomethylated cluster (blue cluster) was associated with better prognosis as compared with the other two more highly methylated clusters (orange and red clusters) (Fig. 3b). In particular, the medium methylated cluster (orange cluster) comprises samples with the worst prognosis (Cox proportional hazards model; hazard ratio ¼ 8.64; P value ¼ 0.005) as compared with the good prognosis TNBC cluster (blue cluster). Moreover, there was no association between the induced clusters and survival for ER þ or non TNBC samples ( Supplementary Fig. 8).
Next, we determined to what extent regional methylation stratify TNBCs into good and bad prognosis groups. Survival analysis identified 190 probes that were associated with survival in TCGA TNBC samples (Cox proportional hazards model; P value o0.05) in both univariate and multivariate analyses (see Methods). We observed regional association (at least three concordantly located survival probes) for 17 regions; 14 genomic regions with poor survival and three genomic regions for good survival (Table 1 (Table 1). Interestingly, with the exception of the region encoded by chr10: 102,409,068-102,409,766, all prognostic regions overlap DNase1 hypersensitive sites (ENCODE) and are marked with a polycomb signature in HMEC cells and many contain numerous conserved transcription factor binding sites (TRANSFAC 30 ) (Table 1). Furthermore, we show that the average level of methylation of CpG sites in the 17 potential prognostic regions is higher in the two poor survival groups and is lower in the normal and low-risk groups ( Supplementary Fig. 10).
A striking example of regional hypermethylation across consecutive CpG probes that shows statistical significance as a prognostic marker of survival are the DMRs spanning the bi-directional promoter and gene bodies of WT1 gene and its antisense counterpart, WT1-AS (Fig. 3f). Wilms tumour protein (WT1) is a zinc finger transcription factor overexpressed in several tumour types including breast 31 . We observe an association between high level of methylation in chr11-11623 and chr11-1210, regions spanning the gene bodies of WT1 and WT1-AS, respectively, and poor survival in TCGA TNBC cohort (Fig. 3f). Moreover, increased levels of methylation in these regions are also associated with increased expression of WT1 (chr11-11623) and WT1-AS (chr11-1210) in TNBC patients ( Supplementary Fig. 11). Conversely, we observe that TNBC patients with high methylation in chr11-4047, a region spanning bi-directional promoter of WT1 and WT1-AS, survive longer than TNBC patients with low methylation in this region.

Discussion
The prognostic stratification of TNBC patients remains one of the most significant challenges in breast cancer research. While current efforts have primarily focused on the development of multi-gene expression classifiers to inform patient outcome, here we demonstrate the significant prognostic potential of DNA methylation biomarkers for the stratification of TNBCs. We performed genome-wide DNA methylation profiling on TNBC, identified novel regions of differential methylation, and validated regions specific for TNBC using TCGA methylation data as an independent cohort. Strikingly, unsupervised cluster analysis of DMRs stratified TNBC patients into populations of high, medium ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms6899 or low-risk disease outcome. In addition, using both univariate and multivariate Cox proportional hazard models, we identified 17 DMRs significantly associated with TNBC patient survival (P-value o0.05). Critically, our classifiers paralleled the biologically relevant time-dependent pattern of patient outcome, whereby TNBC patients are most vulnerable to disease-associated death within the first five years following diagnosis, highlighting their potential use as a valuable prognostic application.

ARTICLE
The DNA methylation aberrations we identified in the TNBC samples follow specific patterns common to many cancer types 32 . For instance, hypermethylation events were localized to CpG islands and shores, while hypomethylation occurred globally across intragenic regions 32 . We observed a strong co-localization of the hypermethylated regions with H3K27me3 marked (polycomb repressed) regions in HMEC cells, supporting the finding that many polycomb-regulated genes are predisposed to aberrant methylation in cancer 33 . We identified 308 genes affected by promoter hypermethylation and functional analysis revealed significant enrichment of genes and transcription factors involved in development and differentiation, as well as DNA binding, homeobox proteins and transcriptional regulation. Hypermethylation of homeobox genes has been previously reported in breast cancer and associated with disease progression and poor patient prognosis 15,16,34,35 . Genes encoding glycoproteins were also enriched in the functional analysis. A significant function of glycoproteins is that of directing immune response 36 . This is particularly poignant since several gene expression modules associated with immune response have been used to predict TNBC patient outcome [37][38][39][40][41] . Many of the aberrant cancer promoter hypermethylation events affect genes already silenced in the tissue of origin and therefore considered to be passenger events that do not actively contribute to cancer initiation or progression 42 . To identify potential driver methylation events, we highlighted genes that were both downregulated in TNBC tumours and recurrently mutated in breast cancer. Twelve methylated genes were identified as both mutated and downregulated, including ROBO3 and SEMA5A that are a part of the axon guidance pathway, recently implicated in tumour initiation and progression 27 . In total, promoter hypermethylation affects seven members of the axon guidance pathway. Although the mechanism by which axon guidance drives cancer progression is not completely understood, our data support a potential causal role for DNA methylation for many of these family members in TNBCs.
Using an independent TNBC cohort from the TCGA data, we validated 36 TNBC DMRs comprising 20 genes. Strikingly, four of the 20 genes encoded zinc finger proteins (ZNFs). Individual ZNFs and even some clusters of ZNF genes have been found hypermethylated and silenced in several tumour types [43][44][45][46] . In addition, methylation of other ZNF genes have potential prognostic value in prostate and bladder cancer 47,48 . Although the mechanisms by which aberrant ZNF expression facilitates oncogenesis are not completely understood, ZNFs are included in two independently derived, TNBC specific, multi-gene expression classifiers (TN45 and Buck 14) 38,39 .
Recent studies have identified non-TNBC as more heavily methylated compared with TNBC 16 . In our study, we found that a distinct population of both ER þ and ER À tumours are associated with extensive methylation across the DMRs, more representative of a CpG island methylator phenotype (CIMP) 29 . Interestingly, a previous report describes the breast-CIMP (B-CIMP) group comprising solely ER þ tumours 16 ; however, our results show that coordinated hypermethylation can also occur in ER À disease. We also identified three distinct methylation clusters of TNBC tumours based on our DMRs. The largely hypomethylated profile was associated with better survival within the first five years post diagnosis compared with the more heavily methylated subtypes. Interestingly, the medium methylated cluster was associated with the worst survival. Proof of concept that methylation can be used to stratify breast cancer subtypes was recently demonstrated by TCGA, where DNA methylation data were used to classify breast cancer into five distinct subtypes; however, each of the five methylation groups were represented by multiple tumour subtypes and the relationship between methylation and prognosis was not explored 13 . Here, we also identified 17 individual DMRs capable of stratifying TNBC patients into good and poor prognosis groups. Notably, these regions predominately overlap with DNAaseI hypersensitive regions and contain conserved transcription factor binding sites highlighting their potentially significant role in transcriptional regulation. Of the genes listed, many, including WT1, WT1-AS, DMRTA1 and HOXB13, have been previously identified as hypermethylated in numerous cancer subtypes including breast cancer [49][50][51][52] , although associations with patient prognosis were not defined in these studies.
Finally, three 'survival' DMRs span the bi-directional promoter and gene bodies of WT1 gene and its antisense counter-part WT1-AS. WT1 is an extensively studied transcription factor essential for normal development of the urogenital system and deregulated across many cancer types 31 . In breast cancer, high mRNA levels of WT1 were reported to be associated with poor patient survival 53 and positive modulation of expression of WT1 by its antisense transcript WT1-AS 54,55 . Our observed patterns of methylation and survival support an extensive body of evidence on the tight epigenetic transcriptional regulation of WT1 and its role in breast cancer prognosis. More specifically, high levels of methylation across regions spanning gene bodies of WT1 and WT1-AS genes correlate with elevated levels of expression and poor survival, whereas hypermethylation spanning the bidirectional promoter is associated with decreased WT1 and WT1-AS expression and improved survival.
Cumulatively, the work presented here highlights the prognostic potential of DNA methylation in TNBC. We identified individual potential biomarkers of patient outcome as well as providing the first evidence to suggest that DNA methylation could be used to stratify TNBC subtypes associated with distinct prognostic profiles. Both observations warrant further clinical investigation in larger independent cohorts as these signatures may in the future provide valuable tools in the management of TNBC.

Methods
Breast cancer tissue samples. Human tissue samples representing normal and tumour breast from fresh frozen and formalin-fixed paraffin-embedded tissue were obtained for this study. Only samples that were classified as triple-negative Grade 3 ductal adenocarcinomas (Supplementary Table 1 DNA isolation from formalin-fixed paraffin-embedded material. DNA isolation from microdissected formalin-fixed paraffin-embedded tissue was performed using the Gentra Puregene Genomic DNA purification tissue kit according to the manufacturer's instructions (Qiagen). 5 Â 1 mm cores or 5 Â 10 mm full-faced sections were used for each extraction. The de-paraffinization step was carried out as follows: the paraffin samples were cut into small pieces, 500 ml xylene was added and incubated at 55°C for 5 min, and the tissue was pelleted at 16,000g for 3 min, discarding the xylene. After repeating this step, 500 ml 100% EtOH was added for 5 min at room temperature with constant mixing and the tissue collected by centrifugation at 16,000g for 3 min. The EtOH step was repeated and the tissue pellet dried for 10 min. Then, 300 ml of cell lysis solution was added and the tube incubated for 70°C for 10 min, followed by the addition of 20 ml Proteinase K (20 mg ml À 1 ) to each sample and vortexing for 20 s and incubation in a 55°C block overnight with constant vortexing. The following day, a further 10 ml proteinase K was added, vortexed for 20 s and further incubated at 55°C until the samples appear clear. Then, 1 ml RNase A solution (100 mg ml À 1 ) was added, mixed by inverting 25 times and incubated at 37°C for 1 h. The sample was placed on ice to quickly cool it. Then 100 ml protein precipitation solution was added to the cell lysates, vortexed for 20 s, incubated on ice for 5 min and centrifuged at full speed for 5 min at 4°C to pellet the protein precipitate. The supernatant containing the DNA was carefully removed into a clean microcentrifuge tube. The DNA was precipitated with 300 ml 100% isopropanol, and 2 ml glycogen (20 mg ml À 1 ) was added if low yield was expected (o1 mg). The solutions were mixed by inversion (50 times) followed by centrifugation for 10 min at 4°C. The DNA pellet was washed with 70% EtOH, air-dried and dissolved in 20 ml H 2 O. To dissolve the pellet, it was incubated for 1 h at 65°C with constant vortexing. Enrichment of methylated DNA by MBDCap. The MethylMiner Methylated DNA Enrichment Kit (Invitrogen) was used to isolate methylated DNA from 500 ng to 1 mg of genomic FFPET DNA and was sonicated to 100-500 bp. MBD-Biotin Protein (3.5 mg) was coupled to 10 ml of Dynabeads M-280 Streptavidin according to the manufacturer's instructions. The MBD-magnetic bead conjugates were washed three times and re-suspended in one volume of 1 Â Bind/Wash buffer. The capture reaction was performed by the addition of 500 ng to 1 mg sonicated DNA to the MBD-magnetic beads on a rotating mixer for 1 h at room temperature. All capture reactions were done in duplicate. The beads were washed three times with 1 Â Bind/Wash buffer. The bound methylated DNA was eluted as a single fraction with a single high-salt elution buffer (2,000 mM NaCl). Each fraction was concentrated by ethanol precipitation using 1 ml glycogen (20 mg ml À 1 ), 1/10 volume of 3 M sodium acetate, pH 5.2 and two sample volumes of 100% ethanol and re-suspended in 60 ml H 2 O. Enrichment of methylated DNA after capture was previously assessed by quantitative PCR of control genes of known methylation status; namely EN1 (heavily methylated) and GAPDH (unmethylated) 22 . Computational analysis of MBDCap-Seq data. Sequenced reads were aligned to the hg18 version of the human genome with bowtie. Reads with more than three mismatches and reads mapping to multiple positions were removed. Finally, multiple reads mapping to exactly the same genomic coordinate were eliminated and only one read was retained for downstream analysis. Alignment statistics for samples used in this study are given in Supplementary Table 5. MBDCap-Seq platform was previously shown to interrogate CpG dense regions of the genome 23 .
To accurately delineate regions of the genome assayable by MBDCap-Seq, we used fully methylated sample (SssI blood sample) to guide us to the genomic regions attracting sequenced tags. More specifically, we applied findPeaks peak calling utility from HOMER suite of programs 56 to fully methylated sample (with parameter settings of -style histone -size 300 -minDist 300 -tagThreshold 18) to identify 230,655 regions covering B116 Mbp of the genome. We interchangeably refer to these regions as regions of interest or SssI regions. For each MBDCap-Seq sample to be analysed, we computed the number of sequenced tags overlapping SssI regions, which resulted in table of counts where columns are samples and rows are SssI regions. We used edgeR Bioconductor package 57 (http://www. bioconductor.org/packages/release/bioc/html/edgeR.html) to model distribution of reads between normal (n ¼ 6) and tumour (n ¼ 19) group of samples in the discovery cohort. Since edgeR package does not support modelling of paired and unpaired data simultaneously, we performed two separate analyses, a paired analysis with six normal/tumour pairs and unpaired analysis with all the samples, and then intersected the results. We found 822 hypermethylated and 43 hypomethylated regions at FDR threshold of 0.05 in both paired and unpaired analyses.
Clustering of MBDCap-Seq data. The number of reads mapping to a particular region of genome depends not just on the average level of methylation in the region, but also on other factors, such as density of methylated CpG nucleotides. To compare MBDCap-Seq readout to other more quantitative technologies such as HM450K and Sequenom, we used fully methylated MBDCap-Seq sample to normalize MBDCap-Seq readouts for samples in the discovery cohort. More specifically, let X i be the number of tags overlapping region i and N be the total number of tags overlapping SssI regions in the sample to be normalized and Y i and M be the corresponding numbers in the control sample. Then, the normalized number of tags overlapping the region i is given by We used normalized tag counts for heatmap visualization in Fig. 1, for comparison with HM450K in Supplementary Fig. 2, and for comparison with Sequenom in Supplementary Fig. 4.
Functional annotations of the genome. CpG island annotation for hg18 was obtained from UCSC genome browser. The location of CpG island shores was derived from CpG islands by taking ± 2 Kb flanking regions and removing any overlaps with CpG islands. RefSeq transcript annotation for hg18 was obtained from UCSC genome browser. Promoters were defined as þ 2,000/ À 100 bp around transcription start site. Intergenic regions were defined as regions complementing transcript regions extended to ± 2 Kb around the transcripts. HMEC ChromHMM annotations for hg18 were downloaded from ENCODE. The original annotation partitions the HMEC genome into 15 functional states (see Fig. 1b in ref. 24). In Fig. 1c and Supplementary Fig. 1B, for brevity, we collapsed the three original promoter states into one promoter state and the four original enhancer states into one enhancer state.
Enrichment analysis statistical methods. For the enrichment analysis of hypermethylated regions, we used hypergeometric test to assess the enrichment of various functional annotations of the genome in the set of differentially methylated regions. For a given functional annotation represented by a set of genomic regions, fraction of SssI regions (regions assayable by MBDCap-Seq) overlapping functional annotation was compared with the fraction of hyper-/hypomethylated regions overlapping functional annotation using hypergeometric distribution. For the enrichment analysis of genes affected by promoter hypermethylation, first, we used DAVID functional annotation tool 25 to carry out analysis against gene sets defined by SP_PIR_KEYWORDS annotation. Second, we used hypergeometric test to assess the enrichment of additional gene sets in the set of genes affected by promoter hypermethylation 26 . In both the analyses, the set of 15,643 RefSeq genes with promoters overlapping SssI regions was used as a background.
Sequenom quantitative massARRAY methylation analysis. Sequenom MassARRAY methylation analysis was performed according to Coolen et al. 58 Briefly, 500 ng of FFPET clinical sample and cell line DNA (Supplementary Table 2) was extracted and bisulphite treated using the standard bisulphite protocol 59 . As controls for the methylation analysis, whole-genome amplified DNA (0% methylated) and M.SssI-treated DNA (100% methylated) were bisulphite treated in parallel. The primers were designed using the EpiDesignerBETA software from Sequenom (see Supplementary Table 6 for sequences). Each reverse primer has a T7-promoter tag (5 0 -CAGTAATACGACTCACTATAGGGAGAAG GCT-3 0 ) and each forward primer has a 10-mer tag (5 0 -AGGAAGAGAG-3 0 ). On testing these primers on bisulphite-treated DNA, all the primers gave specific PCR products at a Tm of 60°C. To check for potential PCR bias towards methylated or non-methylated sequences, we used serological DNA (Millipore) as a 100% methylated control and whole-genome amplified human blood DNA as a 0% methylated control. The PCRs were optimized and performed in triplicate using the conditions: 95°C for 2 min, 45 cycles of 95°C for 40 s, 60°C for 1 min and 72°C for 1 min 30 s and final extension at 72°C for 5 min. After PCR amplification, the triplicates were pooled and a shrimp alkaline phosphatase treatment was performed using 5 ml of the PCR product as template. Then, 2 ml of the shrimp alkaline phosphatase-treated PCR product was taken and subjected to in vitro transcription and RNaseA Cleavage for the T-cleavage reaction. The samples were purified by resin treatment and spotted on a 384-well SpectroCHIP by a MassARRAY Nanodispenser. This was followed by spectral acquisition on a MassARRAY Analyser Compact matrix-assisted laser desorption/ionization timeof-flight mass spectrometry. The results were then analysed by the EpiTYPER software V 1.0, which gives quantitative methylation levels for individual CpG sites. The average methylation ratio was calculated by averaging the ratios obtained from each CpG site. For the Sequenom validation, sample sizes were determined for a two sample t-test with a two-sided alpha of 0.01, assuming five regions were to be investigated. Assuming the difference in average methylation levels is 0. 25 Table 8).
Analysis of HM450K methylation data. The raw HM450K data were preprocessed and background normalized with Bioconductor minfi package using preprocessIllumina(..., bg.correct ¼ TRUE, normalize ¼ 'controls', reference ¼ 1) command; resulting M-values were used for statistical analyses 60 and beta-values for heatmap visualizations and clustering. To identify TNBC-specific HM450K NATURE COMMUNICATIONS | DOI: 10.1038/ncomms6899 ARTICLE probes, we carried out t-test comparison between TNBC (n ¼ 73) and non-TNBC (n ¼ 386) tumours. This analysis resulted in 282 probes having adj. P value o0.05 and estimated mean difference of methylation between TNBC and non-TNBC tumours of at least 10%; these probes were declared as TNBC specific. Regions overlapping three or more TNBC-specific probes were declared as TNBC specific. For TNBC-specific signature, we trained a Partial Least Squares model as implemented in the caret R package 61,62 to classify tumours into TNBC and non-TNBC based on the methylation values of 282 TNBC-specific probes. The tumour samples in the TCGA HM450K cohort were randomly partitioned into equal-size training/testing sets. The model parameters were derived from training set and then applied to make predictions on the testing set. The performance of the model was assessed using test set predictions.
Analysis of expression data. Differential expression analysis between normal (n ¼ 8) and tumour (n ¼ 89) TNBC samples was carried out with Bioconductor limma package. Since only subset of tumour samples had paired adjacent normal samples, patient data were treated as random effect using limma's duplicate-Correlation(y) function. This analysis resulted in 3,017 downregulated and 3,407 upregulated genes with adj. P value o0.05 out of 17,655 genes on the array. In Fig. 1f,g, we only considered genes with SssI regions in their promoter regions reducing the number of downregulated, upregulated and total genes to 2,119, 2,722 and 15,543, respectively. We used log-transformed RNA-Seq expression values to highlight the relationship between methylation and expression for number of candidate regions in Fig. 2c and Supplementary Fig. 11.
Survival analysis. TNBC tumour samples in TCGA HM450K cohort (n ¼ 73) were clustered on the basis of methylation beta-values of 4,987 HM450K probes overlapping the 822 hypermethylated regions. We applied consensus clustering algorithm 63 as implemented in Bioconductor ConsensusClusterPlus package to the 4,987 Â 73 methylation matrix with parameters maxK ¼ 4, reps ¼ 1000, pItem ¼ 0.8, pFeature ¼ 0.8, clusterAlg ¼ 'km', distance ¼ 'euclidean'. We used SVD decomposition to reduce the dimension of the methylation matrix to R 10 before clustering. We chose the three-cluster configuration for downstream survival analysis. Survival analysis was carried out using Cox proportional hazards model as implemented in R survival package against overall survival data (Supplementary Table 7). Survival analysis of cluster data was carried out with cluster membership as an explanatory variable. The BRCA TNBC cohort consists of 73 patients with HM450K methylation data and 12 events. Survival analysis of individual probes (4,987 probes overlapping 822 hypermethylated DMRs) was carried out with probe methylation status as explanatory variable (univariate analysis) and age, stage and probe methylation status (multivariate analysis). Methylation status was represented by a binary variable, high (higher than the median beta-value for the probe) and low (smaller or equal to the median beta-value for the probe). Stage was derived from AJCC stage in the clinical annotation of samples. Due to moderate size of the cohort, we reduced the number of values of the stage variable to two by collapsing stages I, IA, IB, II, IIA and IIB into one state and stages III, IIIA, IIIB, IIIC and IV into one state. This resulted in 190 probes with methylation status statistically and significantly (P value o0.05 in both univariate and multivariate analyses) associated with overall survival in TCGA TNBC patients. Regional aggregation of survival probes identified 17 hypermethylated DMRs overlapping three or more survival probes. Fourteen regions were associated with poor prognosis, these regions overlapped probes for which high methylation corresponded to lower probability of survival, and three regions were associated with good prognosis.