Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.
Altered DNA methylation patterns have emerged as a common feature of disease pathogenesis, showing clear potential in diagnostics, sub-classification and prediction of therapeutic response/disease course1,2,3,4,5,6,7. In contrast to current high-throughput, genome-wide research methodologies (e.g. whole-genome bisulfite sequencing8, DNA methylation arrays9), particular challenges exist in the clinical application of disease-associated methylation patterns. These include derivation and validation of representative DNA methylation signatures from genome-scale datasets, and their assessment using platform-independent assays that can be applied rapidly to single samples, including low quality and/or quantity biopsies, in routine diagnostics. To address this, we have developed and validated MIMIC (minimal methylation classifier), a novel polymerase chain reaction (PCR)-based assay for the multiplexed assessment of bisulfite-induced methylation-dependent DNA sequence changes10 at multiple signature CpG loci. Sequence-specific single-base variants are exposed by primer extension and, here, coupled to detection by MALDI-TOF mass spectrometry commonly used in clinical DNA diagnostics11, enabling the investigation of samples that were not suitable for analysis using existing methods.
We focussed assay development on medulloblastoma, the most common malignant brain tumour of childhood12, where DNA methylation signatures have clear potential for use in routine clinical sub-classification13,14. Medulloblastoma comprises four primary molecular subgroups – WNT, SHH, Grp3 and Grp4 – defined by distinct methylomic, transcriptomic and genomic features13,14,15. These subgroups display characteristic clinical features, drug targets and outcomes, and have significantly contributed to the 2016 World Health Organisation (WHO) classification of brain tumours16. Following design and validation of a MIMIC assay for molecular subgrouping, we assessed its efficacy in limited archival tumour biopsies previously refractory to subgrouping using current research methods, taken from the pan-European HIT-SIOP-PNET4 medulloblastoma clinical trial (2001–2006)17,18. This trial enrolled patients negative for all established clinico-molecular risk-factors (termed ‘standard-risk (SR)’ disease12), a group for which there is an urgent unmet need to develop biomarker-driven treatment strategies.
Derivation of minimal DNA methylation signatures to identify the four medulloblastoma molecular subgroups
We first identified a DNA methylation signature of 17 CpG loci, established detection methods and developed a Support Vector Machine (SVM) classification model for distinction of the four medulloblastoma molecular subgroups (WNT, SHH, Grp3 and Grp4; Fig. 1a). Non-negative matrix factorisation (NMF) consensus clustering13,19,20 was used to identify subgroup membership of a training cohort comprising genome-scale Illumina 450k DNA methylation microarray data for 220 medulloblastomas (Fig. 1b; Fig. 2). The 50 most discriminatory CpG loci for each subgroup (i.e. 200 in total) were considered as signature candidates. These were triaged using (i) a 10-fold cross validated classification fusion algorithm, (ii) a reiterative primer design process where amenability to primer design and multiplex bisulfite PCR was assessed in silico (Supplemental experimental methods), and (iii) in vitro PCR validation (Fig. 1a; Fig. 3). Candidate signature CpG loci were assayed by the development of a novel application of the Agena iPlex assay21, whereby methylation-dependent SNPs representative of CpG methylation status were induced by initial treatment of DNA with sodium bisulfite10, followed by multiplexed PCR and single base extension of probe oligonucleotides. The resultant products were quantified by MALDI-TOF MS (Matrix-assisted laser desorption/ionization-time of flight mass spectrometry; Supplementary Fig. 1). The accuracy and precision of methylation estimates from multiplexed extension reactions were tested using incremental proportions of bisulfite-treated methylated:unmethylated DNA (Supplementary Fig. 2). Using these techniques, our optimal, multiply-redundant 17-CpG locus signature was generated. Finally, the training cohort was used to generate an optimised SVM classifier for the signature using 450k DNA methylation array data.
Development and validation of MS-MIMIC
We next assessed mass spectrometry-minimal methylation classifier (MS-MIMIC) performance against Illumina 450k methylation microarrays in an independent validation cohort of 106 medulloblastoma DNA samples (Fig. 1c) containing all four medulloblastoma subgroups (Supplementary Table 1). DNAs were derived from tissue samples reflecting different clinical fixation methods: fresh-frozen biopsies (n = 40), formalin-fixed paraffin-embedded biopsies (FFPE, tumour section; n = 39), or FFPE-derived cytospin nuclear preparations18 (n = 27). QC measures for CpG locus-specific assay failure were established; up to 6 failed CpGs per sample were tolerated within the multiply-redundant signature/classifier without impacting performance, and 101/106 validation cohort samples met this criterion (Supplementary Figs 3 and 4). A probability threshold for confident subgroup classification was derived empirically, below which samples were deemed non-classifiable (6/101 samples; 5.9%) (Supplementary Fig. 5). Respective subgroup classifications were compared; MS-MIMIC classifications showed complete concordance with the reference subgroup (as determined by DNA methylation array and/or CTNNB1 mutation status for WNT tumours15; Fig. 1d). Furthermore, CpG-level methylation estimates (β-values) were equivalent between methods (R² = 0.79, p < 0.00001; Fig. 1e). As expected, fresh-frozen derivatives performed best (n = 39/40; 98% successfully subgrouped), with 92% success (n = 56/61) using FFPE-derived DNA (Fig. 1f–h).
Application of MS-MIMIC to the HIT-SIOP-PNET4 clinical trials cohort
Following successful assay development and validation, we next wished to test the application of MS-MIMIC methylation signature detection in limited, poor quality, archival, clinical biopsies. Analysis of remnant material from the HIT-SIOP-PNET4 cohort offers the first opportunity to determine the potential utility of molecular subgroup status to predict disease outcome in a clinical trial of risk-factor negative (SR) medulloblastoma (Fig. 4a, Supplementary Table 2). Only FFPE sections (n = 42/153 available tumour samples) and cytospin nuclear preparations (approximately 30,000 nuclei isolated and centrifuged onto microscope slides18; n = 111/153) remained, whose DNA derivatives all fell below quality and quantity thresholds (>200 ng double-stranded DNA (dsDNA)) required for methylation profiling using conventional research methods (Illumina 450k and MethylationEPIC arrays14). Using MS-MIMIC, 70% (107/153) of samples were successfully subgrouped, and subgroup assignments and β-value estimations were consistent across duplicate determinations (Fig. 4b,c). Assay performance was equivalent across the input DNA range (<2 ng (limit of detection) to 100 ng dsDNA, 41.4 ng median DNA input, p = 0.852, chi-squared test) (Fig. 4d,e,f). Reasons for assay failure included unsuccessful bisulfite conversion/PCR (6%; 9/153), and inability to classify due to assay QC failure (24%; 37/153) (Fig. 4e). These findings from HIT-SIOP-PNET4 reveal important subgroup-dependent molecular pathology in SR medulloblastoma. Grp4 was most common (62/107; 58%), with approximately equivalent numbers of WNT (18/107; 17%), SHH (17/107; 16%) and Grp3 (10/107; 9%) tumours observed. As expected, all tumours with CTNNB1 mutations (n = 14) were classified as WNT (Fig. 4g).
The majority of events (defined as disease recurrence or progression following treatment) observed (11/13) affected Grp4 patients (82% 5-year progression-free survival (PFS)), with > 95% PFS in non-Grp4 patients (p = 0.038, log-rank test; Fig. 4h). Subgroup assignment will thus be essential to inform future clinical and research studies in SR medulloblastoma.
We have provided a blueprint for defining minimal, multiply-redundant disease-associated DNA methylation signatures from genome-wide datasets, and have developed MS-MIMIC as a validated assay for their assessment, including open-source classification tools for data interpretation (http://medulloblastomadiagnostics.ncl.ac.uk; Supplementary Fig. 6). Unlike research methodologies (e.g. Illumina 450k and MethylationEPIC arrays) which require batched assessments (≥8 samples per run), MS-MIMIC exploits detection technologies in common clinical use (MALDI-TOF) to enable rapid (<3 days from DNA extraction to result), low-cost (<$200 per sample in 2017), routine assessment in single or multiple samples. The assay format allows a flexible, modular approach, in which multiplex PCRs can be straightforwardly added or removed, offering the ability to adapt or extend panels to evolving clinical needs. Moreover, its low DNA input requirements and applicability to archival sample collections have the potential to unlock previously inaccessible molecular information from informative cohorts, as demonstrated for HIT-SIOP-PNET4. This assessment of DNA methylation signatures in the clinical setting holds rich promise for molecular sub-classification and prognostication across diverse diseases.
Cohorts and sample collection
Three cohorts (training, n = 220; validation, n = 106; test, n = 153) were used in this study and are described in Supplementary Table 1. The training and validation cohorts comprised archival non-trial medulloblastoma DNA samples and included tumour samples provided by the UK Children’s Cancer and Leukaemia Group (CCLG) biobank as part of CCLG-approved biological study BS-2007–04. The validation cohort consisted of samples of varying DNA quality, chosen to assess assay performance. The test cohort included samples from the HIT-SIOP-PNET4 clinical trial (2001–2006)17. All tumours assayed had a confirmed histopathological diagnosis of medulloblastoma, with a high tumour cell content. Informed consent was obtained from all participants and/or their legal guardians. All experiments were performed in accordance with relevant guidelines and regulations.
Identification of a minimal methylation signature for discrimination of medulloblastoma molecular subgroups
Non-negative matrix factorisation (NMF) consensus clustering19 was used to identify the four recognised medulloblastoma consensus molecular subgroups13 using 220 training cohort samples run on the Illumina 450k methylation microarray platform (Fig. 2). The 50 most differentially-methylated CpG loci for each subgroup were selected as potential signature candidates using limma22. An iterative CpG locus selection algorithm was then used to choose signature loci from these candidates. To optimise signature loci redundancy in each level, up to 6 loci were repeatedly removed at random and classification performance evaluated. The 17 signature loci with the highest ranking in classification were identified (Fig. 3).
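The random-removal step can be illustrated with a minimal sketch. It assumes a simple nearest-centroid rule as a stand-in classifier; this is not the classification fusion algorithm or the SVM used in the study, and the function names, repeat count and toy profiles are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (beta-value per locus, known subgroup)


def nearest_centroid(centroids: Dict[str, Sequence[float]],
                     betas: Sequence[float], kept: List[int]) -> str:
    """Assign the subgroup whose mean profile is closest on the kept loci."""
    def squared_distance(label: str) -> float:
        return sum((betas[i] - centroids[label][i]) ** 2 for i in kept)
    return min(centroids, key=squared_distance)


def accuracy_with_dropout(centroids: Dict[str, Sequence[float]],
                          samples: List[Sample],
                          max_removed: int = 6,
                          n_repeats: int = 200,
                          seed: int = 0) -> float:
    """Mean accuracy when 0..max_removed loci are dropped at random each repeat."""
    n_loci = len(next(iter(centroids.values())))
    if max_removed >= n_loci:
        raise ValueError("at least one locus must remain after removal")
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_repeats):
        # Simulate locus-level assay failure by hiding a random subset of loci.
        removed = set(rng.sample(range(n_loci), rng.randint(0, max_removed)))
        kept = [i for i in range(n_loci) if i not in removed]
        correct += sum(nearest_centroid(centroids, betas, kept) == label
                       for betas, label in samples)
    return correct / (n_repeats * len(samples))


# Toy profiles (illustrative only): 8 loci, two well-separated subgroups.
toy_centroids = {"A": [0.1] * 8, "B": [0.9] * 8}
toy_samples = [([0.2] * 8, "A"), ([0.8] * 8, "B")]
toy_accuracy = accuracy_with_dropout(toy_centroids, toy_samples)
```

A signature with enough redundancy should keep its accuracy close to the no-dropout value across repeats; a sharp fall would suggest that too few loci carry the subgroup signal.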
The MS-MIMIC assay is based on an adaptation of Agena Biosciences’ iPLEX assay and the MassARRAY platform21. In order to determine methylation status at each signature CpG locus, bisulfite treatment of DNA was used to induce methylation-dependent SNPs. These regions were amplified by multiplex PCR, followed by single base extension using mass-modified dideoxynucleotide terminators. MALDI-TOF mass spectrometry then identifies the proportions of the induced-SNP alleles, from which methylation status can be inferred.
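As an illustration of the final inference step, the sketch below estimates a β-value at one CpG locus from the signal of the two extension products. It assumes a forward-strand assay, in which the C allele reports methylated template and the T allele reports unmethylated (bisulfite-converted) template; a reverse-strand design would read G and A instead. The function name, the peak-height inputs and the minimum-signal cut-off are hypothetical and are not taken from the MS-MIMIC software.

```python
from typing import Optional


def beta_from_peaks(methylated_peak: float, unmethylated_peak: float,
                    min_total_signal: float = 1.0) -> Optional[float]:
    """Estimate methylation (beta) as the methylated share of total allele signal.

    Returns None when the combined signal is too weak to call, mimicking a
    locus-level assay failure. The cut-off value is an illustrative assumption.
    """
    if methylated_peak < 0 or unmethylated_peak < 0:
        raise ValueError("peak intensities must be non-negative")
    total = methylated_peak + unmethylated_peak
    if total < min_total_signal:
        return None
    return methylated_peak / total


# Hypothetical peak heights for one locus: three quarters of the signal
# comes from the methylated (C) extension product.
example_beta = beta_from_peaks(30.0, 10.0)
```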
Primer design and validation
PCR and extension primers were designed for multiplex assessment of methylation in 17 signature loci (Supplemental experimental methods) across three multiplexes. Plex 1 contained an additional bisulfite conversion control, targeting an invariably unmethylated locus which undergoes complete conversion to uracil. The multiplexes were validated in vitro using a triplicate mixture series of control DNAs ranging from 0–100% methylation (Supplementary Fig. 2). The correlation between the input and estimated DNA methylation was calculated, and amplicons with a poor correlation were discarded and replaced with a new CpG locus as part of the iterative redesign process (Figs 1a and 3). All signature loci had good linear correlation (average correlation coefficient R² = 0.86).
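The calibration check described above can be sketched as a squared Pearson correlation between the known methylated fraction of each control mixture and the β-value the assay reports. The dilution values, the reported β-values and the 0.8 acceptance cut-off below are illustrative assumptions; the text does not state the cut-off that was applied.

```python
from math import sqrt


def r_squared(expected, observed):
    """Squared Pearson correlation between input and estimated methylation."""
    n = len(expected)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy / sqrt(sxx * syy)) ** 2


# Hypothetical dilution series: fraction of methylated control DNA in the mix,
# and the beta-values one amplicon might report (illustrative numbers only).
input_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
estimated_beta = [0.05, 0.30, 0.48, 0.70, 0.93]
calibration_r2 = r_squared(input_fraction, estimated_beta)
keep_amplicon = calibration_r2 >= 0.8  # hypothetical acceptance cut-off
```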
Where possible, 100 ng of DNA was bisulfite converted and purified using the Qiagen EpiTect Bisulfite kit, according to the manufacturer’s protocol. To ensure that the template was not too fragmented for analysis, a test bisulfite PCR, targeting a 200 bp amplicon, was performed (Supplementary Table 3). Reaction mixtures and thermal cycling parameters used to amplify the 17 signature loci in multiplex are shown in Supplementary Table 4. Successful amplification was confirmed by gel electrophoresis. The multiplex primer iPLEX extension assay was performed as previously described23. Primer sequences for multiplex signature loci PCR and iPLEX extension PCR are shown in Supplementary Tables 5 and 6. Mass spectra for the multiplexes were acquired on a MALDI-TOF mass spectrometer (Voyager DE; PerSeptive Biosystems).
Classifier design and validation
Two support vector machine (SVM) classifier models were created to assign subgroup and corresponding probability24, one trained on 450k array data using the 10,000 most variably methylated CpG loci, the second with the 17-CpG locus signature from the training cohort (Fig. 1c). Subsequently, 101/106 validation cohort samples were used to assess MS-MIMIC concordance with 450k-derived data, at the level of molecular subgroup call (Fig. 1d) and estimates of methylation β-value (Fig. 1e). It was anticipated that when applying MS-MIMIC to poor quality samples, certain loci would not be assessable. Using bootstrapped datasets (n = 10,000), a threshold of 6 was established for a maximum acceptable number of missing loci. Missing loci were imputed using expectation maximisation. Subgroup assignments using MS-MIMIC classifier were compared against corresponding subgroup calls from the 450k classifier. A threshold for probability of assignment by the MS-MIMIC classifier was empirically set to 0.69, below which samples were non-classifiable (Supplementary Fig. 5).
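The two QC gates described above (at most 6 missing loci, and a minimum assignment probability of 0.69) can be combined into a single decision rule. The sketch below assumes that per-subgroup probabilities have already been produced by a classifier; the function name, locus identifiers and return labels are hypothetical, and imputation of missing loci is not shown.

```python
from typing import Dict, Optional

MAX_MISSING_LOCI = 6         # from the text: up to 6 failed CpG loci tolerated
MIN_CALL_PROBABILITY = 0.69  # from the text: empirical confidence threshold


def call_subgroup(betas: Dict[str, Optional[float]],
                  subgroup_probabilities: Dict[str, float]) -> str:
    """Apply the locus-failure gate, then the probability gate, then report a call.

    `betas` maps each signature locus to its beta-value, or None if the locus
    failed. `subgroup_probabilities` maps each subgroup to its probability.
    """
    missing = sum(1 for value in betas.values() if value is None)
    if missing > MAX_MISSING_LOCI:
        return "QC fail"
    best = max(subgroup_probabilities, key=subgroup_probabilities.get)
    if subgroup_probabilities[best] < MIN_CALL_PROBABILITY:
        return "non-classifiable"
    return best


# Hypothetical sample: 17 loci with 6 failures, confidently assigned to Grp4.
example_betas = {f"locus_{i}": (None if i < 6 else 0.5) for i in range(17)}
example_probs = {"WNT": 0.05, "SHH": 0.05, "Grp3": 0.10, "Grp4": 0.80}
example_call = call_subgroup(example_betas, example_probs)
```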
Application to HIT-SIOP-PNET4 clinical trial cohort
Following successful assay development and validation, we applied MS-MIMIC to remnant, poor quality, archival biopsies from the HIT-SIOP-PNET4 clinical trial of risk-factor negative medulloblastoma (Fig. 4a, Supplementary Table 2). Only FFPE sections (n = 42/153 available tumour samples) and cytospin nuclear preparations (approximately 30,000 nuclei isolated and centrifuged onto microscope slides18; n = 111/153) remained, whose DNA derivatives all fell below quality and quantity thresholds (>200 ng double-stranded DNA (dsDNA)) required for methylation profiling using conventional research methods (Illumina 450k and MethylationEPIC arrays14). We assessed differential survival of the MS-MIMIC subgroup assignments using log-rank tests.
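For readers unfamiliar with the log-rank test, a minimal two-group version is sketched below, for example Grp4 versus non-Grp4. It is a generic textbook formulation using only the Python standard library, not the software used in the study, and the follow-up times are toy values rather than trial data.

```python
from math import erfc, sqrt


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi-square, p-value) for a two-group log-rank test.

    `times_*` are follow-up times; `events_*` are 1 for an event
    (recurrence or progression) and 0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for u, _, _ in data if u >= t)
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        events = sum(1 for u, e, _ in data if u == t and e)
        events_in_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        share_a = at_risk_a / at_risk
        # Observed events in group A minus the number expected under the null.
        observed_minus_expected += events_in_a - events * share_a
        if at_risk > 1:
            # Hypergeometric variance of the group A event count at this time.
            variance += (events * share_a * (1 - share_a)
                         * (at_risk - events) / (at_risk - 1))
    if variance == 0:
        raise ValueError("no informative event times; the test is undefined")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi_square, erfc(sqrt(chi_square / 2))


# Toy data (not trial data): early events in group A, none in group B.
toy_chi_square, toy_p = logrank_two_groups(
    [1, 2, 3, 4], [1, 1, 1, 1],
    [10, 11, 12, 13], [0, 0, 0, 0],
)
```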
Further technical details are provided in the Supplemental experimental methods.
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
This work was supported by Cancer Research UK (Programme grants C8464/A13457 and C8464/A23391; S.C.C., D.W. and S.B.) and the UK Children’s Cancer and Leukaemia Group (CCLG)/Tom Grahame Trust (Project grant to S.C.C, D.W. and S.B.). The HIT-SIOP-PNET4 clinical trial was supported by Cancer Research UK, Swedish Childhood Cancer Foundation, French Ministry of Health/French National Cancer Institute and German Children’s Cancer Foundation; informed consent was obtained from all subjects. Tumour investigations were conducted with approval from Newcastle and North Tyneside Research Ethics Committee (Study reference 07/Q0905/71).