The use of liquid biopsies for cancer detection and management is rapidly gaining prominence[1]. Current methods for the detection of circulating tumour DNA involve sequencing somatic mutations using cell-free DNA, but the sensitivity of these methods may be low among patients with early-stage cancer given the limited number of recurrent mutations[2–5]. By contrast, large-scale epigenetic alterations, which are tissue- and cancer-type specific, are not similarly constrained[6] and therefore potentially have greater ability to detect and classify cancers in patients with early-stage disease. Here we develop a sensitive, immunoprecipitation-based protocol to analyse the methylome of small quantities of circulating cell-free DNA, and demonstrate the ability to detect large-scale DNA methylation changes that are enriched for tumour-specific patterns. We also demonstrate robust performance in cancer detection and classification across an extensive collection of plasma samples from several tumour types. This work sets the stage to establish biomarkers for the minimally invasive detection, interception and classification of early-stage cancers based on plasma cell-free DNA methylation patterns.
R markdowns (either knit or raw) and scripts used to generate the findings in this study have been deposited on Zenodo (DOIs in Supplementary Table 13). All the cell line datasets generated and/or analysed during the current study are available in the Gene Expression Omnibus repository under accession code GSE79838. The cfMeDIP–seq next-generation sequencing data for patient samples that support the findings of this study are available upon request from the corresponding author to comply with institutional ethics regulation. Source data for Fig. 1b and Extended Data Fig. 3e are provided in Supplementary Table 9, and for Fig. 1c in Supplementary Table 10. Additional source data can be found on Zenodo (Supplementary Table 13).
Diaz, L. A., Jr & Bardelli, A. Liquid biopsies: genotyping circulating tumor DNA. J. Clin. Oncol. 32, 579–586 (2014).
Aravanis, A. M., Lee, M. & Klausner, R. D. Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168, 571–574 (2017).
Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).
Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).
Phallen, J. et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci. Transl. Med. 9, eaan2415 (2017).
Hoadley, K. A. et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158, 929–944 (2014).
Lehmann-Werman, R. et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc. Natl Acad. Sci. USA 113, E1826–E1834 (2016).
Visvanathan, K. et al. Monitoring of serum DNA methylation as an early independent marker of response and survival in metastatic breast cancer: TBCRC 005 prospective biomarker study. J. Clin. Oncol. 35, 751–758 (2017).
Potter, N. T. et al. Validation of a real-time PCR-based qualitative assay for the detection of methylated SEPT9 DNA in human plasma. Clin. Chem. 60, 1183–1191 (2014).
Chan, K. C. et al. Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing. Proc. Natl Acad. Sci. USA 110, 18761–18768 (2013).
Sun, K. et al. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc. Natl Acad. Sci. USA 112, E5503–E5512 (2015).
Grunau, C., Clark, S. J. & Rosenthal, A. Bisulfite genomic sequencing: systematic investigation of critical experimental parameters. Nucleic Acids Res. 29, E65 (2001).
Taiwo, O. et al. Methylome analysis using MeDIP-seq with low DNA concentrations. Nat. Protoc. 7, 617–636 (2012).
Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).
Sharma, S., Kelly, T. K. & Jones, P. A. Epigenetics in cancer. Carcinogenesis 31, 27–36 (2010).
Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).
Michiels, S., Koscielny, S. & Hill, C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492 (2005).
Pedersen, K. S. et al. Leukocyte DNA methylation signature differentiates pancreatic cancer patients from healthy controls. PLoS ONE 6, e18223 (2011).
Teschendorff, A. E. et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS ONE 4, e8274 (2009).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Lienhard, M., Grimm, C., Morkel, M., Herwig, R. & Chavez, L. MEDIPS: genome-wide differential coverage analysis of sequencing data derived from DNA enrichment experiments. Bioinformatics 30, 284–286 (2014).
Kis, O. et al. Circulating tumour DNA sequence analysis as an alternative to multiple myeloma bone marrow aspirates. Nat. Commun. 8, 15086 (2017).
Kennedy, S. R. et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat. Protoc. 9, 2586–2606 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Gu, H. et al. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat. Protoc. 6, 468–481 (2011).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28 (2008).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).
This study was conducted with support from the University of Toronto McLaughlin Centre (MC-2015-02), the Canadian Institutes of Health Research (CIHR FDN 148430 and CIHR New Investigator Salary award 201512MSH-360794-228629), the Ontario Institute for Cancer Research (OICR) with funds from the province of Ontario, a Canada Research Chair (950-231346) and the Princess Margaret Cancer Foundation to D.D.D.C.; from the Canadian Cancer Society (CCSRI 701717) to R.J.H.; CCSRI 704716 to R.J.H. and D.D.D.C.; and CCSRI 703827 to M.M.H. Recruitment of healthy individuals was supported by the Cancer Care Ontario Chair of Population Health and CCSRI 020214 awarded to R.J.H. Collection of lung cancer samples was supported by the Alan B. Brown chair in molecular genomics and the Lusi Wong Lung Cancer Early Detection Program to G.L. We acknowledge the Princess Margaret Genomics Centre for carrying out the next-generation sequencing and the Bioinformatics and HPC Core, Princess Margaret Cancer Centre, for their expertise in generating the next-generation sequencing data.
Nature thanks E. Collisson, A. Teschendorff and the other anonymous reviewer(s) for their contribution to the peer review of this work.
D.D.D.C., S.Y.S., A.C., S.V.B., R.S. and R.J.H. are listed as inventors/contributors on patents filed related to this work.
Extended data figures and tables
Extended Data Fig. 1 Simulation of the probability of detecting ctDNA as a function of the number of DMRs, sequencing depth and percentage of ctDNA in plasma cfDNA, and a proposed method to enrich ctDNA.
a, Bioinformatic simulation of scenarios with different proportions of ctDNA in the sample (0.001% to 10%; columns) and different numbers of tumour-specific DMRs (1, 10, 100, 1,000 or 10,000; rows) determined through the comparison of ctDNA to normal cfDNA, with reads sampled at varying sequencing depths per locus (10×, 100×, 1,000× and 10,000×; x axis). The probability of detecting at least five epimutations per DMR (left y axis) increases as the number of available features increases, even at shallow coverage per locus. Each panel plots the probability of detection against coverage per candidate DMR for one simulation scenario. b, Schematic representation of the cfMeDIP–seq protocol.
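The trend in panel a can be reproduced qualitatively with a simple binomial model. The sketch below is not the authors' simulation code; it assumes that each read covering a DMR derives from ctDNA independently with probability equal to the ctDNA fraction, that DMRs are independent of one another, and that a sample counts as detected when any DMR carries at least five ctDNA-derived reads. The function names `prob_dmr_hit` and `prob_detect` are hypothetical.

```python
from math import comb


def prob_dmr_hit(depth: int, ctdna_frac: float, k: int = 5) -> float:
    """P(X >= k) for X ~ Binomial(depth, ctdna_frac): the chance that at least
    k of the reads covering a single DMR come from ctDNA (assumed model)."""
    # Sum the lower tail (X < k) and take the complement; the upper tail would
    # need binomial coefficients far too large to convert to floats at 10,000x.
    lower = sum(
        comb(depth, i) * ctdna_frac**i * (1.0 - ctdna_frac) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, 1.0 - lower))


def prob_detect(n_dmrs: int, depth: int, ctdna_frac: float, k: int = 5) -> float:
    """P(at least one of n_dmrs independent DMRs yields >= k ctDNA reads)."""
    p_single = prob_dmr_hit(depth, ctdna_frac, k)
    return 1.0 - (1.0 - p_single) ** n_dmrs


if __name__ == "__main__":
    # Grid loosely mirroring the panel layout: ctDNA fraction x DMR count x depth.
    for frac in (1e-5, 1e-4, 1e-3, 1e-2, 1e-1):
        for n in (1, 10, 100, 1_000, 10_000):
            row = [prob_detect(n, d, frac) for d in (10, 100, 1_000, 10_000)]
            print(f"{frac:>7.3%} {n:>6} " + " ".join(f"{p:.3f}" for p in row))
```

Under these assumptions, adding DMRs raises the detection probability at a fixed per-locus depth, because each additional DMR is another independent chance to observe ctDNA-derived reads. Very small probabilities lose precision in the complement step, which is acceptable for a qualitative sketch.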
Extended Data Fig. 2 Sequencing saturation analysis and quality controls of MeDIP–seq and cfMeDIP–seq carried out on varying starting inputs of HCT116 DNA sheared to mimic cfDNA.
a, Results of the saturation analysis from the Bioconductor package MEDIPS for cfMeDIP–seq data from each replicate at each starting input amount, including an input control. b, The protocol was tested in two biological replicates of four starting DNA inputs (100, 10, 5 and 1 ng) of HCT116 DNA sheared to mimic cfDNA. The specificity of the reaction was calculated using methylated and unmethylated spiked-in A. thaliana DNA. The fold-enrichment ratio was calculated using genomic regions of the fragmented HCT116 DNA (human methylated HIST1H2BA and unmethylated GAPDH). The horizontal dotted line indicates a fold-enrichment ratio threshold of 25; dots represent biological replicates and lines represent the mean. c, CpG enrichment scores of the sequenced samples (two biological replicates each of four starting DNA inputs (100, 10, 5 and 1 ng) and one input control) show robust enrichment of CpGs within the genomic regions from the immunoprecipitated samples compared with the input control. The CpG enrichment score was obtained by dividing the relative frequency of CpGs in the regions by the relative frequency of CpGs in the human genome. The horizontal dotted line indicates a CpG enrichment score of 1; dots represent biological replicates and lines represent the mean. d, Genome-wide Pearson correlations of normalized read counts per 300-bp window between cfMeDIP–seq signals for 1 to 100 ng of input HCT116 DNA sheared to mimic cfDNA (two biological replicates per input amount). e, Genome Browser snapshot of HCT116 cfMeDIP–seq signal across a window (chr8:145,095,942–145,116,942), selected from four examined loci, at different starting DNA inputs (1 to 100 ng, in biological replicates), compared with RRBS (ENCODE: ENCSR000DFS) and WGBS (Gene Expression Omnibus: GSM1465024) data (aligned to hg19). For cfMeDIP–seq, the y axis indicates RPKMs; for RRBS, yellow and blue blocks represent hypermethylated and hypomethylated CpGs, respectively. In the WGBS track, peak heights indicate methylation level.
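The CpG enrichment score in panel c is a ratio of two CpG relative frequencies. A minimal sketch, assuming "relative frequency" means CpG dinucleotides per base (similar in spirit to the relH-based score MEDIPS reports, though that correspondence is an assumption here) and that sequences arrive as plain strings. In practice the regions would be the sequenced fragments and the genome a reference FASTA; the function names are hypothetical.

```python
def cpg_relative_frequency(seqs):
    """CpG dinucleotides per base across a collection of DNA sequences.
    CpGs spanning the boundary between two sequences are not counted."""
    total_len = 0
    cpg = 0
    for s in seqs:
        s = s.upper()
        total_len += len(s)
        # "CG" cannot overlap itself, so str.count gives an exact tally.
        cpg += s.count("CG")
    return cpg / total_len if total_len else 0.0


def cpg_enrichment_score(region_seqs, genome_seqs):
    """Relative CpG frequency in the regions divided by that of the genome.
    Values above 1 indicate CpG enrichment, as expected after MeDIP."""
    return cpg_relative_frequency(region_seqs) / cpg_relative_frequency(genome_seqs)


# Toy example: CpG-dense "captured" fragments against a CpG-poor background.
print(cpg_enrichment_score(["CGCG"], ["ACGT", "AAAA"]))
```

An input control drawn without immunoprecipitation should score close to 1 under this definition, which is why the dotted line in the panel sits at 1.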
Extended Data Fig. 3 Sequencing saturation analysis and quality controls of cfMeDIP–seq from serial dilution.
a, Schematic representation of the CRC DNA (HCT116) dilution series into multiple myeloma DNA (MM.1S). For both CRC and multiple myeloma DNA, the genomic DNA was sheared to mimic cfDNA fragmentation. The entire dilution series was used to carry out cfMeDIP–seq (n = 1) and ultra-deep sequencing for mutation detection (n = 1). b, The specificity of the reaction for each dilution in the series (n = 1) was calculated using methylated and unmethylated spiked-in A. thaliana DNA. c, CpG enrichment representing the ratio of relative frequency of CpGs in regions to relative frequency of CpGs in the human genome for each dilution in the series (n = 1), determined by cfMeDIP–seq. The horizontal dashed line represents a CpG enrichment of 1. d, Saturation analysis of cfMeDIP–seq sequenced reads from each dilution point in the series (n = 1). e, Across a serial dilution series (n = 7 dilution points, two technical replicates, each replicate was used per protocol) of HCT116 DNA spiked into MM.1S multiple myeloma DNA, near-perfect correlations are observed between observed and expected numbers of DMRs. f, g, Ultra-deep sequencing for mutation detection of three CRC-specific point mutations within BRAF (p.P301P), KRAS (p.G13D) and PIK3CA (p.H1047R) in the same dilution series (of CRC into multiple myeloma DNA) (n = 1). UMIs were incorporated into the sequencing adapters and used to create SSCSs (f) and DCSs (g) for the detection of allele frequency for each mutation at each locus. For each mutation, the reference allele is found at the top. The dashed red line indicates the limit of detection.
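Panels f and g rely on UMI-based error suppression in the style of duplex sequencing (Kennedy et al.; Schmitt et al.). A simplified sketch of the idea: reads sharing a UMI pair form a family, a column-wise majority vote over the family gives the single-strand consensus sequence (SSCS), and the SSCSs of the two complementary strands, whose tags appear in swapped order, are compared position by position to give the duplex consensus sequence (DCS). It assumes reads are already aligned, trimmed to equal length and reported in reference orientation; the 0.7 agreement threshold, the minimum family size of 3 and all function names are illustrative assumptions, not parameters from this study.

```python
from collections import Counter, defaultdict


def single_strand_consensus(reads, min_frac=0.7):
    """Column-wise majority vote over equal-length reads from one UMI family.
    Positions where no base reaches min_frac become 'N'."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_frac else "N")
    return "".join(consensus)


def duplex_consensus(sscs_a, sscs_b):
    """Keep a base only where both strands' SSCSs agree; otherwise 'N'."""
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(sscs_a, sscs_b))


def build_duplexes(tagged_reads, min_family=3):
    """tagged_reads: iterable of (umi_alpha, umi_beta, sequence) tuples.
    Returns (SSCS per family, DCS per strand pair)."""
    families = defaultdict(list)
    for alpha, beta, seq in tagged_reads:
        families[(alpha, beta)].append(seq)
    # Families that are too small give unreliable majority votes; drop them.
    sscs = {
        tag: single_strand_consensus(seqs)
        for tag, seqs in families.items()
        if len(seqs) >= min_family
    }
    duplexes = {}
    for (alpha, beta), seq in sscs.items():
        partner = sscs.get((beta, alpha))  # complementary strand: tags swapped
        if partner is not None and alpha < beta:  # emit each pair once
            duplexes[(alpha, beta)] = duplex_consensus(seq, partner)
    return sscs, duplexes
```

Requiring agreement between both strands is what lets a DCS suppress errors that arise on only one strand, which is why DCS-based allele frequencies can reach a lower limit of detection than SSCS-based ones.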
Extended Data Fig. 4 Quality control of cfMeDIP–seq from circulating cfDNA from patients with PDAC (cases) and healthy donors (controls).
a, b, Specificity of the reaction, calculated using methylated and unmethylated spiked-in A. thaliana DNA, for each case sample (a) and each control sample (b). The fold-enrichment ratio was not calculated owing to the very limited amount of DNA available after final libraries were generated. c, d, CpG enrichment of the sequenced cases (c) and controls (d). The horizontal dashed line represents a CpG enrichment of 1. e, Principal component (PC) analysis of cfDNA methylation from 24 plasma cfDNA samples from healthy donors and 24 plasma cfDNA samples from patients with PDAC, using the 1 million genome-wide 300-bp windows with the highest median absolute deviation. Left, PC2 against PC1; right, PC3 against PC1. f, Percentage of variance explained by each principal component.
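Panel e ranks 300-bp windows by median absolute deviation (MAD) before PCA. The selection step could look like the following sketch, assuming a dict of per-window normalized signals across samples; the names and input layout are hypothetical, and the resulting window-by-sample matrix would then be passed to a standard PCA routine such as R's `prcomp` or scikit-learn's `PCA`.

```python
from heapq import nlargest
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def top_variable_windows(signal_by_window, n_top):
    """Return the ids of the n_top windows whose signal varies most across
    samples, ranked by MAD (largest first).

    signal_by_window: dict mapping a window id (e.g. "chr1:0-300") to a list
    of normalized read counts, one per sample (hypothetical layout).
    """
    return nlargest(n_top, signal_by_window, key=lambda w: mad(signal_by_window[w]))


# Usage on a genome-wide table would be: top_variable_windows(signal, 1_000_000)
```

MAD is used here instead of variance because it is robust to a few samples with extreme coverage in a window, so outlier samples do not dominate the feature selection.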
Extended Data Fig. 5 Methylome analysis of plasma cfDNA distinguishes patients with early-stage PDAC from healthy controls.
a, The difference in plasma cfDNA methylation plotted against the difference in tumour DNA methylation for each overlapping window (n = 547,887). The plasma cfDNA methylation difference between patients with PDAC and healthy controls, measured by cfMeDIP–seq, is expressed as a log10 fold change. The tumour DNA methylation difference is the delta beta between primary PDAC tumour and normal tissue, measured by RRBS. The blue line is a trend line; the correlation was determined by Pearson's correlation. b, Scatter plot showing the DNA methylation difference for each overlapping window. The x axis shows the DNA methylation difference for the primary PDAC tumour compared with normal PBMCs from the RRBS data. The y axis shows the DNA methylation difference for the plasma cfDNA methylation from patients with PDAC compared with healthy donors from the cfMeDIP–seq data. Correlation determined by Pearson's correlation. c, Genome Browser snapshot of RRBS and cfMeDIP–seq signal across a representative chromosomal region (chr8:145,095,942–145,116,942), selected from four candidate regions, using reference genome hg19. RRBS tracks show the methylation signal for laser-capture-microdissected tissue from PDAC tumour cases and matching normal tissue from the same patients, shown in the same order. Each coloured block represents a DMC, with yellow indicating hypermethylation and blue indicating hypomethylation. cfMeDIP–seq tracks show the methylation signal (RPKMs) detected in the cfDNA as green and blue peaks, with cases representing plasma from the same PDAC cases and controls corresponding to plasma from age- and sex-matched healthy controls.
Extended Data Fig. 6 Circulating cfDNA methylation profiles can identify transcription factor footprints and infer active transcriptional networks in the tissue of origin.
a, Expression profiles across 53 human tissues from the GTEx project of all transcription factors (n = 42) characterized as binding in healthy controls. Several transcription factors that are preferentially expressed in the haematopoietic system were identified (PU.1, NFE2 and GATA1). b, Expression profiles (ssGSEA scores; single-sample gene set enrichment analysis) of all transcription factors with hypomethylated motifs in controls (n = 42) are overexpressed compared with those of 1,000 random sets of 42 transcription factors in GTEx whole-blood data (P < 2.2 × 10^−16, two-sided Wilcoxon rank-sum test). c, Expression profiles of all transcription factors (n = 52) characterized as binding in patients with PDAC. Several pancreas-specific or pancreatic-cancer-associated transcription factors were identified, as well as hallmark transcription factors that drive molecular subtypes of pancreatic cancer. d, Expression profiles (ssGSEA scores) of all transcription factors with hypomethylated motifs in cases (n = 52) are overexpressed compared with those of 1,000 random sets of 52 transcription factors in the normal pancreas (GTEx data) (P < 2.2 × 10^−16, two-sided Wilcoxon rank-sum test). e, Expression profiles of all transcription factors with hypomethylated motifs in PDAC cases (n = 52) are overexpressed compared with those of 1,000 random sets of 52 transcription factors in PDAC tissue (TCGA data) (P < 2.2 × 10^−16, two-sided Wilcoxon rank-sum test). For violin plots (b, d, e), the ends of the boxes represent the lower and upper quartiles and the middle line indicates the median. Whiskers represent 1.5× IQR, and outliers are excluded. Rotated kernel densities are also displayed.
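The comparisons in panels b, d and e use a two-sided Wilcoxon rank-sum test. The sketch below implements that test with the large-sample normal approximation, average ranks for ties and no tie or continuity correction, so its P values will differ slightly from R's `wilcox.test` or SciPy's `mannwhitneyu`. Here `x` might hold ssGSEA scores for the selected transcription-factor set and `y` the scores for the random sets; that pairing and the function names are illustrative assumptions.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.
    Returns (U statistic for x, two-sided P value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    return u, erfc(abs(z) / sqrt(2))
```

With 1,000 random sets the normal approximation is reasonable; for very small groups an exact test would be preferable.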
Extended Data Fig. 7 Quality control of cfMeDIP–seq from circulating cfDNA from multiple cancer types.
a, c, e, g, i, k, Specificity of the reaction; and b, d, f, h, j, CpG enrichment score for each sample per cancer type. The horizontal dashed lines represent a CpG enrichment of 1.
a, Yield of cfDNA extracted per ml of plasma from healthy donors (n = 24), bladder cancer (n = 20), renal cancer (n = 20), lung cancer (n = 25), breast cancer (n = 25), pancreatic cancer (n = 24), colorectal cancer (n = 23) and AML (n = 28). Horizontal bars represent the mean, with dots representing individual samples. b–h, Scatter plots showing the DNA methylation difference for all overlapping windows in PDAC (n = 245,980 windows) (b), AML (n = 206,735 windows) (c), BLCA (n = 193,943 windows) (d), BRCA (n = 204,623 windows) (e), CRC (n = 210,645 windows) (f), LUC (n = 193,043 windows) (g) and RCC (n = 198,390 windows) (h). The x axis shows the DNA methylation difference between the primary tumour (TCGA data) and normal PBMCs. The y axis shows the DNA methylation difference between the plasma cfDNA methylation for each cancer type and healthy controls from the cfMeDIP–seq data. The blue line is a trend line, with the correlation determined by Pearson's correlation.
Extended Data Fig. 9 Circulating plasma cfDNA methylation samples used to distinguish between multiple cancer types and healthy donors.
a, b, Breakdown of pathological stage (according to the AJCC/UICC 7th Edition) by tumour type for samples in the training set (a) and in the validation set (b). Non-small-cell lung carcinoma, LUC (NSCLC); small-cell lung cancer, LUC (SCLC).
Extended Data Fig. 10 Characterization of hypermethylated regions from cfDNA that are not methylated in leukocytes.
a, Violin plots of DNA methylation (plotted as beta value) for 38,352 regions in normal blood cells, selected on the basis of low DNA methylation levels using IHEC whole-genome bisulfite sequencing data. For violin plots, the ends of the boxes represent the lower and upper quartiles and the middle line represents the median. Whiskers represent 1.5× IQR, and outliers are excluded. Rotated kernel densities are also displayed. b, Volcano plots of the regions with low DNA methylation levels in normal blood cells that overlap with hypermethylated regions in the plasma cfDNA for PDAC (n = 3,146 CpG sites) relative to normal tissue, and for RCC (n = 2,767 CpG sites), BLCA (n = 3,286 CpG sites), BRCA (n = 6,836 CpG sites), CRC (n = 8,360 CpG sites) and LUC (n = 5,239 CpG sites) relative to PBMCs. The x axis represents DNA methylation (plotted as delta beta value), obtained from TCGA tumour data for cancers other than PDAC and from RRBS for PDAC. The y axis represents −log10 q values (Benjamini–Hochberg false discovery rate, BHFDR).
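The y axis in panel b uses Benjamini–Hochberg q values. A minimal sketch of the adjustment, intended to match R's `p.adjust(p, method = "BH")`; the function name is hypothetical, and the volcano y coordinate would then be −log10 of each q value.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, tracking a running minimum so the
    # adjusted values are monotone in the original p-values and capped at 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

The running minimum is what distinguishes BH q values from the raw p × m / rank ratios: without it, a larger p-value could receive a smaller adjusted value than a smaller one.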
About this article
Cite this article
Shen, S.Y., Singhania, R., Fehringer, G. et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature 563, 579–583 (2018) doi:10.1038/s41586-018-0703-0