Targeted bisulfite sequencing (TBS) has become the method of choice for the cost-effective, targeted analysis of the human methylome at base-pair resolution. In this study, we benchmarked five commercially available TBS platforms—three hybridization capture-based (Agilent, Roche and Illumina) and two reduced-representation-based (Diagenode and NuGen)—across 11 samples. Two samples were also compared with whole-genome DNA methylation sequencing with the Illumina and Oxford Nanopore platforms. We assessed workflow complexity, on/off-target performance, coverage, accuracy and reproducibility. Although all platforms produced robust and reproducible data, major differences in the number and identity of the CpG sites covered make it difficult to compare datasets generated on different platforms. To overcome this limitation, we applied imputation and show that it improves interoperability from an average of 10.35% (0.8 million) to 97% (7.6 million) common CpG sites. Our study provides guidance on which TBS platform to use for different methylome features and offers an imputation-based harmonization solution that allows comparative, integrative analysis.
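The interoperability figures quoted above can be illustrated with a small set-based sketch. It assumes that "common CpG sites" means the intersection of the sites covered by every platform, and that the percentage is taken relative to the union of all covered sites; the abstract does not state the denominator, so that choice is an assumption. The function name `interoperability` and the toy coordinates are hypothetical and are not taken from the study's code.

```python
def interoperability(platform_sites):
    """Return (n_common, percent_common) for CpG sites across platforms.

    platform_sites maps a platform name to a set of (chrom, pos) tuples.
    Assumption: the percentage is relative to the union of all sites.
    """
    site_sets = list(platform_sites.values())
    if not site_sets:
        return 0, 0.0
    common = set.intersection(*site_sets)  # sites covered by every platform
    union = set.union(*site_sets)          # sites covered by at least one
    percent = 100.0 * len(common) / len(union) if union else 0.0
    return len(common), percent


# Toy data: three hypothetical platforms with partially overlapping sites.
toy = {
    "A": {("chr1", 100), ("chr1", 200), ("chr1", 300)},
    "B": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
    "C": {("chr1", 100), ("chr1", 400)},
}
n_common, pct = interoperability(toy)
print(n_common, round(pct, 1))
```

Under this reading, imputation raises interoperability by filling in sites that a platform did not measure, which enlarges the intersection while the union stays nearly unchanged.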
The datasets generated and analyzed in the current study, including all raw targeted bisulfite sequencing, WGBS of Ref.gDNA and Nanopore sequencing data, have been deposited in the European Nucleotide Archive repository under accession number PRJEB46506 and are freely available. Raw WGBS sequencing data for the Coriell-NA12878 WGBS_EC sample generated by the ENCODE Project Consortium (ref. 26) were downloaded from the ENCODE Project (experiment: ENCSR890UQO, library: ENCLB898WPW) (https://www.encodeproject.org/experiments/ENCSR890UQO/), and CpG count files for the WGBS_IL sample were downloaded from Illumina BaseSpace Hub (https://basespace.illumina.com/datacentral) under sample name WGBS_P3 from the HiSeq 4000: TruSeq DNA Methylation (NA12878, 2 × 76) dataset.
The code used for annotation, differential methylation analysis, plotting and imputation is available in the GitHub repository at https://github.com/ucl-medical-genomics/EpiCapture.
Schubeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).
Laird, P. W. Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203 (2010).
Stirzaker, C., Taberlay, P. C., Statham, A. L. & Clark, S. J. Mining cancer methylomes: prospects and challenges. Trends Genet. 30, 75–84 (2014).
Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
Guo, S. et al. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat. Genet. 49, 635–642 (2017).
Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).
Meissner, A. et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 33, 5868–5877 (2005).
Kacmarczyk, T. J. et al. ‘Same difference’: comprehensive evaluation of four DNA methylation measurement platforms. Epigenetics Chromatin 11, 21 (2018).
Warnecke, P. M. et al. Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA. Nucleic Acids Res. 25, 4422–4426 (1997).
Wojdacz, T. K., Borgbo, T. & Hansen, L. L. Primer design versus PCR bias in methylation independent PCR amplifications. Epigenetics 4, 231–234 (2009).
Ebbert, M. T. et al. Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches. BMC Bioinformatics 17, 239 (2016).
Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2011).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Rhee, I. et al. DNMT1 and DNMT3b cooperate to silence genes in human cancer cells. Nature 416, 552–556 (2002).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).
Zou, L. S. et al. BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues. BMC Genomics 19, 390 (2018).
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
Liu, M. C. et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncol. 31, 745–759 (2020).
Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).
Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).
Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
Olova, N. et al. Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data. Genome Biol. 19, 33 (2018).
Li, Q. et al. Post-conversion targeted capture of modified cytosines in mammalian and plant genomes. Nucleic Acids Res. 43, e81 (2015).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).
Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Quinlan, A. R. BEDTools: the Swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Zhu, L. J. et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 237 (2010).
Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).
Wang, H. Q., Tuominen, L. K. & Tsai, C. J. SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures. Bioinformatics 27, 225–231 (2011).
Cavalcante, R. G. & Sartor, M. A. annotatr: genomic regions in context. Bioinformatics 33, 2381–2383 (2017).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Morgan, M. & Shepherd, L. AnnotationHub: client to access AnnotationHub resources. R package version 3.2.0. https://bioconductor.org/packages/release/bioc/html/AnnotationHub.html (2022).
Lawrence, M. HelloRanges: introduce *Ranges to bedtools users. R package version 1.20.0. https://bioconductor.org/packages/release/bioc/html/HelloRanges.html (2022).
Khan, A. & Mathelier, A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinformatics 18, 287 (2017).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
M.T. received funding from the European Union’s Seventh Framework Programme (Marie Skłodowska-Curie Actions FP7/2007-2013/WHRI-ACADEMY-608765); the Danish Council for Strategic Research (1309-00006B); the Ministry of Education, Science and Technological Development of Serbia (2011-2019/III-41026 and 451-03-68/2020-14/200043); and the Science Fund of the Republic of Serbia (PROMIS/2020/6060876). I.M. is supported by the Biotechnology and Biological Sciences Research Council (grant no. BB/M009513/1). S.B. has received funding from the Wellcome Trust (218274/Z/19/Z) and a Royal Society Wolfson Research Merit Award (WM100023). A.F. received support from the UCL/UCLH Biomedical Research Centre, the Medical Research Council (MR/M025411/1), Prostate Cancer UK (MA_TR15_009) and the Biotechnology and Biological Sciences Research Council (BB/R009295/1). S.R. received funding from Orchid. We further acknowledge support from D. Turner and B. Sipos (Oxford Nanopore Technologies) for the generation of the Nanopore sequencing data and from the CRUK–UCL Centre-funded Genomics and Genome Engineering and Bioinformatics Translational Technology Platforms.
The authors declare no competing interests.
Peer review information
Nature Biotechnology thanks Miguel Branco, Alexander Dobrovic and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a Bismark alignment rates for uniquely aligned, ambiguously aligned and unaligned reads for each sample by platform; b Percentage of reads aligning to the top or bottom DNA strand for each sample by platform; c Global methylation levels of CpG dinucleotides for each sample by platform; d Global cytosine methylation level in the CHG context for each sample by platform, used as an estimate of the sodium bisulfite under-conversion rate; e Global cytosine methylation level in the CHH context for each sample by platform, used as an estimate of the sodium bisulfite under-conversion rate; f,g M-bias plots showing the average percentage methylation and coverage across read length for each sample. Each line represents a sample. Methylation bias for the forward sequencing read by platform (f); methylation bias for the reverse sequencing read by platform (g).
The fraction of targets covered at a given sequencing depth for each sample by platform: Agilent (a), Illumina (b) and Roche (c). Each sample is represented by a line.
Scatterplots showing pairwise Pearson correlation coefficients for Coriell NA12878 data from WGBS EC vs. WGBS IL (a), Nanopore vs. WGBS IL (b) and Nanopore vs. WGBS EC (c).
Interoperability between platforms for Coriell NA12878 (left) and Ref.gDNA (right) before imputation (first row), after imputation without a distance threshold (second row), after imputation with a 1000 bp distance threshold (third row) and after imputation with a 25 bp distance threshold (fourth row). Venn diagrams show the CpGs overlapping between the platforms.
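One way to read the distance thresholds in this legend is as a filter on which imputed CpGs are retained: a site is kept only if it lies within the stated number of base pairs of a CpG that was actually measured. The legend does not define the threshold, so this interpretation is an assumption, and the helper `within_distance` and the coordinates below are hypothetical. The sketch handles positions on a single chromosome.

```python
from bisect import bisect_left


def within_distance(measured, candidates, max_dist):
    """Keep candidate positions lying within max_dist bp of a measured CpG."""
    measured = sorted(measured)
    kept = []
    for pos in candidates:
        i = bisect_left(measured, pos)
        # Only the nearest measured site on each side can be the closest one.
        neighbours = measured[max(i - 1, 0):i + 1]
        if any(abs(pos - m) <= max_dist for m in neighbours):
            kept.append(pos)
    return kept


measured = [1000, 5000]                 # measured CpG positions (toy values)
candidates = [1010, 1500, 4990, 9000]   # positions proposed for imputation
kept_25 = within_distance(measured, candidates, 25)
kept_1000 = within_distance(measured, candidates, 1000)
print(kept_25, kept_1000)
```

A tighter threshold retains fewer imputed sites, which matches the ordering of the rows in the legend from no threshold to 1000 bp to 25 bp.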
Cite this article
Tanić, M., Moghul, I., Rodney, S. et al. Comparison and imputation-aided integration of five commercial platforms for targeted DNA methylome analysis. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-022-01336-9