Comparison and imputation-aided integration of five commercial platforms for targeted DNA methylome analysis

Abstract

Targeted bisulfite sequencing (TBS) has become the method of choice for the cost-effective, targeted analysis of the human methylome at base-pair resolution. In this study, we benchmarked five commercially available TBS platforms—three hybridization capture-based (Agilent, Roche and Illumina) and two reduced-representation-based (Diagenode and NuGen)—across 11 samples. Two samples were also compared with whole-genome DNA methylation sequencing with the Illumina and Oxford Nanopore platforms. We assessed workflow complexity, on/off-target performance, coverage, accuracy and reproducibility. Although all platforms produced robust and reproducible data, major differences in the number and identity of the CpG sites covered make it difficult to compare datasets generated on different platforms. To overcome this limitation, we applied imputation and show that it improves interoperability from an average of 10.35% (0.8 million) to 97% (7.6 million) common CpG sites. Our study provides guidance on which TBS platform to use for different methylome features and offers an imputation-based harmonization solution that allows comparative, integrative analysis.
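The interoperability figure quoted above can be read as the share of CpG sites that every platform covers. The following is a minimal sketch of that calculation, assuming the denominator is the union of sites covered by any platform (the abstract does not state the exact denominator) and using toy coordinates rather than data from the study. The function name `interoperability` is hypothetical.

```python
from functools import reduce


def interoperability(platform_sites):
    """Return (n_common, fraction) for CpG sites shared by every platform.

    platform_sites maps a platform name to a set of (chromosome, position)
    tuples. Using the union of all sites as the denominator is an assumption.
    """
    site_sets = list(platform_sites.values())
    common = reduce(set.intersection, site_sets)
    union = reduce(set.union, site_sets)
    fraction = len(common) / len(union) if union else 0.0
    return len(common), fraction


# Toy coordinates for illustration only, not data from the study.
toy = {
    "capture_A": {("chr1", 100), ("chr1", 150), ("chr2", 40)},
    "capture_B": {("chr1", 100), ("chr2", 40), ("chr2", 90)},
    "rrbs_C": {("chr1", 100), ("chr3", 7)},
}
n_common, fraction = interoperability(toy)
print(n_common, fraction)
```

Under this reading, imputation raises the numerator by filling in sites that a platform did not measure directly, which is how the common set can grow from 0.8 million to 7.6 million sites.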

Fig. 1: Technology and design comparison of TBS platforms.
Fig. 2: Sequencing performance by the platform.
Fig. 3: Platform similarity and feature annotation.
Fig. 4: Platform reproducibility and concordance of DNA methylation calls.
Fig. 5: Differential methylation calls by the platform and imputation.

Data availability

The datasets generated and analyzed in the current study, including all raw targeted bisulfite sequencing, WGBS of Ref.gDNA and Nanopore sequencing data, have been deposited in the European Nucleotide Archive repository under accession number PRJEB46506 and are freely available. Raw WGBS sequencing data for the Coriell-NA12878 WGBS_EC sample generated by the ENCODE Project Consortium (ref. 26) were downloaded from the ENCODE Project (experiment: ENCSR890UQO, library: ENCLB898WPW) (https://www.encodeproject.org/experiments/ENCSR890UQO/), and CpG count files for the WGBS_IL sample were downloaded from Illumina BaseSpace Hub (https://basespace.illumina.com/datacentral) under sample name WGBS_P3 in the HiSeq 4000: TruSeq DNA Methylation (NA12878, 2 × 76) dataset.
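As an illustration of how the deposited runs might be listed programmatically, the sketch below builds a file-report query for the accession given above. The endpoint and the `result`, `fields` and `format` parameters reflect the ENA Portal API as we understand it and are an assumption, not something stated in the article. The helper names are hypothetical, and the download function is defined but not called because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ENA Portal API endpoint; not taken from the article.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a tab-separated file-report URL for one study accession."""
    query = {
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(query, safe=',')}"


def list_runs(accession):
    """Download the report and return one dict per run (needs network access)."""
    with urlopen(ena_filereport_url(accession)) as response:
        lines = response.read().decode("utf-8").splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


url = ena_filereport_url("PRJEB46506")
print(url)
```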

Code availability

The code used for annotation, differential methylation analysis, plotting and imputation is available in the GitHub repository at https://github.com/ucl-medical-genomics/EpiCapture.

References

  1. Schubeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).

  2. Laird, P. W. Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203 (2010).

  3. Stirzaker, C., Taberlay, P. C., Statham, A. L. & Clark, S. J. Mining cancer methylomes: prospects and challenges. Trends Genet. 30, 75–84 (2014).

  4. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).

  5. Guo, S. et al. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat. Genet. 49, 635–642 (2017).

  6. Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).

  7. Meissner, A. et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 33, 5868–5877 (2005).

  8. Kacmarczyk, T. J. et al. ‘Same difference’: comprehensive evaluation of four DNA methylation measurement platforms. Epigenetics Chromatin 11, 21 (2018).

  9. Warnecke, P. M. et al. Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA. Nucleic Acids Res. 25, 4422–4426 (1997).

  10. Wojdacz, T. K., Borgbo, T. & Hansen, L. L. Primer design versus PCR bias in methylation independent PCR amplifications. Epigenetics 4, 231–234 (2009).

  11. Ebbert, M. T. et al. Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches. BMC Bioinformatics 17, 239 (2016).

  12. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2011).

  13. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  14. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).

  15. Rhee, I. et al. DNMT1 and DNMT3b cooperate to silence genes in human cancer cells. Nature 416, 552–556 (2002).

  16. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).

  17. Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).

  18. Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).

  19. Zou, L. S. et al. BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues. BMC Genomics 19, 390 (2018).

  20. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

  21. Liu, M. C. et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncol. 31, 745–759 (2020).

  22. Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).

  23. Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).

  24. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).

  25. Olova, N. et al. Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data. Genome Biol. 19, 33 (2018).

  26. Li, Q. et al. Post-conversion targeted capture of modified cytosines in mammalian and plant genomes. Nucleic Acids Res. 43, e81 (2015).

  27. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27, 1571–1572 (2011).

  28. Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).

  29. Quinlan, A. R. BEDTools: the Swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics 47, 11.12.1–11.12.34 (2014).

  30. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).

  31. Zhu, L. J. et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11, 237 (2010).

  32. Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87 (2012).

  33. Wang, H. Q., Tuominen, L. K. & Tsai, C. J. SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures. Bioinformatics 27, 225–231 (2011).

  34. Cavalcante, R. G. & Sartor, M. A. annotatr: genomic regions in context. Bioinformatics 33, 2381–2383 (2017).

  35. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).

  36. Morgan, M. & Shepherd, L. AnnotationHub: client to access AnnotationHub resources. R package version 3.2.0. https://bioconductor.org/packages/release/bioc/html/AnnotationHub.html (2022).

  37. Lawrence, M. HelloRanges: introduce *Ranges to bedtools users. R package version 1.20.0. https://bioconductor.org/packages/release/bioc/html/HelloRanges.html (2022).

  38. Khan, A. & Mathelier, A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinformatics 18, 287 (2017).

  39. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

Acknowledgements

M.T. received funding from the European Union’s Seventh Framework Programme (Marie Skłodowska-Curie Actions FP7/2007-2013/WHRI-ACADEMY-608765); the Danish Council for Strategic Research (1309-00006B); the Ministry of Education, Science and Technological Development of Serbia (2011-2019/III-41026 and 451-03-68/2020-14/200043); and the Science Fund of the Republic of Serbia (PROMIS/2020/6060876). I.M. is supported by the Biotechnology and Biological Sciences Research Council (grant no. BB/M009513/1). S.B. has received funding from the Wellcome Trust (218274/Z/19/Z) and a Royal Society Wolfson Research Merit Award (WM100023). A.F. received support from the UCL/UCLH Biomedical Research Centre, the Medical Research Council (MR/M025411/1), Prostate Cancer UK (MA_TR15_009) and the Biotechnology and Biological Sciences Research Council (BB/R009295/1). S.R. received funding from Orchid. We further acknowledge support from D. Turner and B. Sipos (Oxford Nanopore Technologies) for the generation of the Nanopore sequencing data and from the CRUK–UCL Centre-funded Genomics and Genome Engineering and Bioinformatics Translational Technology Platforms.

Author information

Contributions

M.T., A.F. and S.B. conceived and designed the study. M.T. and S.R. performed the hybridization capture and RRBS experiments. P.D. and H.V. sequenced the libraries. M.T. and J.B. processed raw sequencing data. M.T. performed analysis of TBS data. I.M. analyzed WGBS and Nanopore data and performed imputation analysis. M.T., A.F. and S.B. interpreted the results. M.T., A.F. and S.B. wrote the manuscript. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Miljana Tanić or Stephan Beck.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Miguel Branco, Alexander Dobrovic and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sequencing data processing quality metrics produced by MultiQC.

a, Bismark alignment rates for uniquely aligned, ambiguously aligned and unaligned reads for each sample by platform. b, Percentage of reads aligning to the top or bottom DNA strand for each sample by platform. c, Global methylation levels of CpG dinucleotides for each sample by platform. d, Global cytosine methylation level in the CHG context for each sample by platform, used as an estimate of sodium bisulfite under-conversion rates. e, Global cytosine methylation level in the CHH context for each sample by platform, used as an estimate of sodium bisulfite under-conversion rates. f,g, M-bias plots showing the average percentage methylation and coverage across read length for each sample. Each line represents a sample. Methylation bias for the forward sequencing read by platform (f) and for the reverse sequencing read by platform (g).
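Panels d and e rest on the idea that non-CpG cytosines are expected to be almost entirely unmethylated in these samples, so any apparent non-CpG methylation mostly reflects cytosines that escaped bisulfite conversion. A minimal sketch of that estimate follows, with a hypothetical function name and made-up call counts:

```python
def under_conversion_rate(methylated_calls, unmethylated_calls):
    """Estimate bisulfite under-conversion from non-CpG (CHG or CHH) calls.

    Assumes true non-CpG methylation is negligible, so every cytosine still
    read as methylated is treated as an unconverted cytosine.
    """
    total = methylated_calls + unmethylated_calls
    if total == 0:
        raise ValueError("no non-CpG calls to estimate from")
    return methylated_calls / total


# Made-up counts for illustration only.
rate = under_conversion_rate(methylated_calls=40, unmethylated_calls=9960)
print(f"{rate:.2%}")
```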

Extended Data Fig. 2 Target depth of coverage.

The fraction of targets covered at a given sequencing depth for each sample by platform: Agilent (a), Illumina (b) and Roche (c). Each sample is represented by a line.
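A curve of this kind can be derived from per-target depths by counting how many targets reach each depth threshold. The sketch below assumes one mean depth value per target. The function name and the depth values are illustrative and not taken from the study.

```python
def fraction_covered(target_depths, thresholds):
    """Map each depth threshold to the fraction of targets at or above it."""
    n_targets = len(target_depths)
    if n_targets == 0:
        raise ValueError("no targets supplied")
    return {
        threshold: sum(depth >= threshold for depth in target_depths) / n_targets
        for threshold in thresholds
    }


# Made-up mean depths for five targets in one sample.
depths = [0, 5, 12, 30, 48]
curve = fraction_covered(depths, thresholds=[1, 10, 30])
print(curve)
```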

Extended Data Fig. 3 Intra-platform concordance.

Scatterplots showing the pairwise Pearson correlation coefficient for Coriell NA12878 data from WGBS EC vs. WGBS IL (a), Nanopore vs. WGBS IL (b) and Nanopore vs. WGBS EC (c).
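A pairwise comparison of this kind is restricted to the CpG sites that both datasets cover. The following sketch computes the Pearson coefficient over that shared set, assuming methylation levels are stored as fractions keyed by (chromosome, position). The function name and the values are hypothetical.

```python
from math import sqrt


def pearson_on_shared_sites(calls_a, calls_b):
    """Return (r, n_shared) for methylation levels at sites present in both."""
    shared = sorted(set(calls_a) & set(calls_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared CpG sites")
    xs = [calls_a[site] for site in shared]
    ys = [calls_b[site] for site in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y), len(shared)


# Toy methylation fractions, not data from the study.
dataset_a = {("chr1", 100): 0.9, ("chr1", 150): 0.1, ("chr2", 40): 0.5, ("chr2", 90): 0.7}
dataset_b = {("chr1", 100): 0.8, ("chr1", 150): 0.2, ("chr2", 40): 0.5, ("chr3", 7): 0.3}
r, n_shared = pearson_on_shared_sites(dataset_a, dataset_b)
print(n_shared, r)
```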

Extended Data Fig. 4 Platform interoperability.

Interoperability between platforms for Coriell NA12878 (left) and Ref.gDNA (right) before imputation (first row), after imputation without a distance threshold (second row), after imputation with a 1,000-bp distance threshold (third row) and after imputation with a 25-bp distance threshold (fourth row). Venn diagrams show the CpGs overlapping between the platforms.
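One plausible reading of the distance threshold is that an imputed CpG is kept only if it lies within the stated number of base pairs of a CpG that was actually measured. The caption does not define the threshold, so this interpretation is an assumption, and the sketch below (with a hypothetical function name and toy positions) illustrates only that reading.

```python
from bisect import bisect_left


def within_distance(imputed, measured, max_distance):
    """Keep imputed positions within max_distance bp of a measured position.

    Both arguments are position lists for a single chromosome, and
    `measured` must be sorted in ascending order.
    """
    kept = []
    for pos in imputed:
        i = bisect_left(measured, pos)
        # Only the nearest measured site on each side can be the closest one.
        neighbours = measured[max(i - 1, 0):i + 1]
        if any(abs(pos - m) <= max_distance for m in neighbours):
            kept.append(pos)
    return kept


# Toy positions for illustration only.
measured = [1000, 5000]
imputed = [1010, 1500, 4990, 7000]
strict = within_distance(imputed, measured, 25)
relaxed = within_distance(imputed, measured, 1000)
print(strict, relaxed)
```

A tighter threshold keeps fewer imputed sites, trading the number of common CpGs for closer support from measured data.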

About this article

Cite this article

Tanić, M., Moghul, I., Rodney, S. et al. Comparison and imputation-aided integration of five commercial platforms for targeted DNA methylome analysis. Nat Biotechnol 40, 1478–1487 (2022). https://doi.org/10.1038/s41587-022-01336-9

  • DOI: https://doi.org/10.1038/s41587-022-01336-9
