Analysing and interpreting DNA methylation data

Key Points

  • Recent technological advances make it possible to map DNA methylation in essentially any cell type, tissue or organism.

  • Computational methods and software tools are essential for processing, analysing and interpreting large-scale DNA methylation data sets.

  • Tailored software tools are now available for processing data obtained with all common methods for genome-wide DNA methylation mapping (including bisulphite sequencing and the Infinium assay).

  • Bioinformatic methods for visualization of DNA methylation data facilitate quality assessment and help to pinpoint global trends in the data.

  • By combining stringent statistical methods with computational and experimental validation, researchers can establish accurate lists of differentially methylated regions for a phenotype of interest.

  • Biological interpretation of differential DNA methylation is aided by computational tools for data exploration and enrichment analysis.

  • Large community projects are currently generating reference epigenome maps for many different cell types; the interpretation of these maps will require a comprehensive effort in functional epigenomics.
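One way to make the "stringent statistical methods" point concrete is multiple-testing correction: when many candidate regions are tested at once, raw p-values overstate significance. The sketch below is a minimal illustration and not the Review's own method. It assumes that a p-value per region has already been computed by some upstream test, and it applies the Benjamini-Hochberg procedure to control the false discovery rate. The function names, the region labels and the 0.05 default threshold are all hypothetical choices made for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(region_p_values, fdr=0.05):
    """Keep regions whose adjusted p-value is at or below the FDR threshold.

    region_p_values maps a (hypothetical) region label to its raw p-value.
    """
    names = list(region_p_values)
    adjusted = benjamini_hochberg([region_p_values[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q <= fdr]


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    example = {"region_a": 0.01, "region_b": 0.04,
               "region_c": 0.03, "region_d": 0.005}
    print(select_candidates(example, fdr=0.03))
```

A list filtered this way is still only a set of candidates; the key point above pairs the statistics with computational and experimental validation.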

Abstract

DNA methylation is an epigenetic mark that has suspected regulatory roles in a broad range of biological processes and diseases. The technology is now available for studying DNA methylation genome-wide, at a high resolution and in a large number of samples. This Review discusses relevant concepts, computational methods and software tools for analysing and interpreting DNA methylation data. It not only focuses on the bioinformatic challenges of large epigenome-mapping projects and epigenome-wide association studies but also highlights software tools that make genome-wide DNA methylation mapping more accessible for laboratories with limited bioinformatics experience.

Figure 1: Workflow for analysing and interpreting DNA methylation data.
Figure 2: Two alternative strategies for bisulphite alignment.
Figure 3: Effective identification of differentially methylated regions in a highly annotated genome.

References

  1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).

    Article  CAS  PubMed  Google Scholar 

  2. Ehrlich, M. et al. Amount and distribution of 5-methylcytosine in human DNA from different types of tissues of cells. Nucleic Acids Res. 10, 2709–2721 (1982).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature Rev. Genet. 13, 484–492 (2012).

    Article  CAS  PubMed  Google Scholar 

  4. Barlow, D. P. Genomic imprinting: a mammalian epigenetic discovery model. Annu. Rev. Genet. 45, 379–403 (2011).

    Article  CAS  PubMed  Google Scholar 

  5. Bestor, T. H. The host defence function of genomic methylation patterns. Novartis Found. Symp. 214, 187–199 (1998).

    CAS  PubMed  Google Scholar 

  6. Meissner, A. Epigenetic modifications in pluripotent and differentiated cells. Nature Biotech. 28, 1079–1088 (2010).

    Article  CAS  Google Scholar 

  7. Reik, W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447, 425–432 (2007).

    Article  CAS  PubMed  Google Scholar 

  8. Martin, M. & Herceg, Z. From hepatitis to hepatocellular carcinoma: a proposed model for cross-talk between inflammation and epigenetic mechanisms. Genome Med. 4, 8 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome — biological and translational implications. Nature Rev. Cancer 11, 726–734 (2011).

    Article  CAS  Google Scholar 

  10. Feinberg, A. P. Phenotypic plasticity and the epigenetics of human disease. Nature 447, 433–440 (2007).

    Article  CAS  PubMed  Google Scholar 

  11. Walker, C. L. & Ho, S. M. Developmental reprogramming of cancer susceptibility. Nature Rev. Cancer 12, 479–486 (2012).

    Article  CAS  Google Scholar 

  12. Laird, P. W. The power and the promise of DNA methylation markers. Nature Rev. Cancer 3, 253–266 (2003).

    Article  CAS  Google Scholar 

  13. Bock, C. Epigenetic biomarker development. Epigenomics 1, 99–110 (2009).

    Article  CAS  PubMed  Google Scholar 

  14. Laird, P. W. Principles and challenges of genome-wide DNA methylation analysis. Nature Rev. Genet. 11, 191–203 (2010).

    Article  CAS  PubMed  Google Scholar 

  15. Bock, C. & Lengauer, T. Computational epigenetics. Bioinformatics 24, 1–10 (2008).

    Article  CAS  PubMed  Google Scholar 

  16. Satterlee, J. S., Schübeler, D. & Ng, H. H. Tackling the epigenome: challenges and opportunities for collaboration. Nature Biotech. 28, 1039–1044 (2010).

    Article  CAS  Google Scholar 

  17. Foley, D. L. et al. Prospects for epigenetic epidemiology. Am. J. Epidemiol. 169, 389–400 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  18. Rakyan, V. K., Down, T. A., Balding, D. J. & Beck, S. Epigenome-wide association studies for common human diseases. Nature Rev. Genet. 12, 529–541 (2011). This Review describes the planning and execution of an EWAS for common diseases.

    Article  CAS  PubMed  Google Scholar 

  19. Robinson, M. D., Statham, A. L., Speed, T. P. & Clark, S. J. Protocol matters: which methylome are you actually studying? Epigenomics 2, 587–598 (2010).

    Article  CAS  PubMed  Google Scholar 

  20. Beck, S. Taking the measure of the methylome. Nature Biotech. 28, 1026–1028 (2010).

    Article  CAS  Google Scholar 

  21. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Frith, M. C., Mori, R. & Asai, K. A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res. 40, e100 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Coarfa, C. et al. Pash 3.0: a versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics 11, 572 (2011).

    Article  Google Scholar 

  25. Smith, A. D. et al. Updates to the RMAP short-read mapping software. Bioinformatics 25, 2841–2842 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Xi, Y. et al. RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing. Bioinformatics 28, 430–432 (2012).

    Article  CAS  PubMed  Google Scholar 

  27. Otto, C., Stadler, P. F. & Hoffmann, S. Fast and sensitive mapping of bisulfite-treated sequencing data. Bioinformatics 28, 1698–1704 (2012).

    Article  CAS  PubMed  Google Scholar 

  28. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Harris, E. Y., Ponts, N., Levchuk, A., Roch, K. L. & Lonardi, S. BRAT: bisulfite-treated reads analysis tool. Bioinformatics 26, 572–573 (2010).

    Article  CAS  PubMed  Google Scholar 

  30. Harris, E. Y., Ponts, N., Le Roch, K. G. & Lonardi, S. BRAT-BW: efficient and accurate mapping of bisulfite-treated reads. Bioinformatics 28, 1795–1796 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Chen, P. Y., Cokus, S. J. & Pellegrini, M. BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics 11, 203 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Pedersen, B., Hsieh, T. F., Ibarra, C. & Fischer, R. L. MethylCoder: software pipeline for bisulfite-treated sequences. Bioinformatics 27, 2435–2436 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Nielsen, R., Paul, J. S., Albrechtsen, A. & Song, Y. S. Genotype and SNP calling from next-generation sequencing data. Nature Rev. Genet. 12, 443–451 (2011).

    Article  CAS  PubMed  Google Scholar 

  35. Liu, Y., Siegmund, K. D., Laird, P. W. & Berman, B. P. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biol. 13, R61 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Miller, C. A., Hampton, O., Coarfa, C. & Milosavljevic, A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS ONE 6, e16327 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nature Genet. 43, 768–775 (2011). This reference describes a comprehensive and well-documented analysis of a cancer-specific DNA methylation data set.

    Article  CAS  PubMed  Google Scholar 

  38. Krueger, F., Kreck, B., Franke, A. & Andrews, S. R. DNA methylome analysis using short bisulfite sequencing data. Nature Methods 9, 145–151 (2012).

    Article  CAS  PubMed  Google Scholar 

  39. Chung, C. A. High-throughput sequencing of the methylome using two-base encoding. Methods Mol. Biol. 910, 71–86 (2012).

    Article  CAS  PubMed  Google Scholar 

  40. Kreck, B. et al. B-SOLANA: an approach for the analysis of two-base encoding bisulfite sequencing data. Bioinformatics 28, 428–429 (2012).

    Article  CAS  PubMed  Google Scholar 

  41. Ramsahoye, B. H. et al. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad. Sci. USA 97, 5237–5242 (2000).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Ziller, M. J. et al. Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet. 7, e1002389 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).

    Article  CAS  PubMed  Google Scholar 

  44. Dedeurwaerder, S. et al. Evaluation of the Infinium Methylation 450K technology. Epigenomics 3, 771–784 (2011). This study carried out an independent empirical evaluation of the Illumina Infinium 450k assay.

    Article  CAS  PubMed  Google Scholar 

  45. Makismovic, J., Gordon, L. & Oshlack, A. SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol. 13, R44 (2012).

    Article  CAS  Google Scholar 

  46. Touleimat, N. & Tost, J. Complete pipeline for Infinium Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics 4, 325–341 (2012).

    Article  CAS  PubMed  Google Scholar 

  47. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

    Article  PubMed  PubMed Central  Google Scholar 

  48. Wang, D. et al. Comparison of different normalization assumptions for analyses of DNA methylation data from the cancer genome. Gene 506, 36–42 (2012).

    Article  CAS  PubMed  Google Scholar 

  49. Du, P. et al. Comparison of beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11, 587 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Teschendorff, A. E., Zhuang, J. & Widschwendter, M. Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics 27, 1496–1505 (2011).

    Article  CAS  PubMed  Google Scholar 

  51. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Rev. Genet. 11, 733–739 (2010). This Perspectives article highlights the prevalence of batch effects in genomic data and suggests ways of addressing this problem.

    Article  CAS  PubMed  Google Scholar 

  52. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

    Article  PubMed  Google Scholar 

  53. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Chen, Y. A. et al. Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics 97, 214–222 (2011).

    Article  CAS  PubMed  Google Scholar 

  55. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Down, T. A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotech. 26, 779–785 (2008).

    Article  CAS  Google Scholar 

  57. Pelizzola, M. et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18, 1652–1659 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Chavez, L. et al. Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage. Genome Res. 20, 1441–1450 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Huang, J. et al. MeQA: a pipeline for MeDIP-seq data quality assessment and analysis. Bioinformatics 28, 587–588 (2012).

    Article  CAS  PubMed  Google Scholar 

  60. Wilson, G. et al. Resources for methylome analysis suitable for gene knockout studies of potential epigenome modifiers. GigaScience 1, 3 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Statham, A. L. et al. Repitools: an R package for the analysis of enrichment-based epigenomic data. Bioinformatics 26, 1662–1663 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Singer, M. et al. MetMap enables genome-scale Methyltyping for determining methylation states in populations. PLoS Comput. Biol. 6, e1000888 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Jing, Q., McLellan, A., Greally, J. M. & Suzuki, M. Automated computational analysis of genome-wide DNA methylation profiling data from HELP-tagging assays. Methods Mol. Biol. 815, 79–87 (2012).

    Article  CAS  PubMed  Google Scholar 

  64. Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature Biotech. 28, 1106–1114 (2010).

    Article  CAS  Google Scholar 

  65. Harris, R. A. et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature Biotech. 28, 1097–1105 (2010).

    Article  CAS  Google Scholar 

  66. Robinson, M. D. et al. Evaluation of affinity-based genome-wide DNA methylation data: effects of CpG density, amplification bias, and copy number variation. Genome Res. 20, 1719–1729 (2010). References 64, 65 and 66 carried out an empirical benchmarking of widely used methods for genome-wide DNA methylation mapping.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Irizarry, R. A. et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 18, 780–790 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36, D773–D779 (2008).

    Article  CAS  PubMed  Google Scholar 

  71. Flicek, P. et al. Ensembl 2008. Nucleic Acids Res. 36, D707–D714 (2008).

    Article  CAS  PubMed  Google Scholar 

  72. Zhou, X. et al. The Human Epigenome Browser at Washington University. Nature Methods 8, 989–990 (2011). This reference describes a useful Web-based software tool for visualization and graphical analysis of human reference epigenomes.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Robinson, J. T. et al. Integrative genomics viewer. Nature Biotech. 29, 24–26 (2011).

    Article  CAS  Google Scholar 

  74. Nicol, J. W., Helt, G. A., Blanchard, S. G. Jr, Raja, A. & Loraine, A. E. The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25, 2730–2731 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Shearstone, J. R. et al. Global DNA demethylation during mouse erythropoiesis in vivo. Science 334, 799–802 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Smiraglia, D. J. et al. Excessive CpG island hypermethylation in cancer cell lines versus primary human malignancies. Hum. Mol. Genet. 10, 1413–1419 (2001).

    Article  CAS  PubMed  Google Scholar 

  77. Anders, S. Visualization of genomic data with the Hilbert curve. Bioinformatics 25, 1231–1235 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Bock, C. et al. Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell 144, 439–452 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Bock, C. et al. DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol. Cell 47, 633–647 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Figueroa, M. E. et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 18, 553–567 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of β distributions. BMC Bioinformatics 9, 365 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Marjoram, P., Chang, J., Laird, P. W. & Siegmund, K. D. Cluster analysis for DNA methylation profiles having a detection threshold. BMC Bioinformatics 7, 361 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Siegmund, K. D., Laird, P. W. & Laird-Offringa, I. A. A comparison of cluster analysis methods using DNA methylation data. Bioinformatics 20, 1896–1904 (2004).

    Article  CAS  PubMed  Google Scholar 

  85. Xu, J. et al. Pioneer factor interactions and unmethylated CpG dinucleotides mark silent tissue-specific enhancers in embryonic stem cells. Proc. Natl Acad. Sci. USA 104, 12377–12382 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Raval, A. et al. Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia. Cell 129, 879–890 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Moser, D. et al. Functional analysis of a potassium-chloride co-transporter 3 (SLC12A6) promoter polymorphism leading to an additional DNA methylation site. Neuropsychopharmacology 34, 458–467 (2008).

    Article  CAS  PubMed  Google Scholar 

  88. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Wang, D. et al. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data. Bioinformatics 28, 729–730 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Wang, S. Method to detect differentially methylated loci with case-control designs using Illumina arrays. Genet. Epidemiol. 35, 686–694 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Zhang, Y. et al. QDMR: a quantitative method for identification of differentially methylated regions by entropy. Nucleic Acids Res. 39, e58 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Zhuang, J., Widschwendter, M. & Teschendorff, A. E. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 13, 59 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Chen, Z., Liu, Q. & Nadarajah, S. A new statistical approach to detecting differentially methylated loci for case control Illumina array methylation data. Bioinformatics 28, 1109–1113 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Poage, G. M. et al. Identification of an epigenetic profile classifier that is associated with survival in head and neck cancer. Cancer Res. 72, 2728–2737 (2012). This study demonstrates how the aggregation by genomic sequence features can reduce the multiple-testing burden and increase the power of a small and otherwise underpowered EWAS.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Robinson, M. D. et al. Copy-number-aware differential analysis of quantitative DNA sequencing data. Genome Res. 9 Aug 2012 (doi:10.1101/gr.139055.112).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Jaffe, A. E. et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int. J. Epidemiol. 41, 200–209 (2012). This study describes a flexible workflow for identifying DMRs in a statistically sound manner.

    Article  PubMed  PubMed Central  Google Scholar 

  97. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Kuan, P. F. & Chiang, D. Y. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation. Biometrics 19 Jan 2012 (doi:10.1111/j.1541-0420.2011.01730.x).

    Article  PubMed  PubMed Central  Google Scholar 

  99. Ji, H. & Liu, X. S. Analyzing 'omics data using hierarchical models. Nature Biotech. 28, 337–340 (2010).

    Article  CAS  Google Scholar 

  100. Lugthart, S. et al. Aberrant DNA hypermethylation signature in acute myeloid leukemia directed by EVI1. Blood 117, 234–241 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Barfield, R. T., Kilaru, V., Smith, A. K. & Conneely, K. N. CpGassoc: an R function for analysis of DNA methylation microarray data. Bioinformatics 28, 1280–1281 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Kilaru, V., Barfield, R. T., Schroeder, J. W., Smith, A. K. & Conneely, K. N. MethLAB: a graphical user interface package for the analysis of array-based DNA methylation data. Epigenetics 7, 225–229 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  103. Kristensen, L. S. & Hansen, L. L. PCR-based methods for detecting single-locus DNA methylation biomarkers in cancer diagnostics, prognostics, and response to treatment. Clin. Chem. 55, 1471–1483 (2009).

    Article  CAS  PubMed  Google Scholar 

  104. Sepulveda, A. R. et al. CpG methylation analysis-current status of clinical assays and potential applications in molecular diagnostics: a report of the Association for Molecular Pathology. J. Mol. Diagn. 11, 266–278 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Schüffler, P., Mikeska, T., Waha, A., Lengauer, T. & Bock, C. MethMarker: user-friendly design and optimization of gene-specific DNA methylation assays. Genome Biol. 10, R105 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Thompson, R. F., Suzuki, M., Lau, K. W. & Greally, J. M. A pipeline for the quantitative analysis of CG dinucleotide methylation using mass spectrometry. Bioinformatics 25, 2164–2170 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Srivastava, G. P., Guo, J., Shi, H. & Xu, D. PRIMEGENS-v2: genome-wide primer design for analyzing DNA methylation patterns of CpG islands. Bioinformatics 24, 1837–1842 (2008).

    Article  CAS  PubMed  Google Scholar 

  108. Lutsik, P. et al. BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing. Nucleic Acids Res. 39, W551–W556 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Potapova, A. et al. Systematic cross-validation of 454 sequencing and pyrosequencing for the exact quantification of DNA methylation patterns with single CpG resolution. BMC Biotechnol. 11, 6 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Halachev, K., Bast, H., Albrecht, F., Lengauer, T. & Bock, C. EpiExplorer: live exploration and global analysis of large epigenomic datasets. Genome Biol. (in the press). This reference describes a Web-based software tool for interactive exploration and biological hypothesis generation based on epigenome data.

  111. Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  112. Sandve, G. K. et al. The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 11, R121 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  113. Nam, D. & Kim, S. Y. Gene-set approach for expression pattern analysis. Brief Bioinform. 9, 189–197 (2008).

    Article  PubMed  Google Scholar 

  114. Marsit, C. J. et al. DNA methylation array analysis identifies profiles of blood-derived DNA methylation associated with bladder cancer. J. Clin. Oncol. 29, 1133–1139 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  115. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotech. 28, 495–501 (2010).

    Article  CAS  Google Scholar 

  116. Bock, C., Halachev, K., Büch, J. & Lengauer, T. EpiGRAPH: user-friendly software for statistical analysis and prediction of (epi-) genomic data. Genome Biol. 10, R14 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Hackenberg, M. & Matthiesen, R. Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists. Bioinformatics 24, 1386–1393 (2008).

    Article  CAS  PubMed  Google Scholar 

  118. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 12, R83 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Irizarry, R. A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature Genet. 41, 178–186 (2009).

    Article  CAS  PubMed  Google Scholar 

  120. Bock, C., Walter, J., Paulsen, M. & Lengauer, T. CpG island mapping by epigenome prediction. PLoS Comput. Biol. 3, e110 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Bjornsson, H. T. et al. Intra-individual change over time in DNA methylation with familial clustering. JAMA 299, 2877–2883 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Bock, C., Walter, J., Paulsen, M. & Lengauer, T. Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping. Nucleic Acids Res. 36, e55 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Gertz, J. et al. Analysis of DNA methylation in a three-generation family reveals widespread genetic influence on epigenetic regulation. PLoS Genet. 7, e1002228 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Heijmans, B. T., Kremer, D., Tobi, E. W., Boomsma, D. I. & Slagboom, P. E. Heritable rather than age-related environmental and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum. Mol. Genet. 16, 547–554 (2007).

    Article  CAS  PubMed  Google Scholar 

  125. Xie, H. et al. Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Res. 39, 4099–4108 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Bell, J. T. et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 12, R10 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Zhang, D. et al. Genetic control of individual differences in gene-specific methylation in human brain. Am. J. Hum. Genet. 86, 411–419 (2010). References 125, 126 and 127 describe the systematic identification of genetic variants that are associated with DNA methylation levels across the genome.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86 (2012). This reference describes a computational method for inferring the cellular composition of heterogeneous tissues from aggregate DNA methylation data.

    Article  PubMed  PubMed Central  Google Scholar 

  131. Jaffe, A. E., Feinberg, A. P., Irizarry, R. A. & Leek, J. T. Significance analysis and statistical dissection of variably methylated regions. Biostatistics 13, 166–178 (2012).

    Article  PubMed  Google Scholar 

  132. Teschendorff, A. E. & Widschwendter, M. Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics 28, 1487–1494 (2012).

    Article  CAS  PubMed  Google Scholar 

  133. Feinberg, A. P. & Irizarry, R. A. Evolution in health and medicine Sackler colloquium: stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc. Natl Acad. Sci. USA 107, S1757–S1764 (2010).

  134. Pujadas, E. & Feinberg, A. P. Regulated noise in the epigenetic landscape of development and disease. Cell 148, 1123–1131 (2012).

  135. Xie, W. et al. Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell 148, 816–831 (2012).

  136. Fang, F. et al. Genomic landscape of human allele-specific DNA methylation. Proc. Natl Acad. Sci. USA 109, 7332–7337 (2012).

  137. Peng, Q. & Ecker, J. R. Detection of allele-specific methylation through a generalized heterogeneous epigenome model. Bioinformatics 28, i163–i171 (2012).

  138. Relton, C. L. & Davey Smith, G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. Int. J. Epidemiol. 41, 161–176 (2012).

  139. Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).

  140. Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnol. 4, 265–270 (2009).

  141. Wu, H. & Zhang, Y. Mechanisms and functions of Tet protein-mediated 5-methylcytosine oxidation. Genes Dev. 25, 2436–2452 (2011).

  142. Bock, C. & Lengauer, T. Managing drug resistance in cancer: lessons from HIV therapy. Nature Rev. Cancer 12, 494–501 (2012).

  143. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nature Methods 7, 461–465 (2010).

  144. Booth, M. J. et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 336, 934–937 (2012).

  145. Huang, Y. et al. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS ONE 5, e8888 (2010).

  146. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nature Methods 7, 133–136 (2010).

  147. Diep, D. et al. Library-free methylation sequencing with bisulfite padlock probes. Nature Methods 9, 270–272 (2012).

  148. Lee, E. J. et al. Targeted bisulfite sequencing by solution hybrid selection and massively parallel sequencing. Nucleic Acids Res. 39, e127 (2011).

  149. Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotech. 28, 1045–1048 (2010).

  150. Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nature Biotech. 30, 224–226 (2012).

  151. Brown, M. B. A method for combining non-independent, one-sided tests of significance. Biometrics 31, 987–992 (1975).

  152. Allison, D. B., Cui, X., Page, G. P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nature Rev. Genet. 7, 55–65 (2006).

  153. Shen-Orr, S. S. et al. Cell type-specific gene expression differences in complex tissues. Nature Methods 7, 287–289 (2010).

  154. Westra, H. J. et al. MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects. Bioinformatics 27, 2104–2111 (2011).

  155. Pickrell, J. K., Gaffney, D. J., Gilad, Y. & Pritchard, J. K. False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics 27, 2144–2146 (2011).

  156. Zhang, X., Mu, W. & Zhang, W. On the analysis of the Illumina 450k array data: probes ambiguously mapped to the human genome. Front. Genet. 3, 73 (2012).

  157. Ehrich, M., Zoll, S., Sur, S. & van den Boom, D. A new method for accurate assessment of DNA quality after bisulfite treatment. Nucleic Acids Res. 35, e29 (2007).

  158. Warnecke, P. M. et al. Identification and resolution of artifacts in bisulfite sequencing. Methods 27, 101–107 (2002).

Acknowledgements

The author would like to thank S. Beck, M. Esteller, T. Lengauer, A. Meissner, H. Stunnenberg and J. Walter for helpful discussions and all past and present collaborators for sharing their ideas, data and insights.

Author information

Corresponding author

Correspondence to Christoph Bock.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Glossary

Biomarkers

Molecular assays that predict a clinical phenotype, such as disease status or response to a drug.

Reference epigenomes

Publicly available epigenome maps that comprise multiple epigenetic marks for the same cell type (for example, DNA methylation, several histone modifications and non-coding RNA expression).

Epigenome-wide association study

(EWAS). A study design that involves measuring an epigenetic mark in cases and controls to identify disease-associated differences.

Differentially methylated regions

(DMRs). Genomic regions that exhibit statistically significant differences in DNA methylation between sample groups.

Bisulphite

Bisulphite ions (HSO₃⁻) selectively deaminate unmethylated but not methylated Cs, giving rise to Us, which are replaced by Ts during subsequent PCR amplification.
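
The effect of bisulphite treatment on a sequencing read can be illustrated in silico. The following is a minimal sketch, not part of any published tool: the function name `bisulfite_convert` and its `methylated_positions` argument are hypothetical, and the sketch ignores incomplete conversion and the reverse strand.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulphite treatment followed by PCR on one strand.

    Unmethylated Cs are read as Ts; Cs at the given 0-based
    positions are treated as methylated and remain Cs.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # C -> U by deamination, U -> T after PCR
        else:
            converted.append(base)
    return "".join(converted)


# The C at position 1 is methylated and protected; the Cs at 4 and 5 convert.
print(bisulfite_convert("ACGTCCGA", methylated_positions={1}))  # ACGTTTGA
```

Comparing the converted read with the reference sequence then reveals which Cs were methylated: a retained C indicates methylation, a C-to-T change indicates an unmethylated C.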

Absolute DNA methylation levels

Percentage of methylated alleles for a given C; this value is always binary (0% or 100%) for single alleles but can take any value between 0% and 100% when averaging over many cells.

Sequence complexity

The diversity of the DNA sequence; it can be measured by the information content of the base composition.
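
One way to quantify the information content of the base composition is the Shannon entropy of the base frequencies. The sketch below assumes this particular measure (the glossary does not prescribe one), and the function name `base_composition_entropy` is illustrative.

```python
import math
from collections import Counter


def base_composition_entropy(seq):
    """Shannon entropy (in bits) of the base composition of a sequence.

    Ranges from 0 (a single base) to 2 (A, C, G and T equally frequent).
    """
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


print(base_composition_entropy("ACGTACGT"))  # four equally frequent bases
print(base_composition_entropy("ATGTATGT"))  # the same sequence with every C read as T
```

The second call illustrates why bisulphite-converted reads have lower complexity: once most Cs are read as Ts, the sequence is effectively drawn from three letters rather than four.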

Genotype calling

The determination of SNPs and other genetic variants in a given individual.

Genome Analysis Toolkit

(GATK). A widely used software tool for genotype calling based on next-generation sequencing data.

M bias plot

A quality-control diagram that plots mean DNA methylation levels for each position of the bisulphite-sequencing reads. Deviations from a horizontal line indicate biases.
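
The values behind such a plot can be computed with a few lines of code. This is a simplified sketch under stated assumptions: the input format (one list of calls per read, with 1 for a methylated C, 0 for an unmethylated C and None where no call is made) and the name `m_bias` are hypothetical, and strand, read mate and sequence context are not separated here.

```python
def m_bias(read_calls):
    """Mean methylation level at each read position.

    read_calls: list of reads; each read is a list with 1 (methylated C),
    0 (unmethylated C) or None (no methylation call) per position.
    Returns one mean per position, or None where no calls exist.
    """
    length = max((len(calls) for calls in read_calls), default=0)
    methylated = [0] * length
    covered = [0] * length
    for calls in read_calls:
        for pos, call in enumerate(calls):
            if call is None:
                continue
            covered[pos] += 1
            methylated[pos] += call
    return [m / c if c else None for m, c in zip(methylated, covered)]


reads = [
    [1, None, 0],
    [1, 1, None],
    [0, 1, 0],
]
print(m_bias(reads))
```

Plotting the returned list against read position gives the M bias plot; positions at the read ends that deviate from the overall level are candidates for trimming.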

R/Bioconductor

A command-line statistical computing environment (R) and an associated repository of bioinformatics packages (Bioconductor) for data processing, statistical analysis and visualization of biological data sets.

β-values

An alternative term for the absolute DNA methylation levels, which stems from the observation that the distribution of DNA methylation levels across the genome resembles a β-distribution.

M values

Logit-transformed β-values, M = log2(β / (1 − β)). The transformation mitigates some statistical problems of the β-value (namely, limited value range and strongly bimodal distribution) at the cost of reduced biological interpretability.
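
The relationship between β-values and M values can be sketched as a base-2 logit transformation and its inverse. The function names and the `eps` clamp (which keeps β-values of exactly 0 or 1 from producing infinite M values) are assumptions made for this illustration; array-processing packages typically add an offset to the probe intensities instead.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M value (base-2 logit)."""
    b = min(max(beta, eps), 1.0 - eps)  # avoid log of 0 and division by 0
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Convert an M value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


for beta in (0.1, 0.5, 0.8):
    m = beta_to_m(beta)
    print(beta, round(m, 3), round(m_to_beta(m), 3))
```

A β-value of 0.5 maps to M = 0, and the bounded interval from 0 to 1 is stretched onto the whole real line, which is what makes M values better suited to methods that assume roughly constant variance.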

Batch effects

Systematic biases in the data that are unrelated to the research question but that arise from undesirable (and often unrecognized) differences in sample handling.

Confounding

A nonrandom relationship between the phenotype of interest and external factors (for example, batch effects or population structure) that can give rise to spurious associations.

Enrichment scores

The relative enrichment of DNA fragments from a given genomic region compared to a control experiment (such as sequencing of unenriched DNA).

Tiling map

Segmentation of the genome into tiling windows of a fixed and typically small size (for example, 100 bases).
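
A tiling map can be built by assigning each CpG to the fixed-size window that contains it. The sketch below assumes 0-based coordinates on a single chromosome and half-open windows; the names `tiling_windows` and `tile_means` are illustrative.

```python
from collections import defaultdict


def tiling_windows(chrom_length, window_size=100):
    """List of (start, end) windows covering one chromosome (0-based, half-open)."""
    return [
        (start, min(start + window_size, chrom_length))
        for start in range(0, chrom_length, window_size)
    ]


def tile_means(cpg_levels, window_size=100):
    """Mean methylation level per tile, keyed by the tile start coordinate.

    cpg_levels: iterable of (position, methylation level) pairs.
    Tiles without any CpG are omitted from the result.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for pos, level in cpg_levels:
        tile = pos // window_size
        sums[tile][0] += level
        sums[tile][1] += 1
    return {tile * window_size: s / n for tile, (s, n) in sorted(sums.items())}


print(tiling_windows(250))
print(tile_means([(5, 1.0), (50, 0.0), (120, 0.5)]))
```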

Logistic regression model

A type of regression model used for modelling the relationship between a binary outcome variable and one or more predictor variables.

CpG methylation table

A data table that contains DNA methylation levels (and, optionally, confidence scores) for each assayed CpG in each sample after normalization and quality control.

False discovery rate

(FDR). A measure of significance that corrects for a large number of statistical tests being carried out on the same data set.
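
The glossary entry does not name a specific procedure; a widely used one is the Benjamini-Hochberg method, sketched here as an illustration. The function name `benjamini_hochberg` is hypothetical, and the procedure assumes independent or positively dependent tests.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```

Regions whose adjusted value falls below a chosen threshold (for example, 0.05) would then be reported as differentially methylated at that FDR.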

Effect size

A measure for the strength of association between two variables that provides important complementary information to P values and false discovery rates.

Bisulphite pyrosequencing

A locus-specific method for accurate quantification of DNA methylation levels at a small number of CpGs in many samples.

Combined bisulphite restriction analysis

(COBRA). A method that combines bisulphite treatment with sequence-specific restriction enzymes for locus-specific analysis of DNA methylation.

Methylation-specific PCR

(MSP). A method for highly sensitive detection of locus-specific DNA methylation using PCR amplification of bisulphite-converted DNA.

MethyLight

A variant of methylation-specific PCR that is highly quantitative and practical for measuring locus-specific DNA methylation levels in many samples.

EpiTYPER

An assay for measuring locus-specific DNA methylation in many samples on the basis of a combination of bisulphite treatment and mass spectrometry.

Cross-validation

A method for estimating the predictive power of a differentially methylated region or biomarker by carrying out training and validation on different portions of the same data set.
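
The partitioning step of k-fold cross-validation can be sketched as follows. This is an illustrative outline only: the generator name `k_fold_splits` is hypothetical, samples are assigned to folds in order rather than at random, and in practice the samples would usually be shuffled (and often stratified by phenotype) first.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Each sample appears in exactly one validation fold.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


for training, validation in k_fold_splits(10, k=5):
    # A model would be fitted on `training` and scored on `validation` here.
    print(sorted(training), validation)
```

Any feature selection, including the choice of differentially methylated regions, belongs inside the loop and must use the training portion only; otherwise the estimated predictive power is too optimistic.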

Quantitative trait loci

(QTLs). Genomic regions that control a phenotype of interest, such as the DNA methylation levels of another genomic region.

Mendelian randomization

An epidemiological method for assessing the causal role of an exposure for a phenotype of interest, using genetic variants that are affected neither by the exposure nor by the phenotype.

About this article

Cite this article

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705–719 (2012). https://doi.org/10.1038/nrg3273
