Analysing and interpreting DNA methylation data

Key Points

  • Recent technological advances make it possible to map DNA methylation in essentially any cell type, tissue or organism.

  • Computational methods and software tools are essential for processing, analysing and interpreting large-scale DNA methylation data sets.

  • Tailored software tools are now available for processing data obtained with all common methods for genome-wide DNA methylation mapping (including bisulphite sequencing and the Infinium assay).

  • Bioinformatic methods for visualization of DNA methylation data facilitate quality assessment and help to pinpoint global trends in the data.

  • By combining stringent statistical methods with computational and experimental validation, researchers can establish accurate lists of differentially methylated regions for a phenotype of interest.

  • Biological interpretation of differential DNA methylation is aided by computational tools for data exploration and enrichment analysis.

  • Large community projects are currently generating reference epigenome maps for many different cell types; the interpretation of these maps will require a comprehensive effort in functional epigenomics.

Abstract

DNA methylation is an epigenetic mark that has suspected regulatory roles in a broad range of biological processes and diseases. The technology is now available for studying DNA methylation genome-wide, at high resolution and in large numbers of samples. This Review discusses relevant concepts, computational methods and software tools for analysing and interpreting DNA methylation data. It not only focuses on the bioinformatic challenges of large epigenome-mapping projects and epigenome-wide association studies but also highlights software tools that make genome-wide DNA methylation mapping more accessible for laboratories with limited bioinformatics experience.

Figure 1: Workflow for analysing and interpreting DNA methylation data.
Figure 2: Two alternative strategies for bisulphite alignment.
Figure 3: Effective identification of differentially methylated regions in a highly annotated genome.

References

  1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).

  2. Ehrlich, M. et al. Amount and distribution of 5-methylcytosine in human DNA from different types of tissues of cells. Nucleic Acids Res. 10, 2709–2721 (1982).

  3. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature Rev. Genet. 13, 484–492 (2012).

  4. Barlow, D. P. Genomic imprinting: a mammalian epigenetic discovery model. Annu. Rev. Genet. 45, 379–403 (2011).

  5. Bestor, T. H. The host defence function of genomic methylation patterns. Novartis Found. Symp. 214, 187–199 (1998).

  6. Meissner, A. Epigenetic modifications in pluripotent and differentiated cells. Nature Biotech. 28, 1079–1088 (2010).

  7. Reik, W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447, 425–432 (2007).

  8. Martin, M. & Herceg, Z. From hepatitis to hepatocellular carcinoma: a proposed model for cross-talk between inflammation and epigenetic mechanisms. Genome Med. 4, 8 (2012).

  9. Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome — biological and translational implications. Nature Rev. Cancer 11, 726–734 (2011).

  10. Feinberg, A. P. Phenotypic plasticity and the epigenetics of human disease. Nature 447, 433–440 (2007).

  11. Walker, C. L. & Ho, S. M. Developmental reprogramming of cancer susceptibility. Nature Rev. Cancer 12, 479–486 (2012).

  12. Laird, P. W. The power and the promise of DNA methylation markers. Nature Rev. Cancer 3, 253–266 (2003).

  13. Bock, C. Epigenetic biomarker development. Epigenomics 1, 99–110 (2009).

  14. Laird, P. W. Principles and challenges of genome-wide DNA methylation analysis. Nature Rev. Genet. 11, 191–203 (2010).

  15. Bock, C. & Lengauer, T. Computational epigenetics. Bioinformatics 24, 1–10 (2008).

  16. Satterlee, J. S., Schübeler, D. & Ng, H. H. Tackling the epigenome: challenges and opportunities for collaboration. Nature Biotech. 28, 1039–1044 (2010).

  17. Foley, D. L. et al. Prospects for epigenetic epidemiology. Am. J. Epidemiol. 169, 389–400 (2009).

  18. Rakyan, V. K., Down, T. A., Balding, D. J. & Beck, S. Epigenome-wide association studies for common human diseases. Nature Rev. Genet. 12, 529–541 (2011). This Review describes the planning and execution of an EWAS for common diseases.

  19. Robinson, M. D., Statham, A. L., Speed, T. P. & Clark, S. J. Protocol matters: which methylome are you actually studying? Epigenomics 2, 587–598 (2010).

  20. Beck, S. Taking the measure of the methylome. Nature Biotech. 28, 1026–1028 (2010).

  21. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).

  22. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).

  23. Frith, M. C., Mori, R. & Asai, K. A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res. 40, e100 (2012).

  24. Coarfa, C. et al. Pash 3.0: a versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics 11, 572 (2010).

  25. Smith, A. D. et al. Updates to the RMAP short-read mapping software. Bioinformatics 25, 2841–2842 (2009).

  26. Xi, Y. et al. RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing. Bioinformatics 28, 430–432 (2012).

  27. Otto, C., Stadler, P. F. & Hoffmann, S. Fast and sensitive mapping of bisulfite-treated sequencing data. Bioinformatics 28, 1698–1704 (2012).

  28. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

  29. Harris, E. Y., Ponts, N., Levchuk, A., Le Roch, K. G. & Lonardi, S. BRAT: bisulfite-treated reads analysis tool. Bioinformatics 26, 572–573 (2010).

  30. Harris, E. Y., Ponts, N., Le Roch, K. G. & Lonardi, S. BRAT-BW: efficient and accurate mapping of bisulfite-treated reads. Bioinformatics 28, 1795–1796 (2012).

  31. Chen, P. Y., Cokus, S. J. & Pellegrini, M. BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics 11, 203 (2010).

  32. Pedersen, B., Hsieh, T. F., Ibarra, C. & Fischer, R. L. MethylCoder: software pipeline for bisulfite-treated sequences. Bioinformatics 27, 2435–2436 (2011).

  33. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  34. Nielsen, R., Paul, J. S., Albrechtsen, A. & Song, Y. S. Genotype and SNP calling from next-generation sequencing data. Nature Rev. Genet. 12, 443–451 (2011).

  35. Liu, Y., Siegmund, K. D., Laird, P. W. & Berman, B. P. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biol. 13, R61 (2012).

  36. Miller, C. A., Hampton, O., Coarfa, C. & Milosavljevic, A. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS ONE 6, e16327 (2011).

  37. Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nature Genet. 43, 768–775 (2011). This reference describes a comprehensive and well-documented analysis of a cancer-specific DNA methylation data set.

  38. Krueger, F., Kreck, B., Franke, A. & Andrews, S. R. DNA methylome analysis using short bisulfite sequencing data. Nature Methods 9, 145–151 (2012).

  39. Chung, C. A. High-throughput sequencing of the methylome using two-base encoding. Methods Mol. Biol. 910, 71–86 (2012).

  40. Kreck, B. et al. B-SOLANA: an approach for the analysis of two-base encoding bisulfite sequencing data. Bioinformatics 28, 428–429 (2012).

  41. Ramsahoye, B. H. et al. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl Acad. Sci. USA 97, 5237–5242 (2000).

  42. Ziller, M. J. et al. Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet. 7, e1002389 (2011).

  43. Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).

  44. Dedeurwaerder, S. et al. Evaluation of the Infinium Methylation 450K technology. Epigenomics 3, 771–784 (2011). This study carried out an independent empirical evaluation of the Illumina Infinium 450k assay.

  45. Maksimovic, J., Gordon, L. & Oshlack, A. SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol. 13, R44 (2012).

  46. Touleimat, N. & Tost, J. Complete pipeline for Infinium Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics 4, 325–341 (2012).

  47. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

  48. Wang, D. et al. Comparison of different normalization assumptions for analyses of DNA methylation data from the cancer genome. Gene 506, 36–42 (2012).

  49. Du, P. et al. Comparison of beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11, 587 (2010).

  50. Teschendorff, A. E., Zhuang, J. & Widschwendter, M. Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics 27, 1496–1505 (2011).

  51. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Rev. Genet. 11, 733–739 (2010). This Perspectives article highlights the prevalence of batch effects in genomic data and suggests ways of addressing this problem.

  52. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

  53. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

  54. Chen, Y. A. et al. Sequence overlap between autosomal and sex-linked probes on the Illumina HumanMethylation27 microarray. Genomics 97, 214–222 (2011).

  55. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  56. Down, T. A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotech. 26, 779–785 (2008).

  57. Pelizzola, M. et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18, 1652–1659 (2008).

  58. Chavez, L. et al. Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage. Genome Res. 20, 1441–1450 (2010).

  59. Huang, J. et al. MeQA: a pipeline for MeDIP-seq data quality assessment and analysis. Bioinformatics 28, 587–588 (2012).

  60. Wilson, G. et al. Resources for methylome analysis suitable for gene knockout studies of potential epigenome modifiers. GigaScience 1, 3 (2012).

  61. Statham, A. L. et al. Repitools: an R package for the analysis of enrichment-based epigenomic data. Bioinformatics 26, 1662–1663 (2010).

  62. Singer, M. et al. MetMap enables genome-scale Methyltyping for determining methylation states in populations. PLoS Comput. Biol. 6, e1000888 (2010).

  63. Jing, Q., McLellan, A., Greally, J. M. & Suzuki, M. Automated computational analysis of genome-wide DNA methylation profiling data from HELP-tagging assays. Methods Mol. Biol. 815, 79–87 (2012).

  64. Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature Biotech. 28, 1106–1114 (2010).

  65. Harris, R. A. et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature Biotech. 28, 1097–1105 (2010).

  66. Robinson, M. D. et al. Evaluation of affinity-based genome-wide DNA methylation data: effects of CpG density, amplification bias, and copy number variation. Genome Res. 20, 1719–1729 (2010). References 64, 65 and 66 carried out an empirical benchmarking of widely used methods for genome-wide DNA methylation mapping.

  67. Irizarry, R. A. et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 18, 780–790 (2008).

  68. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

  69. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).

  70. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36, D773–D779 (2008).

  71. Flicek, P. et al. Ensembl 2008. Nucleic Acids Res. 36, D707–D714 (2008).

  72. Zhou, X. et al. The Human Epigenome Browser at Washington University. Nature Methods 8, 989–990 (2011). This reference describes a useful Web-based software tool for visualization and graphical analysis of human reference epigenomes.

  73. Robinson, J. T. et al. Integrative genomics viewer. Nature Biotech. 29, 24–26 (2011).

  74. Nicol, J. W., Helt, G. A., Blanchard, S. G. Jr, Raja, A. & Loraine, A. E. The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25, 2730–2731 (2009).

  75. Shearstone, J. R. et al. Global DNA demethylation during mouse erythropoiesis in vivo. Science 334, 799–802 (2011).

  76. Smiraglia, D. J. et al. Excessive CpG island hypermethylation in cancer cell lines versus primary human malignancies. Hum. Mol. Genet. 10, 1413–1419 (2001).

  77. Anders, S. Visualization of genomic data with the Hilbert curve. Bioinformatics 25, 1231–1235 (2009).

  78. Bock, C. et al. Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines. Cell 144, 439–452 (2011).

  79. Bock, C. et al. DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Mol. Cell 47, 633–647 (2012).

  80. Figueroa, M. E. et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 18, 553–567 (2010).

  81. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).

  82. Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of β distributions. BMC Bioinformatics 9, 365 (2008).

  83. Marjoram, P., Chang, J., Laird, P. W. & Siegmund, K. D. Cluster analysis for DNA methylation profiles having a detection threshold. BMC Bioinformatics 7, 361 (2006).

  84. Siegmund, K. D., Laird, P. W. & Laird-Offringa, I. A. A comparison of cluster analysis methods using DNA methylation data. Bioinformatics 20, 1896–1904 (2004).

  85. Xu, J. et al. Pioneer factor interactions and unmethylated CpG dinucleotides mark silent tissue-specific enhancers in embryonic stem cells. Proc. Natl Acad. Sci. USA 104, 12377–12382 (2007).

  86. Raval, A. et al. Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia. Cell 129, 879–890 (2007).

  87. Moser, D. et al. Functional analysis of a potassium-chloride co-transporter 3 (SLC12A6) promoter polymorphism leading to an additional DNA methylation site. Neuropsychopharmacology 34, 458–467 (2008).

  88. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).

  89. Wang, D. et al. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data. Bioinformatics 28, 729–730 (2012).

  90. Wang, S. Method to detect differentially methylated loci with case-control designs using Illumina arrays. Genet. Epidemiol. 35, 686–694 (2011).

  91. Zhang, Y. et al. QDMR: a quantitative method for identification of differentially methylated regions by entropy. Nucleic Acids Res. 39, e58 (2011).

  92. Zhuang, J., Widschwendter, M. & Teschendorff, A. E. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 13, 59 (2012).

  93. Chen, Z., Liu, Q. & Nadarajah, S. A new statistical approach to detecting differentially methylated loci for case control Illumina array methylation data. Bioinformatics 28, 1109–1113 (2012).

  94. Poage, G. M. et al. Identification of an epigenetic profile classifier that is associated with survival in head and neck cancer. Cancer Res. 72, 2728–2737 (2012). This study demonstrates how the aggregation by genomic sequence features can reduce the multiple-testing burden and increase the power of a small and otherwise underpowered EWAS.

  95. Robinson, M. D. et al. Copy-number-aware differential analysis of quantitative DNA sequencing data. Genome Res. 9 Aug 2012 (doi:10.1101/gr.139055.112).

  96. Jaffe, A. E. et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int. J. Epidemiol. 41, 200–209 (2012). This study describes a flexible workflow for identifying DMRs in a statistically sound manner.

  97. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

  98. Kuan, P. F. & Chiang, D. Y. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation. Biometrics 19 Jan 2012 (doi:10.1111/j.1541-0420.2011.01730.x).

  99. Ji, H. & Liu, X. S. Analyzing 'omics data using hierarchical models. Nature Biotech. 28, 337–340 (2010).

  100. Lugthart, S. et al. Aberrant DNA hypermethylation signature in acute myeloid leukemia directed by EVI1. Blood 117, 234–241 (2011).

  101. Barfield, R. T., Kilaru, V., Smith, A. K. & Conneely, K. N. CpGassoc: an R function for analysis of DNA methylation microarray data. Bioinformatics 28, 1280–1281 (2012).

  102. Kilaru, V., Barfield, R. T., Schroeder, J. W., Smith, A. K. & Conneely, K. N. MethLAB: a graphical user interface package for the analysis of array-based DNA methylation data. Epigenetics 7, 225–229 (2012).

  103. Kristensen, L. S. & Hansen, L. L. PCR-based methods for detecting single-locus DNA methylation biomarkers in cancer diagnostics, prognostics, and response to treatment. Clin. Chem. 55, 1471–1483 (2009).

  104. Sepulveda, A. R. et al. CpG methylation analysis-current status of clinical assays and potential applications in molecular diagnostics: a report of the Association for Molecular Pathology. J. Mol. Diagn. 11, 266–278 (2009).

  105. Schüffler, P., Mikeska, T., Waha, A., Lengauer, T. & Bock, C. MethMarker: user-friendly design and optimization of gene-specific DNA methylation assays. Genome Biol. 10, R105 (2009).

  106. Thompson, R. F., Suzuki, M., Lau, K. W. & Greally, J. M. A pipeline for the quantitative analysis of CG dinucleotide methylation using mass spectrometry. Bioinformatics 25, 2164–2170 (2009).

  107. Srivastava, G. P., Guo, J., Shi, H. & Xu, D. PRIMEGENS-v2: genome-wide primer design for analyzing DNA methylation patterns of CpG islands. Bioinformatics 24, 1837–1842 (2008).

  108. Lutsik, P. et al. BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing. Nucleic Acids Res. 39, W551–W556 (2011).

  109. Potapova, A. et al. Systematic cross-validation of 454 sequencing and pyrosequencing for the exact quantification of DNA methylation patterns with single CpG resolution. BMC Biotechnol. 11, 6 (2011).

  110. Halachev, K., Bast, H., Albrecht, F., Lengauer, T. & Bock, C. EpiExplorer: live exploration and global analysis of large epigenomic datasets. Genome Biol. (in the press). This reference describes a Web-based software tool for interactive exploration and biological hypothesis generation based on epigenome data.

  111. Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).

  112. Sandve, G. K. et al. The Genomic HyperBrowser: inferential genomics at the sequence level. Genome Biol. 11, R121 (2010).

  113. Nam, D. & Kim, S. Y. Gene-set approach for expression pattern analysis. Brief Bioinform. 9, 189–197 (2008).

  114. Marsit, C. J. et al. DNA methylation array analysis identifies profiles of blood-derived DNA methylation associated with bladder cancer. J. Clin. Oncol. 29, 1133–1139 (2011).

  115. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotech. 28, 495–501 (2010).

  116. Bock, C., Halachev, K., Büch, J. & Lengauer, T. EpiGRAPH: user-friendly software for statistical analysis and prediction of (epi-) genomic data. Genome Biol. 10, R14 (2009).

  117. Hackenberg, M. & Matthiesen, R. Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists. Bioinformatics 24, 1386–1393 (2008).

  118. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 12, R83 (2011).

  119. Irizarry, R. A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature Genet. 41, 178–186 (2009).

  120. Bock, C., Walter, J., Paulsen, M. & Lengauer, T. CpG island mapping by epigenome prediction. PLoS Comput. Biol. 3, e110 (2007).

  121. Bjornsson, H. T. et al. Intra-individual change over time in DNA methylation with familial clustering. JAMA 299, 2877–2883 (2008).

  122. Bock, C., Walter, J., Paulsen, M. & Lengauer, T. Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping. Nucleic Acids Res. 36, e55 (2008).

  123. Gertz, J. et al. Analysis of DNA methylation in a three-generation family reveals widespread genetic influence on epigenetic regulation. PLoS Genet. 7, e1002228 (2011).

  124. Heijmans, B. T., Kremer, D., Tobi, E. W., Boomsma, D. I. & Slagboom, P. E. Heritable rather than age-related environmental and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum. Mol. Genet. 16, 547–554 (2007).

  125. Xie, H. et al. Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Res. 39, 4099–4108 (2011).

  126. Bell, J. T. et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 12, R10 (2011).

  127. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).

  128. Zhang, D. et al. Genetic control of individual differences in gene-specific methylation in human brain. Am. J. Hum. Genet. 86, 411–419 (2010). References 126, 127 and 128 describe the systematic identification of genetic variants that are associated with DNA methylation levels across the genome.

  129. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010).

  130. Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86 (2012). This reference describes a computational method for inferring the cellular composition of heterogeneous tissues from aggregate DNA methylation data.

  131. Jaffe, A. E., Feinberg, A. P., Irizarry, R. A. & Leek, J. T. Significance analysis and statistical dissection of variably methylated regions. Biostatistics 13, 166–178 (2012).

  132. Teschendorff, A. E. & Widschwendter, M. Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics 28, 1487–1494 (2012).

  133. Feinberg, A. P. & Irizarry, R. A. Evolution in health and medicine Sackler colloquium: stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc. Natl Acad. Sci. USA 107, S1757–S1764 (2010).

  134. Pujadas, E. & Feinberg, A. P. Regulated noise in the epigenetic landscape of development and disease. Cell 148, 1123–1131 (2012).

  135. Xie, W. et al. Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell 148, 816–831 (2012).

  136. Fang, F. et al. Genomic landscape of human allele-specific DNA methylation. Proc. Natl Acad. Sci. USA 109, 7332–7337 (2012).

  137. Peng, Q. & Ecker, J. R. Detection of allele-specific methylation through a generalized heterogeneous epigenome model. Bioinformatics 28, i163–i171 (2012).

  138. Relton, C. L. & Davey Smith, G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. Int. J. Epidemiol. 41, 161–176 (2012).

  139. Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).

  140. Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnol. 4, 265–270 (2009).

  141. Wu, H. & Zhang, Y. Mechanisms and functions of Tet protein-mediated 5-methylcytosine oxidation. Genes Dev. 25, 2436–2452 (2011).

  142. Bock, C. & Lengauer, T. Managing drug resistance in cancer: lessons from HIV therapy. Nature Rev. Cancer 12, 494–501 (2012).

  143. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nature Methods 7, 461–465 (2010).

  144. Booth, M. J. et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 336, 934–937 (2012).

  145. Huang, Y. et al. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS ONE 5, e8888 (2010).

  146. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nature Methods 7, 133–136 (2010).

  147. Diep, D. et al. Library-free methylation sequencing with bisulfite padlock probes. Nature Methods 9, 270–272 (2012).

  148. Lee, E. J. et al. Targeted bisulfite sequencing by solution hybrid selection and massively parallel sequencing. Nucleic Acids Res. 39, e127 (2011).

  149. Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotech. 28, 1045–1048 (2010).

  150. Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nature Biotech. 30, 224–226 (2012).

  151. Brown, M. B. A method for combining non-independent, one-sided tests of significance. Biometrics 31, 987–992 (1975).

  152. Allison, D. B., Cui, X., Page, G. P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nature Rev. Genet. 7, 55–65 (2006).

  153. Shen-Orr, S. S. et al. Cell type-specific gene expression differences in complex tissues. Nature Methods 7, 287–289 (2010).

  154. Westra, H. J. et al. MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects. Bioinformatics 27, 2104–2111 (2011).

  155. Pickrell, J. K., Gaffney, D. J., Gilad, Y. & Pritchard, J. K. False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics 27, 2144–2146 (2011).

  156. Zhang, X., Mu, W. & Zhang, W. On the analysis of the Illumina 450k array data: probes ambiguously mapped to the human genome. Front. Genet. 3, 73 (2012).

  157. Ehrich, M., Zoll, S., Sur, S. & van den Boom, D. A new method for accurate assessment of DNA quality after bisulfite treatment. Nucleic Acids Res. 35, e29 (2007).

  158. Warnecke, P. M. et al. Identification and resolution of artifacts in bisulfite sequencing. Methods 27, 101–107 (2002).

Acknowledgements

The author would like to thank S. Beck, M. Esteller, T. Lengauer, A. Meissner, H. Stunnenberg and J. Walter for helpful discussions and all past and present collaborators for sharing their ideas, data and insights.

Author information

Correspondence to Christoph Bock.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Glossary

Biomarkers

Molecular assays that predict a clinical phenotype, such as disease status or response to a drug.

Reference epigenomes

Publicly available epigenome maps that comprise multiple epigenetic marks for the same cell type (for example, DNA methylation, several histone modifications and non-coding RNA expression).

Epigenome-wide association study

(EWAS). A study design that involves measuring an epigenetic mark in cases and controls to identify disease-associated differences.

Differentially methylated regions

(DMRs). Genomic regions that exhibit statistically significant differences in DNA methylation between sample groups.
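
Candidate DMRs are often assembled by merging neighbouring CpGs that individually show a difference between groups. A minimal sketch of only that merging step, assuming per-CpG P values and mean β-value differences have already been computed elsewhere; the `CpGResult` record and the thresholds `alpha`, `min_delta`, `max_gap` and `min_cpgs` are hypothetical choices, and this simple heuristic is not the bump-hunting method listed in the references.

```python
from dataclasses import dataclass


@dataclass
class CpGResult:
    chrom: str
    pos: int        # CpG coordinate
    p_value: float  # per-CpG test P value (computed elsewhere)
    delta: float    # mean beta difference, cases minus controls (hypothetical input)


def merge_into_regions(results, alpha=0.01, min_delta=0.2, max_gap=500, min_cpgs=3):
    """Merge significant CpGs that lie within max_gap bp of each other and change in
    the same direction. Non-significant CpGs between hits are ignored when measuring
    the gap. Returns (chrom, start, end, n_cpgs, mean_delta) tuples."""
    hits = sorted(
        (r for r in results if r.p_value < alpha and abs(r.delta) >= min_delta),
        key=lambda r: (r.chrom, r.pos),
    )
    regions, current = [], []
    for r in hits:
        extends_current = (
            current
            and r.chrom == current[-1].chrom
            and r.pos - current[-1].pos <= max_gap
            and (r.delta > 0) == (current[-1].delta > 0)
        )
        if extends_current:
            current.append(r)
        else:
            if len(current) >= min_cpgs:
                regions.append(current)
            current = [r]
    if len(current) >= min_cpgs:
        regions.append(current)
    return [
        (g[0].chrom, g[0].pos, g[-1].pos, len(g), sum(x.delta for x in g) / len(g))
        for g in regions
    ]
```

Requiring a consistent direction of change keeps a hypermethylated and an adjacent hypomethylated stretch from being reported as one region.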

Bisulphite

Bisulphite ions (HSO₃⁻) selectively deaminate unmethylated but not methylated Cs, giving rise to Us, which are replaced by Ts during subsequent PCR amplification.
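
This conversion rule can be simulated in silico, and reading it in reverse is how methylation is called from aligned reads: at a reference CpG, a C in the read indicates methylation and a T indicates its absence. A minimal sketch, assuming forward-strand reads that are already aligned without gaps, CpG-only methylation and a hypothetical `methylated_positions` set; real aligners and callers also handle the reverse strand, non-CpG contexts, SNPs and sequencing errors.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment plus PCR on the forward strand: every C whose
    0-based index is not in methylated_positions becomes T; methylated Cs stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def matches_bisulphite(read, reference):
    """Ungapped comparison in which a read T may also match a reference C."""
    return len(read) == len(reference) and all(
        r == g or (r == "T" and g == "C")
        for r, g in zip(read.upper(), reference.upper())
    )


def call_cpg_methylation(read, reference, offset=0):
    """For each reference CpG covered by the read, report True (read shows C, i.e.
    methylated) or False (read shows T, i.e. unmethylated). Other bases are skipped.
    offset is the 0-based reference position of the first read base."""
    reference = reference.upper()
    calls = {}
    for j, base in enumerate(read.upper()):
        i = offset + j
        if reference[i:i + 2] == "CG":
            if base == "C":
                calls[i] = True
            elif base == "T":
                calls[i] = False
    return calls
```

For example, converting `"ACGTTCGACCA"` with only the CpG at index 1 methylated yields `"ACGTTTGATTA"`, from which the caller recovers index 1 as methylated and index 5 as unmethylated.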

Absolute DNA methylation levels

Percentage of methylated alleles for a given C; this value is always binary (0% or 100%) for single alleles but can take any value between 0% and 100% when averaging over many cells.
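
In sequencing data the average over many cells is estimated per CpG from the reads covering it. A minimal sketch, assuming read-level calls in the form `{position: is_methylated}` (one dictionary per read) and a hypothetical minimum-coverage threshold below which no level is reported.

```python
from collections import Counter


def aggregate_levels(read_calls, min_coverage=5):
    """Estimate the absolute methylation level (fraction of methylated reads) per CpG.
    read_calls: iterable of {position: True/False} dicts, one per read.
    CpGs covered by fewer than min_coverage reads are omitted."""
    meth, unmeth = Counter(), Counter()
    for calls in read_calls:
        for pos, is_meth in calls.items():
            if is_meth:
                meth[pos] += 1
            else:
                unmeth[pos] += 1
    levels = {}
    for pos in set(meth) | set(unmeth):
        coverage = meth[pos] + unmeth[pos]
        if coverage >= min_coverage:
            levels[pos] = meth[pos] / coverage
    return levels
```

Reporting coverage alongside each level (or filtering on it, as here) matters because a level of 100% from a single read carries far less information than the same level from 50 reads.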

Sequence complexity

The diversity of the DNA sequence; it can be measured by the information content of the base composition.
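
One common way to quantify the information content of a base composition is Shannon entropy in bits per base. A minimal sketch (the choice of entropy as the measure is an assumption) showing that complete bisulphite conversion of unmethylated DNA reduces complexity, because most Cs become Ts.

```python
import math
from collections import Counter


def base_entropy(seq):
    """Shannon entropy of the base composition in bits per base
    (2.0 when A, C, G and T are equally frequent)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())
```

For `"ACGT" * 10` the entropy is 2.0 bits; its fully converted counterpart `"ATGT" * 10` (no methylation assumed) drops to 1.5 bits.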

Genotype calling

The determination of SNPs and other genetic variants in a given individual.

Genome Analysis Toolkit

(GATK). A widely used software tool for genotype calling based on next-generation sequencing data.

M bias plot

A quality-control diagram that plots mean DNA methylation levels for each position of the bisulphite-sequencing reads. Deviations from a horizontal line indicate biases.
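
The values behind an M bias plot are mean methylation levels per read position. A minimal sketch, assuming per-read calls as `(position_in_read, is_methylated)` pairs; the rule for flagging positions (deviation from the median of all positions by more than a hypothetical `tolerance`) is an illustrative heuristic, not a published criterion. Flagged positions, such as those near read ends, are candidates for trimming or exclusion.

```python
from collections import defaultdict


def m_bias(read_calls):
    """Mean methylation level per read position.
    read_calls: iterable of per-read lists of (position_in_read, is_methylated)."""
    meth = defaultdict(int)
    total = defaultdict(int)
    for calls in read_calls:
        for pos, is_meth in calls:
            total[pos] += 1
            meth[pos] += int(is_meth)
    return {pos: meth[pos] / total[pos] for pos in sorted(total)}


def flag_biased_positions(profile, tolerance=0.1):
    """Return read positions whose mean deviates from the median across positions
    by more than tolerance (a horizontal profile flags nothing)."""
    values = sorted(profile.values())
    mid = len(values) // 2
    median = values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2
    return [pos for pos, v in profile.items() if abs(v - median) > tolerance]
```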

R/Bioconductor

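A widely used statistical programming environment (R) and an associated collection of open-source software packages (Bioconductor) for data processing, statistical analysis and visualization of biological data sets.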

β-values

An alternative term for the absolute DNA methylation levels, which stems from the observation that the distribution of DNA methylation levels across the genome resembles a β-distribution.

M-values

Logit-transformed β-values, M = log2(β/(1 − β)). The transformation mitigates some statistical problems of the β-value (namely, limited value range and strongly bimodal distribution) at the cost of reduced biological interpretability.
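
On methylation arrays, β-values are typically computed from methylated and unmethylated signal intensities as β = meth/(meth + unmeth + α), where a small offset α stabilises the ratio when both intensities are low, and M-values follow from the logit transformation. A minimal sketch; the offset of 100 and the clipping constant `eps` are assumptions chosen for illustration, not values prescribed in this Review.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """beta = meth / (meth + unmeth + offset); negative intensities are floored at 0."""
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return meth / (meth + unmeth + offset)


def m_value_from_beta(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps] to avoid infinities."""
    beta = min(max(beta, eps), 1 - eps)
    return math.log2(beta / (1 - beta))


def beta_from_m_value(m):
    """Inverse transformation: beta = 2**M / (2**M + 1)."""
    return 2 ** m / (2 ** m + 1)
```

A β-value of 0.5 maps to M = 0, and β = 0.8 maps to M = 2; the M scale stretches the ends of the [0, 1] range, which is where the β-value's bimodal mass sits.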

Batch effects

Systematic biases in the data that are unrelated to the research question but that arise from undesirable (and often unrecognized) differences in sample handling.

Confounding

A nonrandom relationship between the phenotype of interest and external factors (for example, batch effects or population structure) that can give rise to spurious associations.

Enrichment scores

The relative enrichment of DNA fragments from a given genomic region compared to a control experiment (such as sequencing of unenriched DNA).

Tiling map

Segmentation of the genome into tiling windows of a fixed and typically small size (for example, 100 bases).
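
Enrichment-based data (for example, MeDIP-seq) are often summarized on a tiling map by counting reads per window and comparing the enriched library with a control. A sketch under stated assumptions: read start positions on a single chromosome, counts scaled to reads per million, a pseudocount of 1, and a log2 ratio as the enrichment score. Published methods additionally account for factors such as CpG density and copy number, which this sketch ignores.

```python
import math
from collections import Counter


def tile_counts(read_starts, window=100):
    """Count reads per fixed-size window (window index = start // window)."""
    return Counter(start // window for start in read_starts)


def enrichment_scores(enriched_starts, control_starts, window=100, pseudocount=1.0):
    """log2 ratio of library-size-normalized counts (enriched vs control) per window."""
    enr = tile_counts(enriched_starts, window)
    ctl = tile_counts(control_starts, window)
    n_enr, n_ctl = len(enriched_starts), len(control_starts)
    scores = {}
    for w in sorted(set(enr) | set(ctl)):
        # Reads per million plus a pseudocount, so empty windows do not give log(0).
        e = enr[w] / n_enr * 1e6 + pseudocount
        c = ctl[w] / n_ctl * 1e6 + pseudocount
        scores[w] = math.log2(e / c)
    return scores
```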

Logistic regression model

A type of regression model used for modelling the relationship between a binary outcome variable and one or more predictor variables.
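
In an EWAS, for example, case/control status can be modelled as a function of the methylation level at one CpG. A minimal sketch fitting a single-predictor model by batch gradient descent on toy data; the learning rate, iteration count and data are hypothetical, and real analyses would use an established statistics package and include covariates such as age, sex and batch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.5, n_iter=20000):
    """Fit P(case | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean
    negative log-likelihood (one predictor, no regularization)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = sigmoid(b0 + b1 * xi) - yi  # derivative of the log-loss w.r.t. the logit
            g0 += residual
            g1 += residual * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1
```

The toy groups below deliberately overlap (one control at 0.6, one case at 0.4); with perfectly separated groups, the unregularized coefficients would grow without bound.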

CpG methylation table

A data table that contains DNA methylation levels (and, optionally, confidence scores) for each assayed CpG in each sample after normalization and quality control.

False discovery rate

(FDR). The expected proportion of false positives among all results declared significant; FDR-based thresholds correct for the large number of statistical tests carried out on the same data set.
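
The Benjamini–Hochberg procedure is one standard way of controlling the FDR across many CpG-level tests. A minimal sketch returning BH-adjusted P values in input order; the Storey–Tibshirani q-value approach in the reference list is a related refinement that this sketch does not implement.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

CpGs with an adjusted value below the chosen FDR (for example, 0.05) are then called significant: `[i for i, q in enumerate(benjamini_hochberg(p)) if q < 0.05]`.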

Effect size

A measure for the strength of association between two variables that provides important complementary information to P values and false discovery rates.
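
In a large cohort, a methylation difference of a few percentage points can reach a very small P value while being of limited biological relevance, which is why effect sizes are reported alongside significance. A minimal sketch computing two common measures, the difference in mean methylation level and Cohen's d with a pooled standard deviation; the choice of measures is illustrative.

```python
import statistics


def effect_sizes(cases, controls):
    """Return (mean difference, Cohen's d) for methylation levels of two groups,
    using the pooled sample standard deviation for d."""
    delta = statistics.mean(cases) - statistics.mean(controls)
    n1, n2 = len(cases), len(controls)
    pooled_var = (
        (n1 - 1) * statistics.variance(cases) + (n2 - 1) * statistics.variance(controls)
    ) / (n1 + n2 - 2)
    return delta, delta / pooled_var ** 0.5
```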

Bisulphite pyrosequencing

A locus-specific method for accurate quantification of DNA methylation levels at a small number of CpGs in many samples.

Combined bisulphite restriction analysis

(COBRA). A method that combines bisulphite treatment with sequence-specific restriction enzymes for locus-specific analysis of DNA methylation.

Methylation-specific PCR

(MSP). A method for highly sensitive detection of locus-specific DNA methylation using PCR amplification of bisulphite-converted DNA.

MethyLight

A variant of methylation-specific PCR that is highly quantitative and practical for measuring locus-specific DNA methylation levels in many samples.

EpiTYPER

An assay for measuring locus-specific DNA methylation in many samples on the basis of a combination of bisulphite treatment and mass spectrometry.

Cross-validation

A method for estimating the predictive power of a differentially methylated region or biomarker by carrying out training and validation on different portions of the same data set.
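
A minimal sketch of k-fold cross-validation for a single candidate marker, assuming a hypothetical classifier that calls a sample a case when its methylation level is at or above a cut-off learned from the training folds. The essential point is that the cut-off is re-learned within each training split, so the held-out fold never influences the model that is evaluated on it.

```python
import random


def fit_threshold(x, y):
    """Learn a cut-off t for the rule 'predict case (1) if x >= t', choosing among
    midpoints between consecutive distinct training values the one with the highest
    training accuracy (the first one wins on ties)."""
    values = sorted(set(x))
    if len(values) == 1:
        return values[0]
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((xi >= t) == bool(yi) for xi, yi in zip(x, y)) / len(x)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validated_accuracy(x, y, k=5, seed=0):
    """Unweighted mean accuracy over k held-out folds (assumes len(x) >= k)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # The cut-off is learned on the training folds only.
        t = fit_threshold([x[i] for i in train], [y[i] for i in train])
        correct = sum((x[i] >= t) == bool(y[i]) for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Choosing the cut-off on the full data set and then evaluating it on the same samples would overstate predictive power; cross-validation avoids this by keeping training and validation samples separate in every round.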

Quantitative trait loci

(QTLs). Genomic regions that control a phenotype of interest, such as the DNA methylation levels of another genomic region.

Mendelian randomization

An epidemiological method for assessing the causal role of an exposure for a phenotype of interest, using genetic variants that are affected neither by the exposure nor by the phenotype.

About this article

Cite this article

Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705–719 (2012) doi:10.1038/nrg3273
