Statistical and integrative system-level analysis of DNA methylation data

Key Points

  • Cell-type heterogeneity can be a major source of confounding and reverse causation in epigenome-wide association studies (EWAS). Adjustment for cell-type composition is therefore critical for an improved interpretation and understanding of EWAS.

  • For a given study, the best choice of cell-type deconvolution algorithm depends not only on the tissue and phenotype of interest but also on the presence of other confounders and the desired output.

  • Most variation in DNA methylation (DNAm) is driven by genetic factors and cell-type heterogeneity, with corresponding features — methylation quantitative trait loci (mQTLs) and cell-type-specific differentially methylated cytosines (DMCs) — readily identifiable using linear modelling.

  • Identification and interpretation of DNAm changes that accrue with age or exposure to environmental disease risk factors may benefit from differential variance statistics.

  • Analysing patterns of covariation in DNAm at regulatory elements can help to identify disrupted regulatory networks and gene modules in disease.

  • The inverse association between DNAm at regulatory elements and transcription factor binding can be exploited to elucidate the functional role of non-coding genome-wide association study (GWAS) single-nucleotide polymorphisms (SNPs) or functional effects caused by exposure to environmental disease risk factors.

  • Mendelian randomization can help to clarify the role of DNAm as a causal mediator between exposure to risk factors and disease.
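As a rough illustration of the reference-based cell-type deconvolution mentioned in the Key Points, the sketch below estimates the fraction of one cell type in a two-cell-type mixture by least squares. It is a deliberately simplified toy, not any published algorithm: real methods handle many cell types and typically use constrained or robust regression. The function name `estimate_fraction`, the reference profiles and the 30% mixing fraction are all hypothetical values chosen for illustration.

```python
def estimate_fraction(mixed, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a mixture.

    Assumes the mixed profile is approximately w * ref_a + (1 - w) * ref_b
    at every CpG, where all values are methylation beta values in [0, 1].
    """
    # Solving y - b = w * (a - b) for w in the least-squares sense.
    numerator = sum((y - b) * (a - b) for y, a, b in zip(mixed, ref_a, ref_b))
    denominator = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if denominator == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    # Clip to [0, 1] so the estimate remains a valid proportion.
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical reference DNAm profiles at four cell-type-specific CpGs.
REF_A = [0.9, 0.1, 0.8, 0.2]
REF_B = [0.1, 0.9, 0.3, 0.6]

# Simulated bulk sample: 30% cell type A, 70% cell type B, no noise.
TRUE_FRACTION = 0.3
MIXED = [TRUE_FRACTION * a + (1 - TRUE_FRACTION) * b for a, b in zip(REF_A, REF_B)]

estimated = estimate_fraction(MIXED, REF_A, REF_B)
print(f"estimated fraction of cell type A: {estimated:.3f}")
```

In an EWAS setting, fractions estimated in this way would typically be included as covariates in the per-CpG regression model, so that phenotype associations are not driven by shifts in cell-type composition.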

Abstract

Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.
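One feature-selection idea raised in the Key Points is to test for differences in variance rather than in mean methylation. The sketch below computes a Brown-Forsythe-type statistic (a one-way ANOVA F statistic on absolute deviations from each group's median) for a single CpG. This particular test is an assumption made for illustration and is not necessarily the statistic used in the Review; the function name and the beta values are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """F statistic comparing the spread of two or more groups.

    Each value is replaced by its absolute deviation from its own group
    median, and a one-way ANOVA F statistic is computed on those deviations.
    """
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(d) for d in deviations)
    n_groups = len(deviations)
    grand_mean = sum(v for d in deviations for v in d) / n_total

    # Between-group and within-group mean squares of the deviations.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations) / (n_groups - 1)
    within = sum((v - mean(d)) ** 2 for d in deviations for v in d) / (n_total - n_groups)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return between / within


# Hypothetical beta values at one CpG: same centre, very different spread.
CONTROLS = [0.50, 0.52, 0.48, 0.51, 0.49]
CASES = [0.50, 0.70, 0.30, 0.60, 0.40]

print(f"Brown-Forsythe F: {brown_forsythe_f([CONTROLS, CASES]):.2f}")
```

A test of mean differences would find nothing at this CpG because both groups are centred on 0.50, whereas the variance-based statistic is large. A P value would follow from comparing the statistic with an F distribution with k - 1 and N - k degrees of freedom, where k is the number of groups and N is the total number of samples.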

Figure 1: DNA methylation analysis of cell-type heterogeneity.
Figure 2: Variability, differential means and differential variability in DNA methylation data.
Figure 3: Examples of system-level integrative analysis of DNA methylation data.


  136. 136

    Walsh, C. P. & Bestor, T. H. Cytosine methylation and mammalian development. Genes Dev. 13, 26–34 (1999).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  137. 137

    Shen, R., Olshen, A. B. & Ladanyi, M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25, 2906–2912 (2009).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  138. 138

    Gao, Y. et al. The integrative epigenomic-transcriptomic landscape of ER positive breast cancer. Clin. Epigenet. 7, 126 (2015).

    Article  CAS  Google Scholar 

  139. 139

    Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl Acad. Sci. USA 110, 4245–4250 (2013).

    CAS  Article  PubMed  Google Scholar 

  140. 140

    Maurano, M. T. et al. Role of DNA methylation in modulating transcription factor occupancy. Cell Rep. 12, 1184–1195 (2015).

    CAS  Article  PubMed  Google Scholar 

  141. 141

    Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  142. 142

    Zhu, H., Wang, G. & Qian, J. Transcription factors as readers and effectors of DNA methylation. Nat. Rev. Genet. 17, 551–565 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  143. 143

    Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  144. 144

    Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  145. 145

    Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).

    CAS  Article  Google Scholar 

  146. 146

    Guilhamon, P. et al. Meta-analysis of IDH-mutant cancers identifies EBF1 as an interaction partner for TET2. Nat. Commun. 4, 2166 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. 147

    Yao, L., Shen, H., Laird, P. W., Farnham, P. J. & Berman, B. P. Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome Biol. 16, 105 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  148. 148

    Hovestadt, V. et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature 510, 537–541 (2014).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  149. 149

    Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

    CAS  Article  Google Scholar 

  150. 150

    Rhie, S. K. et al. Identification of activated enhancers and linked transcription factors in breast, prostate, and kidney tumors by tracing enhancer networks using epigenetic traits. Epigenetics Chromatin 9, 50 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  151. 151

    Dhingra, P. et al. Identification of novel prostate cancer drivers using RegNetDriver: a framework for integration of genetic and epigenetic alterations with tissue-specific regulatory network. Genome Biol. 18, 141 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  152. 152

    Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  153. 153

    Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  154. 154

    Jones, A. et al. Role of DNA methylation and epigenetic silencing of HAND2 in endometrial cancer development. PLoS Med. 10, e1001551 (2013). This is paper uses a system-level integrative analysis of DNAm data, identifying HAND2 promoter methylation as a driver event in endometrial carcinogenesis. It presents an example of an epigenetically deregulated gene linking ageing and obesity, the two main risk factors for endometrial cancer.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  155. 155

    Dutkowski, J. & Ideker, T. Protein networks as logic functions in development and cancer. PLoS Computat. Biol. 7, e1002180 (2011).

    CAS  Article  Google Scholar 

  156. 156

    Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D. & Ideker, T. Network-based classification of breast cancer metastasis. Mol. Systems Biol. 3, 140 (2007).

    Article  Google Scholar 

  157. 157

    Bandyopadhyay, S. et al. Rewiring of genetic networks in response to DNA damage. Science 330, 1385–1389 (2010).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  158. 158

    Ruan, P., Shen, J., Santella, R. M., Zhou, S. & Wang, S. NEpiC: a network-assisted algorithm for epigenetic studies using mean and variance combined signals. Nucleic Acids Res. 44, e134 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  159. 159

    Ma, X., Liu, Z., Zhang, Z., Huang, X. & Tang, W. Multiple network algorithm for epigenetic modules via the integration of genome-wide DNA methylation and gene expression data. BMC Bioinformat. 18, 72 (2017).

    Article  CAS  Google Scholar 

  160. 160

    Wijetunga, N. A. et al. SMITE: an R/Bioconductor package that identifies network modules by integrating genomic and epigenomic information. BMC Bioinformat. 18, 41 (2017).

    Article  CAS  Google Scholar 

  161. 161

    The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  162. 162

    Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

    CAS  Article  Google Scholar 

  163. 163

    Teschendorff, A. E. et al. The multi-omic landscape of transcription factor inactivation in cancer. Genome Med. 8, 89 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  164. 164

    Zhang, S. et al. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 40, 9379–9391 (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  165. 165

    Shen, R. et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS ONE 7, e35236 (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  166. 166

    O'Connell, M. J. & Lock, E. F. R. JIVE for exploration of multi-source molecular data. Bioinformatics 32, 2877–2879 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  167. 167

    Lock, E. F., Hoadley, K. A., Marron, J. S. & Nobel, A. B. Joint and Individual Variation Explained (JIVE) for integrated analysis of multiple data types. Ann. Appl. Statist. 7, 523–542 (2013).

    Article  Google Scholar 

  168. 168

    Harshman, R. A. & Lundy, M. E. PARAFAC: Parallel factor analysis. Comput. Stat. Data Anal. 18, 39–72 (1994).

    Article  Google Scholar 

  169. 169

    Hore, V. et al. Tensor decomposition for multiple-tissue gene expression experiments. Nat. Genet. 48, 1094–1100 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  170. 170

    Wang, T. et al. Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Biol. 18, 57 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  171. 171

    Cole, J. J. et al. Diverse interventions that extend mouse lifespan suppress shared age-associated epigenetic changes at critical gene regulatory regions. Genome Biol. 18, 58 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  172. 172

    Hahn, O. et al. Dietary restriction protects from age-associated DNA methylation and induces epigenetic reprogramming of lipid metabolism. Genome Biol. 18, 56 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  173. 173

    Fasanelli, F. et al. Hypomethylation of smoking-related genes is associated with future lung cancer in four prospective cohorts. Nat. Commun. 6, 10192 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  174. 174

    Feinberg, A. P. & Irizarry, R. A. Evolution in health and medicine Sackler colloquium: stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc. Natl Acad. Sci. USA 107 (Suppl. 1), 1757–1764 (2010).

    CAS  Article  PubMed  Google Scholar 

  175. 175

    Issa, J. P. Epigenetic variation and cellular Darwinism. Nat. Genet. 43, 724–726 (2011).

    CAS  Article  PubMed  Google Scholar 

  176. 176

    McDonald, O. G. et al. Epigenomic reprogramming during pancreatic cancer progression links anabolic glucose metabolism to distant metastasis. Nat. Genet. 49, 367–376 (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  177. 177

    Zhuang, J. et al. The dynamics and prognostic potential of DNA methylation changes at stem cell gene loci in women's cancer. PLoS Genet. 8, e1002517 (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  178. 178

    Fortin, J. P. & Hansen, K. D. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 16, 180 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  179. 179

    Levine, M. E. et al. DNA methylation age of blood predicts future onset of lung cancer in the women's health initiative. Aging 7, 690–700 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  180. 180

    Yang, Z. et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 17, 205 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. 181

    Marioni, R. E. et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 16, 25 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  182. 182

    Zhang, Y. et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality. Nat. Commun. 8, 14617 (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  183. 183

    Breitling, L. P. et al. Frailty is associated with the epigenetic clock but not with telomere length in a German cohort. Clin. Epigenet. 8, 21 (2016).

    Article  CAS  Google Scholar 

  184. 184

    Lehmann-Werman, R. et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc. Natl Acad. Sci. USA 113, E1826–E1834 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  185. 185

    Venteicher, A. S. et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science 355, eaai8478 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  186. 186

    Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat. Methods 13, 833–836 (2016).

    CAS  Article  PubMed  Google Scholar 

  187. 187

    Stricker, S. H., Koferle, A. & Beck, S. From profiles to function in epigenomics. Nat. Rev. Genet. 18, 51–66 (2017).

    CAS  Article  PubMed  Google Scholar 

  188. 188

    Angermueller, C., Parnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  189. 189

    Ernst, J. & Kellis, M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 33, 364–376 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  190. 190

    MacArthur, B. D. & Lemischka, I. R. Statistical mechanics of pluripotency. Cell 154, 484–489 (2013).

    CAS  Article  PubMed  Google Scholar 

  191. 191

    Teschendorff, A. & Enver, T. Single-cell entropy for accurate estimation of differentiation potency from a cell's transcriptome. Nat. Commun. 8, 15599 (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  192. 192

    Teschendorff, A. E. et al. The dynamics of DNA methylation covariation patterns in carcinogenesis. PLoS Computat. Biol. 10, e1003709 (2014).

    Article  CAS  Google Scholar 

  193. 193

    Lang, A. H., Li, H., Collins, J. J. & Mehta, P. Epigenetic landscapes explain partially reprogrammed cells and identify key reprogramming genes. PLoS Computat. Biol. 10, e1003734 (2014).

    Article  CAS  Google Scholar 

  194. 194

    Mojtahedi, M. et al. Cell fate decision as high-dimensional critical state transition. PLoS Biol. 14, e2000640 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  195. 195

    Mar, J. C. & Quackenbush, J. Decomposition of gene expression state space trajectories. PLoS Computat. Biol. 5, e1000626 (2009).

    Article  CAS  Google Scholar 

  196. 196

    Teschendorff, A. E., Sollich, P. & Kuehn, R. Signalling entropy: a novel network-theoretical framework for systems analysis and interpretation of functional omic data. Methods 67, 282–293 (2014).

    CAS  Article  PubMed  Google Scholar 

  197. 197

    Garcia-Ojalvo, J. & Martinez Arias, A. Towards a statistical mechanics of cell fate decisions. Curr. Opin. Genet. Dev. 22, 619–626 (2012).

    CAS  Article  PubMed  Google Scholar 

  198. 198

    Stumpf, P. S., Ewing, R. & MacArthur, B. D. Single cell pluripotency regulatory networks. Proteomics 16, 2303–2312 (2016).

    CAS  Article  PubMed  Google Scholar 

  199. 199

    Baron, R. M. & Kenny, D. A. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).

    CAS  Article  PubMed  Google Scholar 

  200. 200

    Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23, R89–R98 (2014).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  201. 201

    Mendelson, M. M. et al. Association of body mass index with DNA methylation and gene expression in blood cells and relations to cardiometabolic disease: a Mendelian randomization approach. PLoS Med. 14, e1002215 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  202. 202

    Morales, E. et al. Genome-wide DNA methylation study in human placenta identifies novel loci associated with maternal smoking during pregnancy. Int. J. Epidemiol. 45, 1644–1655 (2016).

    Article  PubMed  Google Scholar 

  203. 203

    Allard, C. et al. Mendelian randomization supports causality between maternal hyperglycemia and epigenetic regulation of leptin gene in newborns. Epigenetics 10, 342–351 (2015).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  204. 204

    Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  205. 205

    Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  206. 206

    Taylor, A. E. et al. Investigating the possible causal association of smoking with depression and anxiety using Mendelian randomisation meta-analysis: the CARTA consortium. BMJ Open 4, e006141 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  207. 207

    Teschendorff, A. E. in Computational and Statistical Epigenomics (ed. Teschendorff, A. E.) 161–185 (Springer, 2015).

    Book  Google Scholar 

  208. 208

    Maksimovic, J., Gagnon-Bartsch, J. A., Speed, T. P. & Oshlack, A. Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data. Nucleic Acids Res. 43, e106 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  209. 209

    Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  210. 210

    Schmidt, F. et al. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res. 45, 54–66 (2017).

    CAS  Article  PubMed  Google Scholar 

  211. 211

    Hemani, G. et al. MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations. Preprint at bioRxiv http://dx.doi.org/10.1101/078972 (2016).

  212. 212

    Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  213. 213

    Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  214. 214

    Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  215. 215

    Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

Download references

Author information

Contributions

Both authors contributed to all aspects of manuscript researching, discussion, writing and editing.

Corresponding author

Correspondence to Andrew E. Teschendorff.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Bisulfite conversion

A technique in which DNA is treated with bisulfite, resulting in modification (upon amplification) of unmethylated cytosines into thymines, whereas methylated cytosines are protected from modification.

Epigenome-wide association studies

(EWAS). A study design that seeks associations between DNA methylation at many sites across the genome and an exposure, trait or disease of interest.

Intra-sample normalization

The procedure of adjusting the raw data profile of a biological sample for technical biases and artefacts. This is often followed by inter-sample normalization, in which adjustments are made to the data for technical and biological factors that otherwise cause unwanted (and often confounding) data variation across samples.

Confounding

When the relationship between an exposure and an outcome is not causal but is due to the effects of a third variable (the confounder) on the exposure and the outcome. White blood cell heterogeneity can act as a confounder in many epigenetic studies.

Feature selection

The statistical procedure of identifying features which, in some broad sense, correlate with an exposure or phenotype of interest (POI).

Differentially methylated cytosines

(DMCs). Cytosines (usually in a CpG context) that exhibit a statistically significant difference in DNA methylation between two groups of samples, according to some statistical test.

Condition number

In the context of reference-based cell-type deconvolution, the condition number of a reference matrix represents an index of the numerical stability of the inference. Formally, it measures the sensitivity of the regression parameters (also known as cell weights) to small perturbations or errors in the reference matrix.
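
As a rough illustration of this definition, the sketch below computes the 2-norm condition number of a toy reference matrix with two cell types. It relies on the standard fact that the singular values of a matrix are the square roots of the eigenvalues of its Gram matrix. The function name `condition_number_two_columns` and the matrix values are hypothetical and chosen only for illustration.

```python
import math

def condition_number_two_columns(ref):
    """2-norm condition number of an n x 2 matrix (rows = CpGs, columns = cell types)."""
    # Gram matrix G = R^T R is the symmetric 2 x 2 matrix [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in ref)
    b = sum(row[0] * row[1] for row in ref)
    d = sum(row[1] * row[1] for row in ref)
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix.
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lam_max = mean + radius
    lam_min = mean - radius
    if lam_min <= 0.0:
        # Columns are (numerically) collinear: the inference is unstable.
        return math.inf
    # Ratio of largest to smallest singular value.
    return math.sqrt(lam_max / lam_min)

# Hypothetical reference DNAm profiles (beta values) for two cell types.
well_separated = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
nearly_collinear = [[0.50, 0.52], [0.40, 0.41], [0.70, 0.69]]

print(condition_number_two_columns(well_separated))
print(condition_number_two_columns(nearly_collinear))
```

A reference matrix whose cell-type profiles are nearly identical yields a much larger condition number, which is the situation in which small errors in the reference can translate into large errors in the estimated cell weights.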

Constrained projection

(CP). Also known as quadratic programming (QP). A widely used technique for performing multivariate linear regression with constraints (such as non-negativity and normalization) imposed on the regression coefficients. In the context of cell-type deconvolution, the coefficients correspond to cell-type proportions in a sample. By definition, these proportions are non-negative, and their sum must be ≤1.
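
The constraints described here can be sketched in a few lines. The example below is a minimal sketch, not the implementation used by any published deconvolution tool: it swaps a dedicated quadratic programming solver for plain projected gradient descent, and the function names, the toy reference matrix and the sample profile are all hypothetical.

```python
def project_capped_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise project onto the simplex {w >= 0, sum(w) = 1} (sort-based method).
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def estimate_cell_fractions(reference, sample, n_iter=2000):
    """Least-squares cell-type fractions with non-negativity and sum <= 1 constraints."""
    n_types = len(reference[0])
    # Step size 1 / trace(R^T R) never exceeds 1 / largest eigenvalue, so descent is stable.
    step = 1.0 / sum(x * x for row in reference for x in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(r * wk for r, wk in zip(row, w)) - y
                    for row, y in zip(reference, sample)]
        grad = [sum(row[k] * res for row, res in zip(reference, residual))
                for k in range(n_types)]
        w = project_capped_simplex([wk - step * g for wk, g in zip(w, grad)])
    return w

# Hypothetical reference: 4 CpGs (rows) x 3 cell types (columns), beta values.
reference = [[0.9, 0.1, 0.1],
             [0.1, 0.9, 0.1],
             [0.1, 0.1, 0.9],
             [0.5, 0.5, 0.1]]
# Sample profile generated as a 60/30/10 mixture of the three reference profiles.
sample = [0.58, 0.34, 0.18, 0.46]

estimated = estimate_cell_fractions(reference, sample)
print([round(x, 4) for x in estimated])
```

In this toy case the sample is an exact mixture, so the estimate should recover the mixing fractions; with real data the residual is non-zero and the constraints keep the estimates interpretable as proportions.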

Beta distributions

The distributions of beta values. The beta value is a statistical term used to describe the quantification of DNA methylation at a given cytosine, as the ratio of methylated alleles to the total number of alleles (methylated + unmethylated), a number that by definition must lie between 0 (fully unmethylated) and 1 (fully methylated).
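
The ratio in this definition can be written out directly. The snippet below is a minimal sketch that follows the glossary definition literally; the function name, the CpG labels and the counts are hypothetical, and array-processing pipelines may add an offset to the denominator that is not modelled here.

```python
def beta_value(methylated, unmethylated):
    """Fraction of methylated alleles at one cytosine, between 0 and 1."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("at least one allele is needed to define a beta value")
    return methylated / total

# Hypothetical (methylated, unmethylated) allele counts at three CpGs.
counts = {"cg_a": (30, 10), "cg_b": (0, 25), "cg_c": (18, 0)}
betas = {cpg: beta_value(m, u) for cpg, (m, u) in counts.items()}
print(betas)
```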

Surrogate variable analysis

(SVA). A widely used technique for selecting features whose association with a factor of interest is not confounded by other factors. SVA uses a model to identify the data variation that is orthogonal to the factor of interest and subsequently uses principal component analysis (PCA) on this orthogonal variation matrix to construct 'surrogate variables', which in theory should capture confounding sources of variation.

Phenotype of interest

(POI). The factor or variable of interest in an epigenome-wide association study (EWAS). This factor is often binary, representing case–control status, but could also represent an ordinal variable (for example, genotype) or be continuous (for example, age).

Blind source separation

(BSS). The problem of inferring the sources of variation that give rise to a data matrix without using any prior information ('blind'). Algorithms that can achieve this are called BSS algorithms, of which independent component analysis (ICA) is one example.

Independent component analysis

(ICA). An unsupervised dimensionality reduction algorithm that decomposes the data matrix into a sum of linear components of variation, which are as statistically independent from each other as possible. Statistical independence is a stronger condition than the linear uncorrelatedness of principal component analysis (PCA) components, allowing improved modelling of sources of variation in complex data.

Principal component analysis

(PCA). An unsupervised dimensionality reduction algorithm that decomposes the data matrix into a sum of linear principal components (PCs) of variation, ranked by decreasing variance and uncorrelated with each other.

Latent components

Components or sources of data variation that are 'hidden' (or latent) and that are inferred from the data using an unsupervised algorithm.

Supervised

Of statistical inferences, using the phenotype of interest from the outset, for instance, when identifying features correlating with a phenotype.

Variably methylated cytosines

(VMCs). Cytosines (usually in a CpG context) that exhibit a significant amount of variance in DNA methylation, as assessed across independent samples and relative to other CpG sites.

Heteroscedastic

Of a statistical distribution or of a random sample thereof, the expected variance, or spread, being dependent on the mean.

Logit transformation

A mathematical transformation that takes values defined on the unit interval (0,1) (for example, beta values (β)) into values defined on the open interval (−∞,+∞), termed M-values. Mathematically, M = log2[β/(1 − β)].
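
The formula above, together with its inverse, can be sketched as follows. The function names are hypothetical; the forward transformation is exactly M = log2[β/(1 − β)] as defined in this entry, and the inverse follows by solving that expression for β.

```python
import math

def beta_to_m(beta):
    """Logit (base 2) transformation of a beta value into an M-value."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))

def m_to_beta(m):
    """Inverse transformation: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)

for beta in (0.1, 0.5, 0.8):
    m = beta_to_m(beta)
    print(beta, round(m, 3), round(m_to_beta(m), 3))
```

Beta values of exactly 0 or 1 have no finite M-value, which is why the sketch rejects them rather than returning infinity.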

Methylation quantitative trait loci

(mQTLs). CpG sites whose DNA methylation level is correlated with a single-nucleotide polymorphism (SNP). If the SNP occurs close to the CpG (for instance, within a 10 kb window), it is called a cis-mQTL; otherwise, it is a trans-mQTL.

Differentially variable cytosines

(DVCs). Cytosines (usually in a CpG context) that exhibit a statistically significant difference in the variance of DNA methylation between two groups of samples, according to some statistical test.
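
The entry leaves the choice of statistical test open. As one possible illustration, and not necessarily a test used in the studies reviewed, the sketch below applies a permutation test to the log ratio of sample variances at a single CpG. The function names and the beta values are hypothetical, and the code assumes that every group has non-zero variance.

```python
import math
import random
import statistics

def log_variance_ratio(group_a, group_b):
    """Absolute log ratio of sample variances; 0 means equal spread."""
    return abs(math.log(statistics.variance(group_a) / statistics.variance(group_b)))

def permutation_p_value(cases, controls, n_perm=2000, seed=0):
    """Permutation p-value for a difference in DNAm variance at one CpG."""
    rng = random.Random(seed)
    observed = log_variance_ratio(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the statistic.
        rng.shuffle(pooled)
        if log_variance_ratio(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical beta values at a single CpG: similar means, very different spread.
controls = [0.48, 0.49, 0.50, 0.51, 0.52, 0.47, 0.53, 0.495, 0.505, 0.515]
cases = [0.10, 0.90, 0.20, 0.80, 0.15, 0.85, 0.30, 0.70, 0.25, 0.75]

p_value = permutation_p_value(cases, controls)
print(p_value)
```

Both groups in this toy example have a mean close to 0.5, so a test for a difference in mean methylation would not flag the site, whereas a variance-based statistic does.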

Field defects

Genetic or epigenetic alterations that are thought to predate the development of cancer and that are usually seen in the normal tissue found adjacent to cancer.

Type 1 error rate

The probability of erroneously calling the result of a test significant (positive) when the underlying true hypothesis is the null. It corresponds to the fraction of true negatives that are called positive, also known as the false-positive rate.

Variably methylated regions

(VMRs). Contiguous genomic regions where DNA methylation is highly variable relative to a normal 'ground state'. A VMR can be defined for one given sample.

Differentially variable regions

(DVRs). Contiguous genomic regions containing a statistically significant number of differentially variable cytosines (DVCs). This is different from a variably methylated region (VMR) in that a DVR is derived by comparing a fairly large number of cases and controls.

Gene set enrichment analysis

(GSEA). A widely used statistical procedure to assess whether a derived gene list of interest is enriched for specific biological terms, usually including gene ontologies, signalling pathways, specific transcriptomic signatures or targets of gene regulators.

System epigenomics

An emerging field whereby cellular phenotypes in normal development and disease are modelled as complex systems, using tools from complexity science (for example, dynamical system theory or statistical physics) to understand them.

Pleiotropy

A phenomenon that occurs when a genetic variant is associated with multiple traits. Vertical pleiotropy occurs where the traits are all on the same pathway (and is generally less of a problem), whereas horizontal pleiotropy exists where a genetic variant is associated with multiple traits via separate pathways.

Expression quantitative trait loci

(eQTLs). Genes whose expression levels are correlated with single-nucleotide polymorphisms (SNPs). If the SNP occurs near the gene (definitions vary, but the window could range from 10 kb to 1 Mb centred on the transcription start site), it is called a cis-eQTL; otherwise, it is a trans-eQTL.

TF hubs

In the context of a regulatory network where edges represent regulatory interactions between transcription factors (TFs) and target genes, those TFs with the largest number of interactions.

Expression quantitative trait methylation loci

(eQTMs). Genes whose expression levels are correlated with the DNA methylation level of a CpG. If the CpG occurs close to the gene (within a 250 kb window), it is called a cis-eQTM.

Tensor

A multi-dimensional array with the number of dimensions often called the 'order' or 'rank' of the tensor and for which linear decomposition algorithms are available, analogous to linear matrix factorization algorithms for data matrices. Scalars, vectors and matrices are tensors of order 0, 1 and 2, respectively.

Mendelian randomization

A technique to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables for the exposure. This approach can also be applied to assessing mediation.
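
For a single genetic variant, a common way to obtain such an estimate is the Wald ratio: the variant-outcome association divided by the variant-exposure association. The sketch below is a minimal illustration under that assumption; the function name and the summary statistics are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the variant-exposure estimate.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Single-instrument estimate of the effect of an exposure on an outcome."""
    if beta_variant_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_variant_outcome / beta_variant_exposure
    # First-order approximation to the standard error of the ratio.
    standard_error = abs(se_variant_outcome / beta_variant_exposure)
    return estimate, standard_error

# Hypothetical summary statistics: per-allele effects of one SNP on an
# exposure (for example, DNAm at a CpG) and on an outcome.
estimate, standard_error = wald_ratio(0.5, 0.1, 0.02)
print(estimate, standard_error)
```

In a two-step design for mediation, the same calculation would be applied twice with different instruments: once for the effect of the exposure on DNAm and once for the effect of DNAm on the outcome.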

About this article

Cite this article

Teschendorff, A., Relton, C. Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 19, 129–147 (2018). https://doi.org/10.1038/nrg.2017.86
