
  • Review Article

Statistical and integrative system-level analysis of DNA methylation data

Key Points

  • Cell-type heterogeneity can be a major source of confounding and reverse causation in epigenome-wide association studies (EWAS). Adjustment for cell-type composition is therefore critical for an improved interpretation and understanding of EWAS.

  • For a given study, the best choice of cell-type deconvolution algorithm depends not only on the tissue and phenotype of interest but also on the presence of other confounders and the desired output.

  • Most variation in DNA methylation (DNAm) is driven by genetic factors and cell-type heterogeneity, with corresponding features — methylation quantitative trait loci (mQTLs) and cell-type-specific differentially methylated cytosines (DMCs) — readily identifiable using linear modelling.

  • Identification and interpretation of DNAm changes that accrue with age or exposure to environmental disease risk factors may benefit from differential variance statistics.

  • Analysing patterns of covariation in DNAm at regulatory elements can help to identify disrupted regulatory networks and gene modules in disease.

  • The inverse association between DNAm at regulatory elements and transcription factor binding can be exploited to elucidate the functional role of non-coding genome-wide association study (GWAS) single-nucleotide polymorphisms (SNPs) or functional effects caused by exposure to environmental disease risk factors.

  • Mendelian randomization can help to clarify the role of DNAm as a causal mediator between exposure to risk factors and disease.
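The idea behind reference-based cell-type deconvolution mentioned in the Key Points can be illustrated with a small sketch. It assumes a bulk DNAm profile is approximately a weighted mixture of reference cell-type profiles, and it estimates the weights by ordinary least squares followed by clipping and renormalization. This is a deliberate simplification of the constrained projection used by published reference-based algorithms, not a reproduction of any of them. The function name `estimate_fractions`, the simulated reference matrix and the noise level are all hypothetical.

```python
import numpy as np


def estimate_fractions(bulk, reference):
    """Estimate cell-type fractions for one bulk DNAm profile.

    bulk:      vector of beta values, one per CpG (length n_cpgs)
    reference: matrix of cell-type reference profiles (n_cpgs x n_types)

    Simplified heuristic (an assumption of this sketch): unconstrained
    least squares, then negative weights are set to zero and the weights
    are rescaled to sum to one.
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated weights are zero; check the reference")
    return coef / total


# Simulated example: 200 CpGs, 3 hypothetical cell types.
rng = np.random.default_rng(0)
n_cpgs, n_types = 200, 3
reference = rng.uniform(0.0, 1.0, size=(n_cpgs, n_types))
true_fractions = np.array([0.6, 0.3, 0.1])

# Bulk profile = mixture of the reference profiles plus measurement noise.
bulk = reference @ true_fractions + rng.normal(0.0, 0.01, size=n_cpgs)

estimated = estimate_fractions(bulk, reference)
print("true fractions:     ", true_fractions)
print("estimated fractions:", np.round(estimated, 3))
```

In practice the reference profiles would be restricted to CpGs that discriminate the cell types of the tissue under study, and the choice of algorithm depends on the tissue, phenotype and confounders, as the Key Points note.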

Abstract

Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.
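To make the confounding role of cell-type heterogeneity concrete, the following sketch simulates a single CpG whose DNAm level depends only on the fraction of one cell type, while that fraction differs between cases and controls. It is a toy illustration under stated assumptions rather than an analysis from the Review: the effect sizes, sample size and the helper `phenotype_effect` are all hypothetical. A linear model without the cell-type fraction attributes the DNAm difference to the phenotype, whereas including the fraction as a covariate removes the spurious association.

```python
import numpy as np


def phenotype_effect(dnam, phenotype, covariates=None):
    """Return the fitted phenotype coefficient from a linear model.

    The model is dnam ~ intercept + phenotype (+ covariates), fitted by
    ordinary least squares.
    """
    n = len(dnam)
    columns = [np.ones(n), phenotype]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, dnam, rcond=None)
    return coef[1]  # coefficient of the phenotype column


rng = np.random.default_rng(1)
n_samples = 500

# Case/control status (0 = control, 1 = case).
phenotype = rng.integers(0, 2, size=n_samples).astype(float)

# Assumption: cases carry a higher fraction of one cell type.
fraction = 0.4 + 0.2 * phenotype + rng.normal(0.0, 0.05, size=n_samples)

# Assumption: DNAm at this CpG depends on the fraction only,
# with no direct effect of the phenotype.
dnam = 0.2 + 0.5 * fraction + rng.normal(0.0, 0.02, size=n_samples)

unadjusted = phenotype_effect(dnam, phenotype)
adjusted = phenotype_effect(dnam, phenotype, covariates=fraction)

print("phenotype effect, unadjusted:", round(unadjusted, 3))
print("phenotype effect, adjusted:  ", round(adjusted, 3))
```

In a real study the cell-type fractions are rarely measured directly and would themselves be estimated, for example with a deconvolution algorithm, so the adjustment inherits the uncertainty of those estimates.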


Figure 1: DNA methylation analysis of cell-type heterogeneity.
Figure 2: Variability, differential means and differential variability in DNA methylation data.
Figure 3: Examples of system-level integrative analysis of DNA methylation data.

References

  1. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Ahuja, N., Li, Q., Mohan, A. L., Baylin, S. B. & Issa, J. P. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res. 58, 5489–5494 (1998).

    CAS  PubMed  Google Scholar 

  3. Fraga, M. F. et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl Acad. Sci. USA 102, 10604–10609 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Teschendorff, A. E. et al. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 20, 440–446 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Rakyan, V. K. et al. Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res. 20, 434–439 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Maegawa, S. et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res. 20, 332–340 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Ahuja, N. & Issa, J. P. Aging, methylation and cancer. Histol. Histopathol. 15, 835–842 (2000).

    CAS  PubMed  Google Scholar 

  8. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Petronis, A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature 465, 721–727 (2010).

    Article  CAS  PubMed  Google Scholar 

  10. Feinberg, A. P., Ohlsson, R. & Henikoff, S. The epigenetic progenitor origin of human cancer. Nat. Rev. Genet. 7, 21–33 (2006).

    Article  CAS  PubMed  Google Scholar 

  11. Beck, S. Taking the measure of the methylome. Nat. Biotechnol. 28, 1026–1028 (2010).

    Article  CAS  PubMed  Google Scholar 

  12. Sandoval, J. et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 6, 692–702 (2011).

    Article  CAS  PubMed  Google Scholar 

  13. Moran, S., Arribas, C. & Esteller, M. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics 8, 389–399 (2016).

    Article  CAS  PubMed  Google Scholar 

  14. Stunnenberg, H. G., The International Human Epigenome Consortium & Hirst, M. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149 (2016).

    Article  CAS  PubMed  Google Scholar 

  15. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  16. Guo, S. et al. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat. Genet. 49, 635–642 (2017). This paper demonstrates how DNAm patterns detected from cell-free DNA in blood plasma can be used to detect cancer and its tissue of origin.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Gao, X., Jia, M., Zhang, Y., Breitling, L. P. & Brenner, H. DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clin. Epigenet. 7, 113 (2015).

    Article  CAS  Google Scholar 

  18. Wahl, S. et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 541, 81–86 (2017).

    Article  CAS  PubMed  Google Scholar 

  19. Joehanes, R. et al. Epigenetic signatures of cigarette smoking. Circ. Cardiovasc. Genet. 9, 436–447 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Zwamborn, R. A. et al. Prolonged high-fat diet induces gradual and fat depot-specific DNA methylation changes in adult mice. Sci. Rep. 7, 43261 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  21. Bock, C. Analysing and interpreting DNA methylation data. Nat. Genet. 13, 705–719 (2012).

    Article  CAS  Google Scholar 

  22. Morris, T. J. & Beck, S. Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data. Methods 72, 3–8 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Albrecht, F., List, M., Bock, C. & Lengauer, T. DeepBlueR: large-scale epigenomic analysis in R. Bioinformatics 33, 2063–2064 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Liang, L. et al. An epigenome-wide association study of total serum immunoglobulin E concentration. Nature 520, 670–674 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Gentles, A. J. et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 21, 938–945 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Teschendorff, A. E. et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS ONE 4, e8274 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Langevin, S. M. et al. Leukocyte-adjusted epigenome-wide association studies of blood from solid tumor patients. Epigenetics 9, 884–895 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Koestler, D. C. et al. Peripheral blood immune cell methylation profiles are associated with nonhematopoietic cancers. Cancer Epidemiol. Biomarkers Prev. 21, 1293–1302 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Liu, Y. et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat. Biotechnol. 31, 142–147 (2013). This paper presents an EWAS demonstrating the dramatic impact adjusting for cell-type heterogeneity can have on the number of discoveries.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformat. 13, 86 (2012). This paper presents a reference-based cell-type deconvolution algorithm for EWAS.

    Article  Google Scholar 

  32. Houseman, E. A., Molitor, J. & Marsit, C. J. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics 30, 1431–1439 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Houseman, E. A. et al. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinformat. 17, 259 (2016).

    Article  CAS  Google Scholar 

  34. Onuchic, V. et al. Epigenomic deconvolution of breast tumors reveals metabolic coupling between constituent cell types. Cell Rep. 17, 2075–2086 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Koestler, D. C. et al. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC Bioinformat. 17, 120 (2016).

    Article  CAS  Google Scholar 

  36. Chatfield, C. Model uncertainty, data mining and statistical inference. J. R. Statist. Soc. A 158, 419–466 (1995).

    Article  Google Scholar 

  37. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Accomando, W. P., Wiencke, J. K., Houseman, E. A., Nelson, H. H. & Kelsey, K. T. Quantitative reconstruction of leukocyte subsets using DNA methylation. Genome Biol. 15, R50 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Teschendorff, A. E., Breeze, C. E., Zheng, S. C. & Beck, S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformat. 18, 105 (2017).

    Article  CAS  Google Scholar 

  40. Kang, S. et al. CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA. Genome Biol. 18, 53 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Zheng, X. et al. MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes. Genome Biol. 15, 419 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Zhang, N. et al. Predicting tumor purity from methylation microarray data. Bioinformatics 31, 3401–3405 (2015).

    Article  CAS  PubMed  Google Scholar 

  43. Zheng, X., Zhang, N., Wu, H. J. & Wu, H. Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol. 18, 17 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007). This paper presents SVA, a powerful framework for feature selection in the presence of confounders, including cell-type composition and unknown factors.

    Article  CAS  PubMed  Google Scholar 

  45. Leek, J. T. & Storey, J. D. A general framework for multiple testing dependence. Proc. Natl Acad. Sci. USA 105, 18718–18723 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. McGregor, K. et al. An evaluation of methods correcting for cell-type heterogeneity in DNA methylation studies. Genome Biol. 17, 84 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Zheng, S. C. et al. Correcting for cell-type heterogeneity in epigenome-wide association studies: revisiting previous analyses. Nat. Methods 14, 216–217 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Kaushal, A. et al. Comparison of different cell type correction methods for genome-scale epigenetics studies. BMC Bioinformat. 18, 216 (2017).

    Article  CAS  Google Scholar 

  50. Teschendorff, A. E., Zhuang, J. & Widschwendter, M. Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics 27, 1496–1505 (2011).

    Article  CAS  PubMed  Google Scholar 

  51. Teschendorff, A. E. et al. Correlation of smoking-associated DNA methylation changes in buccal cells with DNA methylation changes in epithelial cancer. JAMA Oncol. 1, 476–485 (2015).

    Article  PubMed  Google Scholar 

  52. Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).

    Article  PubMed  Google Scholar 

  53. Zou, J., Lippert, C., Heckerman, D., Aryee, M. & Listgarten, J. Epigenome-wide association studies without the need for cell-type composition. Nat. Methods 11, 309–311 (2014).

    Article  CAS  PubMed  Google Scholar 

  54. Rahmani, E. et al. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat. Methods 13, 443–445 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Teschendorff, A. E. et al. DNA methylation outliers in normal breast tissue identify field defects that are enriched in cancer. Nat. Commun. 7, 10478 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Gagnon-Bartsch, J. A. & Speed, T. P. Using control genes to correct for unwanted variation in microarray data. Biostatistics 13, 539–552 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  57. Jaffe, A. E. & Irizarry, R. A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 15, R31 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  58. Lutsik, P. et al. MeDeCom: discovery and quantification of latent components of heterogeneous methylomes. Genome Biol. 18, 55 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Bakulski, K. M. et al. DNA methylation of cord blood cell types: Applications for mixed cell birth studies. Epigenetics 11, 354–362 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Hattab, M. W. et al. Correcting for cell-type effects in DNA methylation studies: reference-based method outperforms latent variable approaches in empirical studies. Genome Biol. 18, 24 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  62. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).

    Article  PubMed  Google Scholar 

  63. Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 44, 1207–1214 (2012).

    Article  CAS  PubMed  Google Scholar 

  64. Singer, Z. S. et al. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol. Cell 55, 319–331 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Busslinger, M. & Tarakhovsky, A. Epigenetic control of immunity. Cold Spring Harb. Perspect. Biol. 6, a019307 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014). This paper uses WGBS data to estimate epigenetic clonal heterogeneity in cancer and to show that increased epigenetic heterogeneity is associated with a poor clinical outcome.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Teschendorff, A. E. et al. Epigenetic variability in cells of normal cytology is associated with the risk of future morphological transformation. Genome Med. 4, 24 (2012). This paper demonstrates that the risk of an epithelial cancer can be predicted from the DNAm patterns measured in normal cells, years before neoplastic transformation. The detection of DNAm risk markers was only possible using differential variability as a novel feature-selection paradigm in a risk prediction algorithm called EVORA.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Li, S. et al. Dynamic evolution of clonal epialleles revealed by methclone. Genome Biol. 15, 472 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. van ' t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).

    Article  Google Scholar 

  71. Du, P. et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformat. 11, 587 (2010).

    Article  CAS  Google Scholar 

  72. Wang, X., Laird, P. W., Hinoue, T., Groshen, S. & Siegmund, K. D. Non-specific filtering of beta-distributed data. BMC Bioinformat. 15, 199 (2014).

    Article  CAS  Google Scholar 

  73. Zhuang, J., Widschwendter, M. & Teschendorff, A. E. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformat. 13, 59 (2012).

    Article  CAS  Google Scholar 

  74. Dedeurwaerder, S. et al. A comprehensive overview of Infinium HumanMethylation450 data processing. Briefings Bioinformat. 15, 929–941 (2014).

    Article  CAS  Google Scholar 

  75. Zhou, W., Laird, P. W. & Shen, H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 45, e22 (2017).

    PubMed  Google Scholar 

  76. van Dongen, J. et al. Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat. Commun. 7, 11115 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Slieker, R. C. et al. Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms. Genome Biol. 17, 191 (2016). This paper demonstrates the importance of differentially variable DNAm patterns in the context of ageing, linking age-associated DVCs to age-associated transcriptional changes. It provides a novel paradigm for understanding the role of age-associated DNAm changes in disease aetiology.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Wettenhall, J. M. & Smyth, G. K. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics 20, 3705–3706 (2004).

    Article  CAS  PubMed  Google Scholar 

  79. Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article3 (2004).

    Article  PubMed  Google Scholar 

  80. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Eckhardt, F. et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat. Genet. 38, 1378–1385 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Libertini, E. et al. Saturation analysis for whole-genome bisulfite sequencing data. Nat. Biotechnol. 34, 691–693 (2016).

    Article  CAS  PubMed  Google Scholar 

  83. Libertini, E. et al. Information recovery from low coverage whole-genome bisulfite sequencing. Nat. Commun. 7, 11306 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. VanderKraats, N. D., Hiken, J. F., Decker, K. F. & Edwards, J. R. Discovering high-resolution patterns of differential DNA methylation that correlate with gene expression changes. Nucleic Acids Res. 41, 6816–6827 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Schlosberg, C. E., VanderKraats, N. D. & Edwards, J. R. Modeling complex patterns of differential DNA methylation that associate with gene expression changes. Nucleic Acids Res. 45, 5100–5111 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Jaffe, A. E. et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int. J. Epidemiol. 41, 200–209 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  87. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Irizarry, R. A. et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 18, 780–790 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Berman, B. P. et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 44, 40–46 (2012).

    Article  CAS  Google Scholar 

  90. Timp, W. et al. Large hypomethylated blocks as a universal defining epigenetic alteration in human solid tumors. Genome Med. 6, 61 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Yuan, T. et al. An integrative multi-scale analysis of the dynamic DNA methylation landscape in aging. PLoS Genet. 11, e1004996 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Vandiver, A. R. et al. Age and sun exposure-related widespread genomic blocks of hypomethylation in nonmalignant skin. Genome Biol. 16, 80 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Hansen, K. D. et al. Increased methylation variation in epigenetic domains across cancer types. Nature Genet. 43, 768–777 (2011).

    Article  CAS  PubMed  Google Scholar 

  94. Hansen, K. D. et al. Large-scale hypomethylated blocks associated with Epstein-Barr virus-induced B-cell immortalization. Genome Res. 24, 177–184 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Peters, T. J. et al. De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin 8, 6 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Pedersen, B. S., Schwartz, D. A., Yang, I. V. & Kechris, K. J. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics 28, 2986–2988 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Snedecor, G. W. & Cochran, W. G. Statistical Methods (Wiley-Blackwell, 1989).

    Google Scholar 

  98. Teschendorff, A. E. & Widschwendter, M. Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics 28, 1487–1494 (2012).

    Article  CAS  PubMed  Google Scholar 

  99. Tian, L. & Tibshirani, R. Adaptive index models for marker-based risk stratification. Biostatistics 12, 68–86 (2011).

    Article  PubMed  Google Scholar 

  100. Phipson, B. & Oshlack, A. DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging. Genome Biol. 15, 465 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Wahl, S. et al. On the potential of models for location and scale for genome-wide DNA methylation data. BMC Bioinformat. 15, 232 (2014).

    Article  Google Scholar 

  102. Ahn, S. & Wang, T. A powerful statistical method for identifying differentially methylated markers in complex diseases. Pac. Symp. Biocomput. 2013, 69–79 (2012).

    Google Scholar 

  103. Teschendorff, A. E., Jones, A. & Widschwendter, M. Stochastic epigenetic outliers can define field defects in cancer. BMC Bioinformat. 17, 178 (2016).

    Article  CAS  Google Scholar 

  104. Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).

    Article  CAS  PubMed  Google Scholar 

  105. Jaffe, A. E., Feinberg, A. P., Irizarry, R. A. & Leek, J. T. Significance analysis and statistical dissection of variably methylated regions. Biostatistics 13, 166–178 (2012).

    Article  PubMed  Google Scholar 

  106. Jenkinson, G., Pujadas, E., Goutsias, J. & Feinberg, A. P. Potential energy landscapes identify the information-theoretic nature of the epigenome. Nat. Genet. 49, 719–729 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Breeze, C. E. et al. eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data. Cell Rep. 17, 2137–2150 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477–481 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Geeleher, P. et al. Gene-set analysis is severely biased when applied to genome-wide methylation data. Bioinformatics 29, 1851–1857 (2013).

    Article  CAS  PubMed  Google Scholar 

  110. Phipson, B., Maksimovic, J. & Oshlack, A. missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform. Bioinformatics 32, 286–288 (2016).

    CAS  PubMed  Google Scholar 

  111. West, J., Beck, S., Wang, X. & Teschendorff, A. E. An integrative network algorithm identifies age-associated differential methylation interactome hotspots targeting stem-cell differentiation pathways. Sci. Rep. 3, 1630 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

    Article  CAS  PubMed  Google Scholar 

  114. Birney, E., Smith, G. D. & Greally, J. M. Epigenome-wide association studies and the interpretation of disease -omics. PLoS Genet. 12, e1006105 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Lappalainen, T. & Greally, J. M. Associating cellular epigenetic models with human phenotypes. Nat. Rev. Genet. 18, 441–451 (2017).

    Article  CAS  PubMed  Google Scholar 

  116. Dekkers, K. F. et al. Blood lipids influence DNA methylation in circulating cells. Genome Biol. 17, 138 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Karlic, R., Chung, H. R., Lasserre, J., Vlahovicek, K. & Vingron, M. Histone modification levels are predictive for gene expression. Proc. Natl Acad. Sci. USA 107, 2926–2931 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Gaunt, T. R. et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 17, 61 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Banovich, N. E. et al. Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 10, e1004663 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Bell, J. T. et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 12, R10 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017). This paper demonstrates how genetic variants that affect the activity of a transcription factor in cis are associated in trans with coherent DNAm alteration at its binding sites. This principle provides a new strategy for elucidating the role of non-coding GWAS SNPs.

    Article  CAS  PubMed  Google Scholar 

  123. Rahmani, E. et al. Genome-wide methylation data mirror ancestry information. Epigenetics Chromatin 10, 1 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Relton, C. L. & Davey Smith, G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. Int. J. Epidemiol. 41, 161–176 (2012). This is paper proposes the use of genotype as a causal anchor to strengthen causal inference in epigenetic studies. It sets out the principle of two-step Mendelian randomization for molecular mediation.

    Article  PubMed  PubMed Central  Google Scholar 

  125. Richardson, T. G. et al. Mendelian randomization analysis identifies CpG sites as putative mediators for genetic influences on cardiovascular disease risk. Am. J. Hum. Genet. 101, 590–602 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Caramaschi, D. et al. Exploring a causal role of DNA methylation in the relationship between maternal vitamin B12 during pregnancy and child's IQ at age 8, cognitive performance and educational attainment: a two-step Mendelian randomization study. Hum. Mol. Genet. 26, 3001–3013 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Aran, D., Sabato, S. & Hellman, A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol. 14, R21 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Whalen, S., Truty, R. M. & Pollard, K. S. Enhancer-promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat. Genet. 48, 488–496 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Yang, X. et al. Gene body methylation can alter gene expression and is a therapeutic target in cancer. Cancer Cell 26, 577–590 (2014).

  131. Baylin, S. B. DNA methylation and gene silencing in cancer. Nat. Clin. Pract. Oncol. 2 (Suppl. 1), S4–S11 (2005).

  132. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012).

  133. Jjingo, D., Conley, A. B., Yi, S. V., Lunyak, V. V. & Jordan, I. K. On the presence and role of human gene-body DNA methylation. Oncotarget 3, 462–474 (2012).

  134. Jiao, Y., Widschwendter, M. & Teschendorff, A. E. A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics 30, 2360–2366 (2014).

  135. Brenet, F. et al. DNA methylation of the first exon is tightly linked to transcriptional silencing. PLoS ONE 6, e14524 (2011).

  136. Walsh, C. P. & Bestor, T. H. Cytosine methylation and mammalian development. Genes Dev. 13, 26–34 (1999).

  137. Shen, R., Olshen, A. B. & Ladanyi, M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25, 2906–2912 (2009).

  138. Gao, Y. et al. The integrative epigenomic-transcriptomic landscape of ER positive breast cancer. Clin. Epigenet. 7, 126 (2015).

  139. Mo, Q. et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc. Natl Acad. Sci. USA 110, 4245–4250 (2013).

  140. Maurano, M. T. et al. Role of DNA methylation in modulating transcription factor occupancy. Cell Rep. 12, 1184–1195 (2015).

  141. Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).

  142. Zhu, H., Wang, G. & Qian, J. Transcription factors as readers and effectors of DNA methylation. Nat. Rev. Genet. 17, 551–565 (2016).

  143. Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).

  144. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

  145. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490–495 (2011).

  146. Guilhamon, P. et al. Meta-analysis of IDH-mutant cancers identifies EBF1 as an interaction partner for TET2. Nat. Commun. 4, 2166 (2013).

  147. Yao, L., Shen, H., Laird, P. W., Farnham, P. J. & Berman, B. P. Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome Biol. 16, 105 (2015).

  148. Hovestadt, V. et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature 510, 537–541 (2014).

  149. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  150. Rhie, S. K. et al. Identification of activated enhancers and linked transcription factors in breast, prostate, and kidney tumors by tracing enhancer networks using epigenetic traits. Epigenetics Chromatin 9, 50 (2016).

  151. Dhingra, P. et al. Identification of novel prostate cancer drivers using RegNetDriver: a framework for integration of genetic and epigenetic alterations with tissue-specific regulatory network. Genome Biol. 18, 141 (2017).

  152. Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).

  153. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  154. Jones, A. et al. Role of DNA methylation and epigenetic silencing of HAND2 in endometrial cancer development. PLoS Med. 10, e1001551 (2013). This paper uses a system-level integrative analysis of DNAm data, identifying HAND2 promoter methylation as a driver event in endometrial carcinogenesis. It presents an example of an epigenetically deregulated gene linking ageing and obesity, the two main risk factors for endometrial cancer.

  155. Dutkowski, J. & Ideker, T. Protein networks as logic functions in development and cancer. PLoS Computat. Biol. 7, e1002180 (2011).

  156. Chuang, H. Y., Lee, E., Liu, Y. T., Lee, D. & Ideker, T. Network-based classification of breast cancer metastasis. Mol. Systems Biol. 3, 140 (2007).

  157. Bandyopadhyay, S. et al. Rewiring of genetic networks in response to DNA damage. Science 330, 1385–1389 (2010).

  158. Ruan, P., Shen, J., Santella, R. M., Zhou, S. & Wang, S. NEpiC: a network-assisted algorithm for epigenetic studies using mean and variance combined signals. Nucleic Acids Res. 44, e134 (2016).

  159. Ma, X., Liu, Z., Zhang, Z., Huang, X. & Tang, W. Multiple network algorithm for epigenetic modules via the integration of genome-wide DNA methylation and gene expression data. BMC Bioinformat. 18, 72 (2017).

  160. Wijetunga, N. A. et al. SMITE: an R/Bioconductor package that identifies network modules by integrating genomic and epigenomic information. BMC Bioinformat. 18, 41 (2017).

  161. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  162. Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  163. Teschendorff, A. E. et al. The multi-omic landscape of transcription factor inactivation in cancer. Genome Med. 8, 89 (2016).

  164. Zhang, S. et al. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 40, 9379–9391 (2012).

  165. Shen, R. et al. Integrative subtype discovery in glioblastoma using iCluster. PLoS ONE 7, e35236 (2012).

  166. O'Connell, M. J. & Lock, E. F. R.JIVE for exploration of multi-source molecular data. Bioinformatics 32, 2877–2879 (2016).

  167. Lock, E. F., Hoadley, K. A., Marron, J. S. & Nobel, A. B. Joint and Individual Variation Explained (JIVE) for integrated analysis of multiple data types. Ann. Appl. Statist. 7, 523–542 (2013).

  168. Harshman, R. A. & Lundy, M. E. PARAFAC: Parallel factor analysis. Comput. Stat. Data Anal. 18, 39–72 (1994).

  169. Hore, V. et al. Tensor decomposition for multiple-tissue gene expression experiments. Nat. Genet. 48, 1094–1100 (2016).

  170. Wang, T. et al. Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Biol. 18, 57 (2017).

  171. Cole, J. J. et al. Diverse interventions that extend mouse lifespan suppress shared age-associated epigenetic changes at critical gene regulatory regions. Genome Biol. 18, 58 (2017).

  172. Hahn, O. et al. Dietary restriction protects from age-associated DNA methylation and induces epigenetic reprogramming of lipid metabolism. Genome Biol. 18, 56 (2017).

  173. Fasanelli, F. et al. Hypomethylation of smoking-related genes is associated with future lung cancer in four prospective cohorts. Nat. Commun. 6, 10192 (2015).

  174. Feinberg, A. P. & Irizarry, R. A. Evolution in health and medicine Sackler colloquium: stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc. Natl Acad. Sci. USA 107 (Suppl. 1), 1757–1764 (2010).

  175. Issa, J. P. Epigenetic variation and cellular Darwinism. Nat. Genet. 43, 724–726 (2011).

  176. McDonald, O. G. et al. Epigenomic reprogramming during pancreatic cancer progression links anabolic glucose metabolism to distant metastasis. Nat. Genet. 49, 367–376 (2017).

  177. Zhuang, J. et al. The dynamics and prognostic potential of DNA methylation changes at stem cell gene loci in women's cancer. PLoS Genet. 8, e1002517 (2012).

  178. Fortin, J. P. & Hansen, K. D. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 16, 180 (2015).

  179. Levine, M. E. et al. DNA methylation age of blood predicts future onset of lung cancer in the women's health initiative. Aging 7, 690–700 (2015).

  180. Yang, Z. et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 17, 205 (2016).

  181. Marioni, R. E. et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 16, 25 (2015).

  182. Zhang, Y. et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality. Nat. Commun. 8, 14617 (2017).

  183. Breitling, L. P. et al. Frailty is associated with the epigenetic clock but not with telomere length in a German cohort. Clin. Epigenet. 8, 21 (2016).

  184. Lehmann-Werman, R. et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc. Natl Acad. Sci. USA 113, E1826–E1834 (2016).

  185. Venteicher, A. S. et al. Decoupling genetics, lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science 355, eaai8478 (2017).

  186. Cheow, L. F. et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat. Methods 13, 833–836 (2016).

  187. Stricker, S. H., Koferle, A. & Beck, S. From profiles to function in epigenomics. Nat. Rev. Genet. 18, 51–66 (2017).

  188. Angermueller, C., Parnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).

  189. Ernst, J. & Kellis, M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nat. Biotechnol. 33, 364–376 (2015).

  190. MacArthur, B. D. & Lemischka, I. R. Statistical mechanics of pluripotency. Cell 154, 484–489 (2013).

  191. Teschendorff, A. & Enver, T. Single-cell entropy for accurate estimation of differentiation potency from a cell's transcriptome. Nat. Commun. 8, 15599 (2017).

  192. Teschendorff, A. E. et al. The dynamics of DNA methylation covariation patterns in carcinogenesis. PLoS Computat. Biol. 10, e1003709 (2014).

  193. Lang, A. H., Li, H., Collins, J. J. & Mehta, P. Epigenetic landscapes explain partially reprogrammed cells and identify key reprogramming genes. PLoS Computat. Biol. 10, e1003734 (2014).

  194. Mojtahedi, M. et al. Cell fate decision as high-dimensional critical state transition. PLoS Biol. 14, e2000640 (2016).

  195. Mar, J. C. & Quackenbush, J. Decomposition of gene expression state space trajectories. PLoS Computat. Biol. 5, e1000626 (2009).

  196. Teschendorff, A. E., Sollich, P. & Kuehn, R. Signalling entropy: a novel network-theoretical framework for systems analysis and interpretation of functional omic data. Methods 67, 282–293 (2014).

  197. Garcia-Ojalvo, J. & Martinez Arias, A. Towards a statistical mechanics of cell fate decisions. Curr. Opin. Genet. Dev. 22, 619–626 (2012).

  198. Stumpf, P. S., Ewing, R. & MacArthur, B. D. Single cell pluripotency regulatory networks. Proteomics 16, 2303–2312 (2016).

  199. Baron, R. M. & Kenny, D. A. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).

  200. Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23, R89–R98 (2014).

  201. Mendelson, M. M. et al. Association of body mass index with DNA methylation and gene expression in blood cells and relations to cardiometabolic disease: a Mendelian randomization approach. PLoS Med. 14, e1002215 (2017).

  202. Morales, E. et al. Genome-wide DNA methylation study in human placenta identifies novel loci associated with maternal smoking during pregnancy. Int. J. Epidemiol. 45, 1644–1655 (2016).

  203. Allard, C. et al. Mendelian randomization supports causality between maternal hyperglycemia and epigenetic regulation of leptin gene in newborns. Epigenetics 10, 342–351 (2015).

  204. Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).

  205. Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).

  206. Taylor, A. E. et al. Investigating the possible causal association of smoking with depression and anxiety using Mendelian randomisation meta-analysis: the CARTA consortium. BMJ Open 4, e006141 (2014).

  207. Teschendorff, A. E. in Computational and Statistical Epigenomics (ed. Teschendorff, A. E.) 161–185 (Springer, 2015).

  208. Maksimovic, J., Gagnon-Bartsch, J. A., Speed, T. P. & Oshlack, A. Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data. Nucleic Acids Res. 43, e106 (2015).

  209. Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012).

  210. Schmidt, F. et al. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res. 45, 54–66 (2017).

  211. Hemani, G. et al. MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations. Preprint at bioRxiv http://dx.doi.org/10.1101/078972 (2016).

  212. Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

  213. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  214. Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).

  215. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

Author information

Contributions

Both authors contributed to all aspects of manuscript researching, discussion, writing and editing.

Corresponding author

Correspondence to Andrew E. Teschendorff.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Bisulfite conversion

A technique in which DNA is treated with bisulfite, resulting in modification (upon amplification) of unmethylated cytosines into thymines, whereas methylated cytosines are protected from modification.

Epigenome-wide-association studies

(EWAS). A study design that seeks associations between DNA methylation at many sites across the genome and an exposure, trait or disease of interest.

Intra-sample normalization

The procedure of adjusting the raw data profile of a biological sample for technical biases and artefacts. This is often followed by inter-sample normalization, in which adjustments are made to the data for technical and biological factors that otherwise cause unwanted (and often confounding) data variation across samples.

Confounding

When the relationship between an exposure and an outcome is not causal but is due to the effects of a third variable (the confounder) on the exposure and the outcome. White blood cell heterogeneity can act as a confounder in many epigenetic studies.

Feature selection

The statistical procedure of identifying features which, in some broad sense, correlate with an exposure or phenotype of interest (POI).

Differentially methylated cytosines

(DMCs). Cytosines (usually in a CpG context) that exhibit a statistically significant difference in DNA methylation between two groups of samples, according to some statistical test.

Condition number

In the context of reference-based cell-type deconvolution, the condition number of a reference matrix represents an index of the numerical stability of the inference. Formally, it measures the sensitivity of the regression parameters (also known as cell weights) to small perturbations or errors in the reference matrix.
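As a rough illustration of this definition, the sketch below computes the 2-norm condition number of a toy reference matrix with two cell types, using only the Python standard library. The function name `condition_number_two_columns` and both toy matrices are hypothetical; real reference matrices contain many more CpGs and cell types and would normally be handled with a linear algebra library.

```python
import math


def condition_number_two_columns(ref):
    """2-norm condition number of an n x 2 reference matrix (rows = CpGs)."""
    # Gram matrix G = ref^T ref, a symmetric 2 x 2 matrix [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in ref)
    b = sum(row[0] * row[1] for row in ref)
    d = sum(row[1] * row[1] for row in ref)
    # Closed-form eigenvalues of G; the singular values of ref are their square roots.
    centre = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lam_max, lam_min = centre + radius, centre - radius
    if lam_min <= 0.0:
        return math.inf  # collinear columns: the inference is numerically unstable
    return math.sqrt(lam_max / lam_min)


# Toy reference profiles (beta values) for two hypothetical cell types.
distinct = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
similar = [[0.9, 0.85], [0.1, 0.15], [0.8, 0.75], [0.2, 0.25]]

print(condition_number_two_columns(distinct))
print(condition_number_two_columns(similar))
```

In this toy setting, the matrix whose two cell-type profiles are nearly identical yields a much larger condition number, meaning that small errors in the reference would translate into large changes in the estimated cell weights.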

Constrained projection

(CP). Also known as quadratic programming (QP). A widely used technique for performing multivariate linear regression with constraints (such as non-negativity and normalization) imposed on the regression coefficients. In the context of cell-type deconvolution, the coefficients correspond to cell-type proportions in a sample. By definition, these proportions are non-negative, and their sum must be ≤1.
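The constrained regression described here can be sketched as follows. This is a minimal illustration and not the implementation used by any particular deconvolution package: it swaps a dedicated quadratic programming solver for plain projected gradient descent, and the names `project_to_constraints`, `estimate_fractions` and the toy reference matrix are all hypothetical.

```python
def project_to_constraints(w):
    """Euclidean projection onto the set {w >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in w]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise the nearest feasible point lies on the simplex sum(w) = 1.
    theta, cumulative = 0.0, 0.0
    for k, value in enumerate(sorted(w, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in w]


def estimate_fractions(reference, sample, n_iter=5000):
    """Minimise ||reference * w - sample||^2 subject to w >= 0, sum(w) <= 1."""
    n_types = len(reference[0])
    # Step size 1/L, where L = trace(R^T R) bounds the largest eigenvalue of R^T R.
    step = 1.0 / sum(x * x for row in reference for x in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(r * x for r, x in zip(row, w)) - y
                    for row, y in zip(reference, sample)]
        gradient = [sum(row[j] * res for row, res in zip(reference, residual))
                    for j in range(n_types)]
        w = project_to_constraints([x - step * g for x, g in zip(w, gradient)])
    return w


# Toy reference (rows = CpGs, columns = two hypothetical cell types).
reference = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
# Bulk sample simulated as a 70% / 30% mixture of the two reference profiles.
sample = [0.7 * a + 0.3 * b for a, b in reference]
fractions = estimate_fractions(reference, sample)
print(fractions)
```

Because the simulated sample is an exact mixture, the estimated weights should recover the simulated proportions closely; with noisy data the constraints are what keep the estimates interpretable as proportions.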

Beta distributions

The distributions of beta values. The beta value is a statistical term used to describe the quantification of DNA methylation at a given cytosine, as the ratio of methylated alleles to the total number of alleles (methylated + unmethylated), a number that by definition must lie between 0 (fully unmethylated) and 1 (fully methylated).

Surrogate variable analysis

(SVA). A widely used technique for selecting features associated with a factor of interest, which is not confounded by other factors. SVA uses a model to identify the data variation that is orthogonal to the factor of interest and subsequently uses principal component analysis (PCA) on this orthogonal variation matrix to construct 'surrogate variables', which in theory should capture confounding sources of variation.

Phenotype of interest

(POI). The factor or variable of interest in an epigenome-wide association study (EWAS). This factor is often binary, representing case–control status, but could also represent an ordinal variable (for example, genotype) or be continuous (for example, age).

Blind source separation

(BSS). The problem of inferring the sources of variation that give rise to a data matrix without using any prior information ('blind'). Algorithms that can achieve this are called BSS algorithms, of which independent component analysis (ICA) is one example.

Independent component analysis

(ICA). An unsupervised dimensionality reduction algorithm that decomposes the data matrix into a sum of linear components of variation, which are as statistically independent from each other as possible. Statistical independence is a stronger condition than the linear uncorrelatedness of principal component analysis (PCA) components, allowing improved modelling of sources of variation in complex data.

Principal component analysis

(PCA). An unsupervised dimensionality reduction algorithm that decomposes the data matrix into a sum of linear principal components (PCs) of variation, ranked by decreasing variance and uncorrelated with each other.

Latent components

Components or sources of data variation that are 'hidden' (or latent) and that are inferred from the data using an unsupervised algorithm.

Supervised

Of statistical inferences, using the phenotype of interest from the outset, for instance, when identifying features correlating with a phenotype.

Variably methylated cytosines

(VMCs). Cytosines (usually in a CpG context) that exhibit a significant amount of variance in DNA methylation, as assessed across independent samples and relative to other CpG sites.

Heteroscedastic

Of a statistical distribution or of a random sample thereof, the expected variance, or spread, being dependent on the mean.

Logit transformation

A mathematical transformation that takes values defined on the unit interval (0,1) (for example, beta values (β)) into values defined on the open interval (−∞,+∞), termed M-values. Mathematically, M = log2[β/(1 − β)].
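The formula above can be written as a pair of small helper functions. The names `beta_to_m` and `m_to_beta` are hypothetical, and the sketch assumes beta values strictly inside (0,1), as in the definition; values of exactly 0 or 1 would need an offset that is not specified here.

```python
import math


def beta_to_m(beta):
    """Logit (base 2) of a beta value: M = log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transformation: beta = 2^M / (2^M + 1)."""
    return 1.0 / (1.0 + 2.0 ** (-m))


for beta in (0.1, 0.5, 0.8):
    print(beta, beta_to_m(beta))
```

A beta value of 0.5 maps to an M-value of 0, and values approaching 0 or 1 are stretched towards large negative or positive M-values, respectively.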

Methylation quantitative trait loci

(mQTLs). CpG sites whose DNA methylation level is correlated with a single-nucleotide polymorphism (SNP). If the SNP occurs close to the CpG (for instance, within a 10 kb window), it is called a cis-mQTL; otherwise, it is a trans-mQTL.

Differentially variable cytosines

(DVCs). Cytosines (usually in a CpG context) that exhibit a statistically significant difference in the variance of DNA methylation between two groups of samples, according to some statistical test.
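The definition leaves the choice of test open. As one possible sketch, the code below compares the spread of two groups at a single CpG with a Brown-Forsythe-style statistic (mean absolute deviation from each group's median) and a permutation p-value. This choice of statistic, the function names and the toy beta values are assumptions made for illustration only.

```python
import random
import statistics


def mean_abs_deviation_gap(dev_a, dev_b):
    """Absolute difference between the mean deviations of two groups."""
    return abs(sum(dev_a) / len(dev_a) - sum(dev_b) / len(dev_b))


def differential_variability_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Permutation p-value for a difference in spread between two groups."""
    med_a, med_b = statistics.median(group_a), statistics.median(group_b)
    # Absolute deviations from each group's own median, so that a shift in mean
    # methylation alone does not register as a difference in variability.
    dev_a = [abs(x - med_a) for x in group_a]
    dev_b = [abs(x - med_b) for x in group_b]
    observed = mean_abs_deviation_gap(dev_a, dev_b)
    pooled = dev_a + dev_b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = mean_abs_deviation_gap(pooled[:len(dev_a)], pooled[len(dev_a):])
        if gap >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical beta values at one CpG: tightly clustered controls, dispersed cases.
controls = [0.50, 0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 0.49, 0.50]
cases = [0.30, 0.70, 0.45, 0.62, 0.25, 0.78, 0.50, 0.35, 0.66, 0.55]
print(differential_variability_pvalue(controls, cases))
```

In a genome-wide setting, a test of this kind would be applied to every CpG and followed by multiple-testing correction.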

Field defects

Genetic or epigenetic alterations that are thought to predate the development of cancer and that are usually seen in the normal tissue found adjacent to cancer.

Type 1 error rate

The probability of erroneously calling the result of a test significant (positive) when the underlying true hypothesis is the null. It corresponds to the fraction of true negatives that are called positive, also known as the false-positive rate.
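In a simulation where the truly null features are known, this rate can be estimated directly. The sketch below is a minimal illustration; the function name `type1_error_rate` and the toy labels are hypothetical.

```python
def type1_error_rate(is_null, called_significant):
    """Fraction of truly null features that were called significant."""
    calls_on_nulls = [called for null, called in zip(is_null, called_significant) if null]
    if not calls_on_nulls:
        raise ValueError("no truly null features supplied")
    return sum(calls_on_nulls) / len(calls_on_nulls)


# Six hypothetical CpGs: the first four are truly null, the last two are not.
is_null = [True, True, True, True, False, False]
called_significant = [False, True, False, False, True, False]
print(type1_error_rate(is_null, called_significant))
```

Only the truly null features enter the denominator; calls made on non-null features affect power, not the type 1 error rate.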

Variably methylated regions

(VMRs). Contiguous genomic regions where DNA methylation is highly variable relative to a normal 'ground state'. A VMR can be defined for one given sample.

Differentially variable regions

(DVRs). Contiguous genomic regions containing a statistically significant number of differentially variable cytosines (DVCs). This is different from a variably methylated region (VMR) in that a DVR is derived by comparing a fairly large number of cases and controls.

Gene set enrichment analysis

(GSEA). A widely used statistical procedure to assess whether a derived gene list of interest is enriched for specific biological terms, usually including gene ontologies, signalling pathways, specific transcriptomic signatures or targets of gene regulators.
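One simple way to test whether a derived gene list is enriched for a term is a one-sided hypergeometric (over-representation) test, sketched below. This matches the list-based description given here rather than the rank-based enrichment score of the original GSEA algorithm; the function name `enrichment_pvalue` and the example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_universe, n_term, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn at random from n_universe
    genes, of which n_term are annotated to the term of interest."""
    upper = min(n_term, n_list)
    total = comb(n_universe, n_list)
    tail = sum(comb(n_term, k) * comb(n_universe - n_term, n_list - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical example: 20,000 genes, 200 annotated to a pathway,
# 100 genes in the derived list, 10 of which are in the pathway.
print(enrichment_pvalue(20000, 200, 100, 10))
```

With an expected overlap of about one gene under the null, observing ten shared genes gives a very small p-value; in practice, p-values across many terms would be corrected for multiple testing.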

System epigenomics

An emerging field whereby cellular phenotypes in normal development and disease are modelled as complex systems, using tools from complexity science (for example, dynamical system theory or statistical physics) to understand them.

Pleiotropy

A phenomenon that occurs when a genetic variant is associated with multiple traits. Vertical pleiotropy occurs where the traits are all on the same pathway (and is generally less of a problem), whereas horizontal pleiotropy exists where a genetic variant is associated with multiple traits via separate pathways.

Expression quantitative trait loci

(eQTLs). Genes whose expression levels are correlated with single-nucleotide polymorphisms (SNPs). If the SNP occurs near the gene (definitions vary, but the window could range from 10 kb to 1 Mb centred on the transcription start site), it is called a cis-eQTL; otherwise, it is a trans-eQTL.

TF hubs

In the context of a regulatory network where edges represent regulatory interactions between transcription factors (TFs) and target genes, those TFs with the largest number of interactions.
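Given a regulatory network stored as a list of (TF, target gene) edges, hubs can be found by counting interactions per TF. The sketch below is a minimal illustration; the function name `tf_hubs` and the edge list are hypothetical.

```python
from collections import Counter


def tf_hubs(edges, top_n=2):
    """Rank TFs by their number of distinct targets in a list of (tf, target) edges."""
    # set() removes duplicated edges so each interaction is counted once.
    degree = Counter(tf for tf, _target in set(edges))
    return degree.most_common(top_n)


# Hypothetical regulatory network; the last edge is a deliberate duplicate.
edges = [
    ("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_A", "gene3"),
    ("TF_B", "gene1"), ("TF_B", "gene2"),
    ("TF_C", "gene4"),
    ("TF_A", "gene1"),
]
print(tf_hubs(edges))
```

How many top-ranked TFs count as hubs is a threshold the analyst has to choose; the definition above does not fix one.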

Expression quantitative trait methylation loci

(eQTMs). Genes whose expression levels are correlated with the DNA methylation level of a CpG. If the CpG occurs close to the gene (within a 250 kb window), it is called a cis-eQTM.

Tensor

A multi-dimensional array with the number of dimensions often called the 'order' or 'rank' of the tensor and for which linear decomposition algorithms are available, analogous to linear matrix factorization algorithms for data matrices. Scalars, vectors and matrices are tensors of order 0, 1 and 2, respectively.

Mendelian randomization

A technique to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables for the exposure. This approach can also be applied to assessing mediation.
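For a single genetic variant, the simplest estimator of this kind is the Wald ratio: the variant-outcome association divided by the variant-exposure association. The definition above does not name an estimator, so this choice is an assumption for illustration, as are the function name `wald_ratio` and the example effect sizes.

```python
def wald_ratio(beta_snp_exposure, beta_snp_outcome, se_snp_outcome):
    """Wald ratio estimate of the effect of an exposure on an outcome."""
    if beta_snp_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_snp_outcome / beta_snp_exposure
    # First-order (delta method) standard error that ignores the uncertainty
    # in the variant-exposure estimate.
    standard_error = se_snp_outcome / abs(beta_snp_exposure)
    return estimate, standard_error


# Hypothetical summary statistics for one variant.
estimate, standard_error = wald_ratio(
    beta_snp_exposure=0.5,   # effect of the variant on the exposure
    beta_snp_outcome=0.1,    # effect of the variant on the outcome
    se_snp_outcome=0.02,     # standard error of the variant-outcome effect
)
print(estimate, standard_error)
```

The estimate is only interpretable as a causal effect if the variant is a valid instrument, for example if it does not act on the outcome through horizontal pleiotropy.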

About this article

Cite this article

Teschendorff, A., Relton, C. Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 19, 129–147 (2018). https://doi.org/10.1038/nrg.2017.86

  • DOI: https://doi.org/10.1038/nrg.2017.86
