Review Article | Published:

Maximizing ecological and evolutionary insight in bisulfite sequencing data sets


Genome-scale bisulfite sequencing approaches have opened the door to ecological and evolutionary studies of DNA methylation in many organisms. These approaches can be powerful. However, they introduce new methodological and statistical considerations, some of which are particularly relevant to non-model systems. Here, we highlight how these considerations influence a study’s power to link methylation variation with a predictor variable of interest. Relative to current practice, we argue that sample sizes will need to increase to provide robust insights. We also provide recommendations for overcoming common challenges and an R Shiny app to aid in study design.
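The link between sample size and power can be illustrated with a small Monte Carlo simulation. The sketch below is not the authors' R Shiny app or their statistical model. It is a minimal illustration with several assumptions: a single CpG site, two groups, normally distributed individual methylation levels (`base`, `sd`), binomially sampled reads at a fixed `coverage`, and a Welch statistic with a normal approximation for the p-value. All function names and default values are hypothetical.

```python
import math
import random
import statistics


def simulate_site(n_per_group, effect, coverage, rng, base=0.5, sd=0.1):
    """Simulate observed methylation ratios at one CpG site for two groups."""
    groups = []
    for shift in (0.0, effect):
        ratios = []
        for _ in range(n_per_group):
            # Individual's true methylation level, clamped to [0, 1].
            p = min(1.0, max(0.0, rng.gauss(base + shift, sd)))
            # Each bisulfite read is methylated with probability p.
            methylated = sum(rng.random() < p for _ in range(coverage))
            ratios.append(methylated / coverage)
        groups.append(ratios)
    return groups


def welch_p_value(a, b):
    """Two-sided p-value for a Welch statistic (normal approximation).

    The approximation is rough for small groups; each group needs >= 2 values.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 1.0  # no within-group variation: treat as untestable
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def estimate_power(n_per_group, effect, coverage=10, alpha=0.05, n_sims=500, seed=1):
    """Fraction of simulated data sets in which the group effect is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a, b = simulate_site(n_per_group, effect, coverage, rng)
        if welch_p_value(a, b) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Assumed effect: a 10 percentage point difference in mean methylation.
    for n in (5, 10, 20, 40):
        print(n, estimate_power(n, effect=0.1))
```

Under these assumptions, estimated power rises with the number of individuals per group. A real analysis would also need to account for multiple testing across sites, variable coverage, and confounders such as genetic structure or cell-type composition.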


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




We thank K. Hansen and I. Hernando-Herraez for providing processed file formats from their previously published work. We also thank N. Snyder-Mackler, L. Barreiro and X. Zhou for helpful comments and suggestions, M. Cetinkaya-Rundel for coding suggestions on the R Shiny app, M. Gavery for beta-testing it, and the Baylor College of Medicine Human Genome Sequencing Center for access to the current version of the baboon genome assembly (Panu 2.0). This work was supported by NIH R21-AG049936 and 1R01GM102562 to J.T., and NSF BCS-1455808 to J.T. and A.J.L. P.A.P.D. is supported by NIH K12GM000678 from the Training, Workforce Development and Diversity division of the National Institute of General Medical Sciences.

Author information

A.J.L. and J.T. conceived the study; A.J.L., T.P.V. and P.A.P.D. analysed previously published and simulated data; T.P.V. wrote the R Shiny app; and A.J.L. and J.T. wrote the manuscript, with input from all co-authors. All authors gave final approval for publication.

Competing interests

The authors declare no competing financial interests.

Correspondence to Amanda J. Lea or Jenny Tung.

Electronic supplementary material

  1. Supplementary Information

    Supplementary Methods, Supplementary Tables 1–2, Supplementary Figures 1–9, Supplementary References


Figure 1: Overview of reduced representation bisulfite sequencing and whole-genome bisulfite sequencing.
Figure 2: Estimates of effect sizes and their impact on the power of differential methylation analysis.
Figure 3: Properties of CpG methylation levels vary across data sets and influence power.