Comprehensive analysis of DNA methylation data with RnBeads

Abstract

RnBeads is a software tool for large-scale analysis and interpretation of DNA methylation data, providing a user-friendly analysis workflow that yields detailed hypertext reports (http://rnbeads.mpi-inf.mpg.de/). Supported assays include whole-genome bisulfite sequencing, reduced representation bisulfite sequencing, Infinium microarrays and any other protocol that produces high-resolution DNA methylation data. Notable applications of RnBeads include the analysis of epigenome-wide association studies and epigenetic biomarker discovery in cancer cohorts.

Figure 1: RnBeads workflow for analyzing large-scale DNA methylation data.
Figure 2: Analysis of DNA methylation during adult stem cell differentiation.

Acknowledgements

We thank D. Brocks, H. Hernandez-Vargas, A. Houseman, E. Schneider, A. Schönegger and all users of RnBeads for their extensive testing and feedback. We also thank G. Friedrich, J. Büch and the Information Services and Technology team at the Max Planck Institute for technical support. This work is funded in part by the European Union's Seventh Framework Programme (FP7/2007-2013) grant agreement no. 282510 (BLUEPRINT) and grant agreement no. 267038 (NOTOX), as well as by the German Science Ministry grant no. 01KU1216A (DEEP).

Author information

Y.A., F.M. and P.L. developed and maintain RnBeads; J.W., T.L. and C.B. supervised the project; all authors contributed to the writing of the manuscript.

Correspondence to Fabian Müller or Christoph Bock.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data.

RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by increased DNA methylation levels (termed G-CIMP+), and to predict the G-CIMP status for a total of 124 patients using Infinium 450K data obtained from the TCGA project (http://cancergenome.nih.gov).

(a) Detection of genetic duplicates among the patient samples (columns) using a clustered heatmap of intensity values for the genotyping probes that are present on the Infinium microarray (rows). The inset shows that two samples exhibit a high level of genetic identity, and they are indeed derived from tumors of the same patient.

(b) Quality control plot summarizing the outcome of the data filtering. The bar plots on the top left show that the majority of CpG sites (top) and samples (bottom) are of good quality and can be retained. The relatively straight line in the quantile-quantile plot indicates that the probe filtering does not have a major impact on the distribution of DNA methylation in the dataset.

(c) Identification of a small but clearly distinguished cluster of G-CIMP+ glioblastoma samples with elevated DNA methylation levels, especially in CpG-rich genomic regions (light blue in the leftmost column). In the heatmap, blue colors denote high levels of DNA methylation, red indicates low levels and grey represents intermediate levels. For visualization purposes, only the 100 gene promoters (rows) with the highest levels of inter-sample variation in DNA methylation are shown across the samples (columns), but the hierarchical clustering is based on the full set of promoters.

(d) Global assessment of the similarity between the DNA methylation profiles, plotting all glioblastoma samples according to their second and third principal components. The samples exhibit strong separation according to the G-CIMP status (denoted by point shape) and IDH1 mutation status (denoted by point color).

(e) Analysis of significant associations between all user-provided sample annotations. Significant p-values (<0.05) are highlighted in the left triangle, and the corresponding statistical tests are annotated in the right triangle (orange: Pearson correlation followed by permutation-based estimation of the p-value; green: Fisher’s exact test; blue: Wilcoxon rank sum test; violet: Kruskal-Wallis one-way analysis of variance).

(f) Genome-scale comparison between the DNA methylation levels of G-CIMP positive (y-axis) and G-CIMP negative (x-axis) tumor samples, focusing on CpG islands (left scatterplot) and on 5-kilobase tiling regions with a CpG content in the bottom quartile (right scatterplot), respectively. Genomic regions that are differentially methylated with an FDR below 0.05 are presented as red points. All other regions are displayed in blue, and color brightness denotes point density.
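The row selection described for panel (c) can be illustrated with a minimal sketch. This is not RnBeads code: the function name `top_variable_rows`, the toy matrix and the use of population variance as the measure of inter-sample variation are all assumptions made for illustration, since the legend does not state which statistic is used.

```python
from statistics import pvariance


def top_variable_rows(matrix, row_names, n=100):
    """Return the names of the n rows with the highest variance across columns.

    Hypothetical helper: variance is an assumed stand-in for the
    "inter-sample variation" mentioned in the figure legend.
    """
    scored = [(pvariance(row), name) for row, name in zip(matrix, row_names)]
    # Sort by descending variance; break ties by name so the result is deterministic.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:n]]


# Toy methylation matrix: rows are promoters, columns are samples.
promoters = ["P1", "P2", "P3", "P4"]
beta = [
    [0.10, 0.12, 0.11],  # nearly constant across samples
    [0.05, 0.90, 0.50],  # highly variable
    [0.40, 0.60, 0.50],  # moderately variable
    [0.80, 0.80, 0.80],  # constant
]

# Only the selected rows would be drawn; any clustering of the samples
# would still be computed on the full matrix, as the legend describes.
shown = top_variable_rows(beta, promoters, n=2)
print(shown)
```

Restricting the display to the most variable rows keeps the heatmap readable, while computing the clustering on all rows avoids biasing the sample grouping toward the displayed subset.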

Supplementary Figure 2 RnBeads-based Methylome Resource of reference epigenome data sets.

Screenshot of the Methylome Resource (http://rnbeads.mpi-inf.mpg.de/methylomes.php), which makes large DNA methylation datasets more readily available for follow-up research. On the one hand, it provides detailed analysis reports for publicly available methylome datasets that can be explored interactively. On the other hand, the Methylome Resource website lets RnBeads users download all data and configurations that are needed to re-run all or part of the DNA methylation analyses in their local or cloud-based computing environment. These re-runnable analysis configurations make it straightforward for RnBeads users to analyze their own DNA methylation data in the context of publicly available reference epigenome maps.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Table 2 and Supplementary Note (PDF 886 kb)

Supplementary Table 1

Comparison between software tools for DNA methylation analysis (XLSX 37 kb)

About this article

Cite this article

Assenov, Y., Müller, F., Lutsik, P. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014). doi:10.1038/nmeth.3115
