  • Protocol

Reference-free deconvolution, visualization and interpretation of complex DNA methylation data using DecompPipeline, MeDeCom and FactorViz

Abstract

DNA methylation profiling offers unique insights into human development and diseases. Often the analysis of complex tissues and cell mixtures is the only feasible option to study methylation changes across large patient cohorts. Since DNA methylomes are highly cell type specific, deconvolution methods can be used to recover cell type–specific information in the form of latent methylation components (LMCs) from such ‘bulk’ samples. Reference-free deconvolution methods retrieve these components without the need for DNA methylation profiles of purified cell types. Currently no integrated and guided procedure is available for data preparation and subsequent interpretation of deconvolution results. Here, we describe a three-stage protocol for reference-free deconvolution of DNA methylation data comprising: (i) data preprocessing, confounder adjustment using independent component analysis (ICA) and feature selection using DecompPipeline, (ii) deconvolution with multiple parameters using MeDeCom, RefFreeCellMix or EDec and (iii) guided biological inference and validation of deconvolution results with the R/Shiny graphical user interface FactorViz. Our protocol simplifies the analysis and guides the initial interpretation of DNA methylation data derived from complex samples. The harmonized approach is particularly useful to dissect and evaluate cell heterogeneity in complex systems such as tumors. We apply the protocol to lung cancer methylomes from The Cancer Genome Atlas (TCGA) and show that our approach identifies the proportions of stromal cells and tumor-infiltrating immune cells, as well as associations of the detected components with clinical parameters. The protocol takes slightly >3 d to complete and requires basic R skills.
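The mixture idea behind deconvolution can be illustrated with a minimal Python sketch: each bulk methylation value is modeled as a proportion-weighted average of the LMC values. The matrix names `T` (CpGs by LMCs) and `A` (LMCs by samples), the helper `mix` and all numbers are illustrative assumptions and are not taken from DecompPipeline, MeDeCom or FactorViz. The sketch only runs the forward direction; reference-free deconvolution has to estimate both matrices from the bulk data alone.

```python
# Toy forward model: each bulk sample is a convex combination of latent
# methylation components (LMCs). All names and numbers are hypothetical.

def mix(T, A):
    """Return D = T x A for nested lists (rows of T: CpGs, columns of A: samples)."""
    n_lmc = len(A)
    n_samples = len(A[0])
    return [
        [sum(row[k] * A[k][s] for k in range(n_lmc)) for s in range(n_samples)]
        for row in T
    ]

# Three CpGs, two LMCs; methylation values lie between 0 and 1.
T = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]

# Two LMCs, two samples; each column holds proportions that sum to 1.
A = [
    [0.75, 0.25],
    [0.25, 0.75],
]

D = mix(T, A)
for cpg_index, row in enumerate(D):
    print(cpg_index, [round(v, 3) for v in row])
```

In this toy case the first CpG, which is highly methylated only in the first LMC, shows a bulk value that tracks the proportion of that LMC in each sample.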

Fig. 1: Overview of the proposed deconvolution protocol.
Fig. 2: Evaluation of ICA on the TCGA LUAD dataset.
Fig. 3: Interpreting MeDeCom results with FactorViz.

Data availability

The results shown here are wholly or partially based upon data generated by the TCGA Research Network (TCGA-LUAD dataset): https://www.cancer.gov/tcga. The Ewing sarcoma dataset is available from the Gene Expression Omnibus (GEO) under accession number GSE88826.

Code availability

All R packages are available from public code repositories:

DecompPipeline: https://github.com/CompEpigen/DecompPipeline

MeDeCom: https://github.com/lutsik/MeDeCom

FactorViz: https://github.com/CompEpigen/FactorViz

consensusICA: https://gitlab.com/biomodlih/consica.

The pipeline behind our protocol is available as R source packages under open-source licenses (DecompPipeline, MeDeCom, FactorViz: GPLv3; consensusICA: Standard MIT license) and is also implemented as a Docker container available from DockerHub: https://hub.docker.com/r/mscherer/medecom. Supplementary resources and R scripts used to generate the figures are available from http://epigenomics.dkfz.de/DecompProtocol/. The code in this paper has been peer-reviewed.

References

  1. Durek, P. et al. Epigenomic profiling of human CD4+ T cells supports a linear differentiation model and highlights molecular regulators of memory development. Immunity 45, 1148–1161 (2016).

  2. Karpinski, P., Pesz, K. & Sasiadek, M. M. Pan-cancer analysis reveals presence of pronounced DNA methylation drift in CpG island methylator phenotype clusters. Epigenomics 9, 1341–1352 (2017).

  3. Møller, M. et al. Heterogeneous patterns of DNA methylation-based field effects in histologically normal prostate tissue from cancer patients. Sci. Rep. 7, 40636 (2017).

  4. Vidal, E. et al. A DNA methylation map of human cancer at single base-pair resolution. Oncogene 36, 5648–5657 (2017).

  5. Azuara, D. et al. New methylation biomarker panel for early diagnosis of dysplasia or cancer in high-risk inflammatory bowel disease patients. Inflamm. Bowel Dis. 24, 2555–2564 (2018).

  6. Horvath, S. & Raj, K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 19, 371–384 (2018).

  7. Stunnenberg, H. G. et al. The International Human Epigenome Consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149 (2016).

  8. Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 30, 224–226 (2012).

  9. Bock, C. Analysing and interpreting DNA methylation data. Nat. Rev. Genet. 13, 705–719 (2012).

  10. Teschendorff, A. E. & Relton, C. L. Statistical and integrative system-level analysis of DNA methylation data. Nat. Rev. Genet. 19, 129–147 (2017).

  11. Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinforma. 13, 86 (2012).

  12. Teschendorff, A. E., Breeze, C. E., Zheng, S. C. & Beck, S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinforma. 18, 105 (2017).

  13. Zheng, S. C. et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics 10, 925–940 (2018).

  14. Chakravarthy, A. et al. Pan-cancer deconvolution of tumour composition using DNA methylation. Nat. Commun. 9, 3220 (2018).

  15. Hicks, S. C. & Irizarry, R. A. methylCC: technology-independent estimation of cell type composition using differentially methylated regions. Genome Biol. 20, 261 (2019).

  16. Salas, L. A. et al. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 19, 64 (2018).

  17. Kaushal, A. et al. Comparison of different cell type correction methods for genome-scale epigenetics studies. BMC Bioinforma. 18, 216 (2017).

  18. Zou, J., Lippert, C., Heckerman, D., Aryee, M. & Listgarten, J. Epigenome-wide association studies without the need for cell-type composition. Nat. Methods 11, 309–311 (2014).

  19. Rahmani, E. et al. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat. Methods 13, 443–445 (2016).

  20. Rahmani, E. et al. BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol. 19, 141 (2018).

  21. Houseman, E. A. et al. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinforma. 17, 259 (2016).

  22. Onuchic, V. et al. Epigenomic deconvolution of breast tumors reveals metabolic coupling between constituent cell types. Cell Rep. 17, 2075–2086 (2016).

  23. Lutsik, P. et al. MeDeCom: discovery and quantification of latent components of heterogeneous methylomes. Genome Biol. 18, 55 (2017).

  24. Rahmani, E. et al. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat. Commun. 10, 3417 (2019).

  25. Thompson, M., Chen, Z. J., Rahmani, E. & Halperin, E. CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets. Genome Biol. 20, 138 (2019).

  26. Decamps, C. et al. Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software. BMC Bioinforma. 21, 16 (2020).

  27. Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).

  28. Müller, F. et al. RnBeads 2.0: comprehensive analysis of DNA methylation data. Genome Biol. 20, 55 (2019).

  29. Heyn, H. et al. Distinct DNA methylomes of newborns and centenarians. Proc. Natl Acad. Sci. USA 109, 10522–10527 (2012).

  30. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).

  31. Sompairac, N. et al. Independent component analysis for unraveling the complexity of cancer omics datasets. Int. J. Mol. Sci. 20, 4414 (2019).

  32. Everson, T. M. et al. Cadmium-associated differential methylation throughout the placental genome: epigenome-wide association study of two U.S. birth cohorts. Environ. Health Perspect. 126, 017010 (2018).

  33. Carlström, K. E. et al. Therapeutic efficacy of dimethyl fumarate in relapsing-remitting multiple sclerosis associates with ROS pathway in monocytes. Nat. Commun. 10, 3081 (2019).

  34. Goeppert, B. et al. Integrative analysis defines distinct prognostic subgroups of intrahepatic cholangiocarcinoma. Hepatology 69, 2091–2106 (2019).

  35. Man, Y. G. et al. Tumor-infiltrating immune cells promoting tumor invasion and metastasis: Existing theories. J. Cancer 4, 84–95 (2013).

  36. Reinius, L. E. et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE 7, e41361 (2012).

  37. Luo, C. et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science 357, 600–604 (2017).

  38. Mulqueen, R. M. et al. Highly scalable generation of DNA methylation profiles in single cells. Nat. Biotechnol. 36, 428–431 (2018).

  39. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).

  40. Meissner, A. et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 33, 5868–5877 (2005).

  41. Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).

  42. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).

  43. Pidsley, R. et al. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 14, 293 (2013).

  44. Tian, Y. et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics 33, 3982–3984 (2017).

  45. Troyanskaya, O. et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525 (2001).

  46. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

  47. Chen, Y. A. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 8, 203–209 (2013).

  48. Pidsley, R. et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 17, 208 (2016).

  49. Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).

  50. Dirkse, A. et al. Stem cell-associated heterogeneity in Glioblastoma results from intrinsic tumor plasticity shaped by the microenvironment. Nat. Commun. 10, 1787 (2019).

  51. Nazarov, P. V. et al. Deconvolution of transcriptomes and miRNomes by independent component analysis provides insights into biological processes and clinical outcomes of melanoma patients. BMC Med. Genomics 12, 132 (2019).

  52. Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox Model (Springer, 2000).

  53. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).

  54. Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587–589 (2016).

  55. The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).

  56. Testa, U., Castelli, G. & Pelosi, E. Lung cancers: molecular characterization, clonal heterogeneity and evolution, and cancer stem cells. Cancers (Basel) 10, 248 (2018).

  57. Teschendorff, A. E. et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 29, 189–196 (2013).

  58. Cerami, E. et al. The cBio Cancer Genomics Portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

  59. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).

  60. Travaglini, K. J. et al. A molecular cell atlas of the human lung from single cell RNA sequencing. Preprint at https://www.biorxiv.org/content/10.1101/742320v2 (2020).

  61. Hahn, M. A. et al. Methylation of Polycomb target genes in intestinal cancer is mediated by inflammation. Cancer Res. 68, 10280 (2008).

  62. Varambally, S. et al. The polycomb group protein EZH2 is involved in progression of prostate cancer. Nature 419, 624–629 (2002).

  63. Cai, Y. et al. Epigenetic alterations to Polycomb targets precede malignant transition in a mouse model of breast cancer. Sci. Rep. 8, 5535 (2018).

  64. Ward, M. J. et al. Tumour-infiltrating lymphocytes predict for outcome in HPV-positive oropharyngeal cancer. Br. J. Cancer 110, 489–500 (2014).

  65. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).

  66. Colaprico, A. et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71 (2016).

  67. Malta, T. M. et al. Machine learning identifies stemness features associated with oncogenic dedifferentiation. Cell 173, 338–354.e15 (2018).

  68. Harris, T. et al. Both gene amplification and allelic loss occur at 14q13.3 in lung cancer. Clin. Cancer Res. 17, 690–699 (2011).

  69. Sheffield, N. C. et al. DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nat. Med. 23, 386–395 (2017).

  70. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).

  71. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  72. Sánchez-Castillo, M. et al. CODEX: a next-generation sequencing experiment database for the haematopoietic and embryonic stem cell communities. Nucleic Acids Res. 43, D1117–D1123 (2015).

  73. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 12, R83 (2011).

  74. Prive, F., Aschard, H., Ziyatdinov, A. & Blum, M. G. B. Efficient analysis of large-scale genome-wide data with two R packages: Bigstatsr and bigsnpr. Bioinformatics 34, 2781–2787 (2018).

  75. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  76. Houseman, E. A., Molitor, J. & Marsit, C. J. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics 30, 1431–1439 (2014).

  77. Jaffe, A. E. & Irizarry, R. A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 15, R31 (2014).

Acknowledgements

We thank the HADACA consortium (Health Data Challenge, Aussois, Dec 2018 and Nov 2019) for valuable input and D. Gupta for thoroughly testing the proposed pipeline. We are grateful to K. Breuer for testing the Docker container, and to F. Azuaje for supporting the collaboration. This work was funded in part by the German Epigenome Project (DEEP, German Science Ministry grant no. 01KU1216A), de.NBI-epi (German Science Ministry grant nos. 031L0101A and 031L0101D) and the EU H2020 project SYSCID (733100). P.V.N. and T.K. were supported by the Luxembourg National Research Fund (C17/BM/11664971/DEMICS). P.L. was supported by the DKFZ Postdoctoral Fellowship and the AMPro Project of the Helmholtz Association (ZT00026).

Author information

Contributions

M.S. and P.L. implemented most of the computational procedures. P.L. and N.V. previously developed, published and recently updated MeDeCom for installation on Windows. S.S., M.S. and P.L. implemented FactorViz. P.V.N. and T.K. implemented consensus ICA. M.S. performed the analysis of the example datasets, and created all figures and tables. P.V.N., R.T. and V.M. provided crucial input to the analysis and interpretation, and thoroughly tested the protocol. P.L., J.W., T.L. and C.P. jointly supervised the project. M.S. and P.L. wrote the manuscript, with contributions from all co-authors. All authors read and approved the final text.

Corresponding author

Correspondence to Pavlo Lutsik.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Lucas Salas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Lutsik, P. et al. Genome Biol. 18, 55 (2017): https://doi.org/10.1186/s13059-017-1182-6

Müller, F. et al. Genome Biol. 20, 55 (2019): https://doi.org/10.1186/s13059-019-1664-9

Nazarov, P. et al. BMC Med. Genomics 12, 132 (2019): https://doi.org/10.1186/s12920-019-0578-4

Goeppert, B. et al. Hepatology 69, 2091–2106 (2019): https://doi.org/10.1002/hep.30493

Decamps, C. et al. BMC Bioinforma. 21, 16 (2020): https://doi.org/10.1186/s12859-019-3307-2

Key data used in this protocol

The Cancer Genome Atlas Research Network, Nature 511, 543–550 (2014): https://doi.org/10.1038/nature13385

Sheffield, N. et al. Nat. Med. 23, 386–395 (2017): https://doi.org/10.1038/nm.4273

Extended data

Extended Data Fig. 1 Quality control of TCGA data.

a Boxplot for hybridization control probes for the green and the red channel, respectively. Boxplot lines represent the median, the 25th and 75th percentiles, and 1.5 times the inter-quartile range. b Sex prediction based on the intensities of the probes on the sex chromosomes. A logistic regression classifier was employed to differentiate between female and male samples. c Outline of the CpG filtering procedure. The sites on the 450k array are filtered according to quality scores (coverage, overall intensity) and genomic sequence context (SNPs, sex chromosomes), and cross-reactive sites are discarded.

Extended Data Fig. 2 Selecting the number of components and the regularization parameter for MeDeCom.

a Cross-validation error plotted against the number of latent components K for different values of the regularization parameter λ. Differences across the values of K mask the differences between the five λ values. b Objective value and cross-validation error for different values of λ after fixing the number of components to 7. c Multidimensional scaling of the LMC data matrix after fixing the number of components to 7 and the regularization parameter to 0.001. Shown are the first two multidimensional components. d Violin plots of the LMC methylation matrix for the selected parameters. Boxplot lines represent the median, the 25th and 75th percentiles, and 1.5 times the inter-quartile range.

Extended Data Fig. 3 Interpreting RefFreeCellMix results with FactorViz.

a Heatmap of LMC proportions in TCGA-LUAD cohort samples (K=7 components). The samples were hierarchically clustered according to the Euclidean distance between the proportions using complete linkage. We annotated samples using disease status and with the sample-specific LUMP estimate. b Associations between the phenotypic traits and proportions. For quantitative traits, the Pearson correlations are shown as ellipses that are directed to the upper right for positive and to the lower right for negative correlations, respectively. For qualitative traits, the absolute difference of the proportions in the two groups (for example, female vs. male) is shown. P values (two-sided correlation test for quantitative and two-sided t-test for categorical variables) less than 0.01 are indicated by bold borders. LOLA (c) and GO (d) enrichment analysis of the LMC-specific hypomethylated sites for components 1, 2 and 4. No significant GO enrichment was found for components 1 and 4. Sites were defined as LMC-specific hypomethylated if the difference between the value of the methylation component and the median of all other components was less than 0.5. P values have been adjusted for multiple testing with the Benjamini-Hochberg method. e Scatterplots between proportions per sample and known marker gene expression of different lung cell types. The gene expression was measured using counts per million (CPM).
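The Benjamini-Hochberg adjustment mentioned for the enrichment P values can be sketched as follows. This is a minimal, standard-library Python illustration on made-up P values; the function name `bh_adjust` is hypothetical, and the enrichment tools used in the protocol may handle details differently.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Visit the P values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # the largest P value has rank m, the smallest rank 1
        candidate = p_values[index] * m / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw P values for four tested terms.
toy_p = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(toy_p)
for raw, corrected in zip(toy_p, adj):
    print(raw, round(corrected, 4))
```

Note that the second and third toy values receive the same adjusted value, because the running minimum prevents a larger raw P value from ending up with a smaller adjusted one.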

Extended Data Fig. 4 Survival analysis using the survival R package (ref. 52) comparing different levels of LMC proportions.

Shown are Kaplan-Meier curves for samples stratified at the median LMC proportion into two groups (high vs. low proportions). P values were computed using the Cox proportional hazards model with the LMC proportions as input, and age, sex, and tumor stage as covariates.
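The median stratification and the Kaplan-Meier estimate can be sketched in a few lines of standard-library Python. The function names, the tie rule (samples exactly at the median are labeled low) and all toy values are assumptions made for illustration; the protocol itself relies on the survival R package, and the Cox model with covariates is not reproduced here.

```python
from statistics import median

def median_split(proportions):
    """Label each sample 'high' or 'low' relative to the median proportion.

    Samples exactly at the median are labeled 'low' (an arbitrary choice).
    """
    cutoff = median(proportions)
    return ["high" if p > cutoff else "low" for p in proportions]

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical cohort of six samples.
proportions = [0.1, 0.4, 0.2, 0.6, 0.5, 0.3]
times = [5, 8, 3, 12, 10, 6]
events = [1, 0, 1, 1, 1, 0]

groups = median_split(proportions)
curves = {}
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, [(t, round(s, 3)) for t, s in curves[label]])
```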

Extended Data Fig. 5 Interpreting MeDeCom results on the Ewing sarcoma RRBS data set (ref. 69) with FactorViz.

a Heatmap of LMC proportions in the Ewing sarcoma samples (K=6 components, λ=0.001). The samples were hierarchically clustered according to the Euclidean distance between the proportions using complete linkage. We annotated samples using the tumor location and with the sample-specific LUMP estimate. b Associations between the phenotypic traits and proportions. For quantitative traits, the Pearson correlations are shown as ellipses that are directed to the upper right for positive and to the lower right for negative correlations, respectively. For qualitative traits, the absolute difference of the proportions in the two groups (for example mutation vs. wildtype) is shown. P values (two-sided correlation test for quantitative and two-sided t-test for categorical variables) less than 0.01 are indicated by bold borders. GO (c) and LOLA (d) enrichment analysis of the LMC6-specific hypomethylated sites. No significant LOLA and GO enrichments were found for the remaining LMCs. Sites were defined as LMC-specific hypomethylated if the difference between the value of the LMC and the median of all other components was less than 0.5. P values have been adjusted for multiple testing with the Benjamini-Hochberg method. No matched gene-expression values were available for this data set.
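The selection of LMC-specific sites described in the caption compares each LMC value with the median of the remaining LMCs at the same site. The Python sketch below illustrates that comparison on a toy matrix. The function name `lmc_specific_sites` and the data are hypothetical, and because the sign convention of the cutoff is not spelled out here, the cutoff is passed explicitly rather than hard-coded; this is not FactorViz code.

```python
from statistics import median

def lmc_specific_sites(T, lmc, cutoff):
    """Indices of CpGs where T[cpg][lmc] minus the median of the other LMCs is below cutoff."""
    selected = []
    for cpg_index, row in enumerate(T):
        others = [value for k, value in enumerate(row) if k != lmc]
        if row[lmc] - median(others) < cutoff:
            selected.append(cpg_index)
    return selected

# Toy LMC matrix: four CpGs (rows) by three LMCs (columns).
T = [
    [0.1, 0.9, 0.8],
    [0.5, 0.6, 0.4],
    [0.9, 0.2, 0.1],
    [0.2, 0.7, 0.9],
]

# Cutoff as stated in the caption (difference less than 0.5).
literal = lmc_specific_sites(T, lmc=0, cutoff=0.5)
# A purely illustrative negative cutoff keeps only strongly hypomethylated sites.
strict = lmc_specific_sites(T, lmc=0, cutoff=-0.5)
print(literal, strict)
```

With the toy matrix, the negative cutoff retains only the two sites where the first LMC is far below the others, which shows how strongly the chosen cutoff shapes the site list passed to the enrichment analyses.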

Supplementary information

Supplementary Information

Supplementary Note, Supplementary Tables 1–3 and Supplementary Figures 1–8.

Reporting Summary

About this article

Cite this article

Scherer, M., Nazarov, P.V., Toth, R. et al. Reference-free deconvolution, visualization and interpretation of complex DNA methylation data using DecompPipeline, MeDeCom and FactorViz. Nat Protoc 15, 3240–3263 (2020). https://doi.org/10.1038/s41596-020-0369-6

  • DOI: https://doi.org/10.1038/s41596-020-0369-6
