Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation

Rahmani, Elior; Zaitlen, Noah; Baran, Yael; Eng, Celeste; Hu, Donglei; Galanter, Joshua; Oh, Sam; Burchard, Esteban G; Eskin, Eleazar; Zou, James; Halperin, Eran

doi:10.1038/nmeth.4190

Correspondence
Published: 28 February 2017

Elior Rahmani¹,
Noah Zaitlen²,
Yael Baran³,
Celeste Eng²,
Donglei Hu²,
Joshua Galanter²,⁴,
Sam Oh²,
Esteban G Burchard²,⁴,
Eleazar Eskin⁵,⁶,
James Zou⁷ &
Eran Halperin⁵,⁸

Nature Methods volume 14, pages 218–219 (2017)

This article has been updated

Rahmani et al. reply:

Zheng et al.¹ discuss potential pitfalls in our evaluation of ReFACTor², a reference-free method to account for cell-type heterogeneity. Below, we reproduce their analysis and demonstrate that conclusions cannot be drawn on the basis of their results owing to conceptual and technical flaws in their analysis. We show with our reanalysis and further evidence from experiments on a total of 10 data sets that ReFACTor has improved performance over alternative methods, including the reference-based method of Houseman et al.³.

Change history

14 March 2017
In the version of this article initially published, some numbers in Table 1 did not appear in boldface. In the HTML originally posted online, the author affiliation for Elior Rahmani was incorrect; Rahmani is affiliated with only the Tel-Aviv University, Israel. The Supplementary Information file has been replaced to correct for additional callouts of Supplementary Notes in the Supplementary Figure legends. The errors have been corrected in the HTML and PDF files as of 14 March 2017.

References

1. Zheng, S.C. et al. Nat. Methods 14, 216–217 (2017).
2. Rahmani, E. et al. Nat. Methods 13, 443–445 (2016).
3. Houseman, E.A. et al. BMC Bioinformatics 13, 86 (2012).
4. Leek, J.T. & Storey, J.D. PLoS Genet. 3, e161 (2007).
5. Teschendorff, A.E. et al. Nat. Commun. 7, 10478 (2016).
6. Koestler, D.C. et al. BMC Bioinformatics 17, 120 (2016).
7. Teschendorff, A.E., Zhuang, J. & Widschwendter, M. Bioinformatics 27, 1496–1505 (2011).
8. Tan, Q. et al. BMC Genomics 15, 1062 (2014).
9. Liu, Y. et al. Nat. Biotechnol. 31, 142–147 (2013).
10. Hannon, E. et al. Genome Biol. 17, 176 (2016).

Acknowledgements

This research was partially supported by the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University, the Israel Science Foundation (1425/13 to E.R. and E.H.), US National Science Foundation grant 1331176 and United States Israel Binational Science Foundation grant 2012304 (to E.R., Y.B. and E.H.). E.R. was supported by Len Blavatnik and the Blavatnik Research Foundation. N.Z. was supported in part by a US National Institutes of Health (NIH) career development award from the NHLBI (K25HL121295). C.E., S.H., D.H., J.G., S.O. and E.G.B. were supported by the Sandler Family Foundation, the American Asthma Foundation, Hind Distinguished Professorships and NIH grants 1P60MD006902, 1R01HL117004, R21ES24844, R01HL128439 and TRDRP 24RT-0025. E.E. was supported by NSF grants 1065276, 1302448, 1320589 and 1331176 and NIH grants R01-GM083198, R01-ES021801, R01-MH101782, R01-ES022282 and U54EB020403.

Author information

Joshua Galanter
Present address: Genentech, South San Francisco, California, USA.

Authors and Affiliations

Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel
Elior Rahmani
Department of Medicine, University of California, San Francisco, San Francisco, California, USA
Noah Zaitlen, Celeste Eng, Donglei Hu, Joshua Galanter, Sam Oh & Esteban G Burchard
Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel
Yael Baran
Department of Bioengineering and Therapeutic Science, University of California, San Francisco, San Francisco, California, USA
Joshua Galanter & Esteban G Burchard
Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA
Eleazar Eskin & Eran Halperin
Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, USA
Eleazar Eskin
Department of Biomedical Data Science, Stanford University, Palo Alto, California, USA
James Zou
Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, Los Angeles, California, USA
Eran Halperin

Corresponding author

Correspondence to Eran Halperin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Evaluation of the number of expected spurious associations when using a small number of controls in EWAS.

The histogram of significant associations found across 100 EWAS experiments on the data used by Zheng et al. for constructing the gold standard list of “true positives”. The red line marks 23,258, the number of sites defined in the “true positives” list by Zheng et al.

Supplementary Figure 2 Capturing cell-type composition in breast cancer data using ReFACTor.

(a) A reconstruction of Figure S1 from Zheng et al., showing correlation of the cell-types and disease status (N/C) with each of the first 25 principal components of the data (n=355). Here, as well as in the following subfigures, the colors correspond to the logarithm of the P-values of the correlations. (b) The correlation of the first 25 ReFACTor components with the cell-types and disease status, as well as with the variation of the disease status that is independent of the cell composition (Adj. N/C). (c) The correlation of the first 25 SVA components (SVs) with the cell-types and with the unadjusted and adjusted disease status. (d) The mean R² levels, across the nine estimated cell-types, of linear models fitted for each cell-type using an increasing number of ReFACTor components and using an increasing number of SVs. For any given number of components, ReFACTor attains a higher R² level than SVA.

Supplementary Figure 3 Capturing cell-type composition variation in the GALA II dataset.

(a)-(d) R² values of the linear model predicting flow-cytometric estimates for blood cell-types, as a function of the number of ReFACTor components included in the model (blue data points and lines) for the GALA II dataset (n=84). Horizontal blue lines indicate the R² values of the model using the ReFACTor components with significant likelihood ratio test (LRT) P-values (significant components are marked with squares). The reference-based estimates of the entire GALA II dataset (n=560) were used to determine the number of significant ReFACTor components. Horizontal orange lines indicate the performance of the reference-based method. (e) The mean R² level over all cell-types.
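
The R² curves described in this legend can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with an intercept, refit for each number of leading components. The function name `r2_by_num_components` and the synthetic data are hypothetical and are not taken from the ReFACTor software or from the GALA II data.

```python
import numpy as np

def r2_by_num_components(components, cell_fraction):
    """R^2 of an OLS fit of one cell-type estimate on the first k components, k = 1..K."""
    n, n_comp = components.shape
    total_ss = np.sum((cell_fraction - cell_fraction.mean()) ** 2)
    r2 = []
    for k in range(1, n_comp + 1):
        # Design matrix: intercept plus the first k components
        X = np.column_stack([np.ones(n), components[:, :k]])
        beta, *_ = np.linalg.lstsq(X, cell_fraction, rcond=None)
        resid = cell_fraction - X @ beta
        r2.append(1.0 - np.sum(resid ** 2) / total_ss)
    return np.array(r2)

# Synthetic stand-in (assumption): 84 samples, 10 components, one cell type
rng = np.random.default_rng(0)
components = rng.normal(size=(84, 10))
cell_fraction = (0.6 * components[:, 0] - 0.3 * components[:, 1]
                 + rng.normal(scale=0.5, size=84))
r2 = r2_by_num_components(components, cell_fraction)
```

Because the models are nested, the in-sample R² cannot decrease as components are added, so a curve of this kind shows how many components are needed before the fit plateaus rather than whether more components help.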

Supplementary Figure 4 Capturing cell-type composition variation in the Koestler et al. dataset.

(a)-(f) R² values of the linear model predicting flow-cytometric estimates for blood cell-types, as a function of the number of ReFACTor components included in the model (blue data points and lines) for the Koestler et al. dataset (n=18). Horizontal blue lines indicate the R² values of the model using the ReFACTor components with significant likelihood ratio test (LRT) P-values (significant components are marked with squares). Horizontal orange lines indicate the performance of the reference-based method. (g) The mean R² level over all cell-types.

Supplementary Figure 5 Performance for capturing cell-type composition in small data is highly variable.

(a) Sampling 100 subsets of 18 individuals with cell counts from the GALA II dataset (n=84) reveals that the performance (measured in mean R² across all cell-types) of both ReFACTor and the reference-based method is highly variable due to the small number of samples used. (b) Distribution of the performance after sampling 100 subsets of 15 individuals from the Koestler et al. data (n=18).
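
The subsampling procedure in this legend can be sketched as follows, assuming that the linear model is refit within each random subset and that subsets are drawn without replacement. The helper names `ols_r2` and `subsample_r2` and the synthetic data are hypothetical; the sketch uses a single outcome rather than the mean over cell-types.

```python
import numpy as np

def ols_r2(X, y):
    """In-sample R^2 of an OLS fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def subsample_r2(X, y, subset_size, n_draws, rng):
    """Refit the model on n_draws random subsets and return the R^2 of each."""
    out = np.empty(n_draws)
    for i in range(n_draws):
        idx = rng.choice(len(y), size=subset_size, replace=False)
        out[i] = ols_r2(X[idx], y[idx])
    return out

# Synthetic stand-in (assumption): 84 samples, 3 predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(84, 3))
y = X @ np.array([0.5, -0.4, 0.2]) + rng.normal(scale=0.6, size=84)

r2_draws = subsample_r2(X, y, subset_size=18, n_draws=100, rng=rng)
full_r2 = ols_r2(X, y)
print(f"full data R^2 = {full_r2:.2f}; "
      f"subsets of 18: min = {r2_draws.min():.2f}, max = {r2_draws.max():.2f}")
```

Comparing the spread of `r2_draws` with `full_r2` gives a rough sense of how much a single 18-sample evaluation can deviate from the value obtained on the larger data set.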

Supplementary Figure 6 RMT estimates of the dimension in data as a function of the sample size.

The dimensions estimated by the RMT method (Teschendorff et al. 2011) as a function of the number of samples (each time adding a new randomly selected sample) using (a) the GALA II dataset (n=560), (b) a dataset by Liu et al. (n=686) and (c)-(d) two independent datasets by Hannon et al. (n=675 and n=847). Linear regression lines (indicated in red) demonstrate a nearly perfect linear relation (P-value<10⁻⁹³ in all plots).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Tables 1–6, Supplementary Methods and Supplementary Notes 1–5 (PDF 720 kb)

About this article

Cite this article

Rahmani, E., Zaitlen, N., Baran, Y. et al. Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation. Nat Methods 14, 218–219 (2017). https://doi.org/10.1038/nmeth.4190

Issue Date: March 2017
