Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies



In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at
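The general idea of PCA-based adjustment can be illustrated with a minimal sketch. It is not the ReFACTor algorithm itself, whose details are not given in this abstract; it shows only the generic approach of computing the top principal components of a samples-by-sites methylation matrix and including them as covariates in a per-site linear regression. The function names `top_principal_components` and `adjusted_effect_sizes` are hypothetical and chosen for illustration.

```python
import numpy as np


def top_principal_components(meth, k):
    """Return the scores of the top k principal components.

    meth: array of shape (n_samples, n_sites) with methylation levels.
    Illustrative helper; not part of any published package.
    """
    centered = meth - meth.mean(axis=0, keepdims=True)
    # Thin SVD of the centered matrix; columns of u scaled by the
    # singular values are the principal component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def adjusted_effect_sizes(meth, phenotype, covariates):
    """Per-site OLS coefficient of the phenotype, adjusted for covariates.

    Fits meth[:, j] ~ intercept + phenotype + covariates for every site j
    in one least-squares call and returns the phenotype coefficients.
    """
    n_samples = meth.shape[0]
    design = np.column_stack([np.ones(n_samples), phenotype, covariates])
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    return coef[1]  # row 1 corresponds to the phenotype column
```

In this sketch, a phenotype that correlates with a hidden cell type proportion produces spurious per-site effects when no covariates are used. Those effects shrink once the leading components, which capture that proportion, are included in the regression.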


Figure 1: The fraction of variance explained in each of the cell types for which cell counts were available in the GALA II data set (78 samples).
Figure 2: Results of the RA methylation analysis, presented by quantile–quantile plots of the −log10 P values for the association tests.

Accession codes

Primary accessions

Gene Expression Omnibus




The authors acknowledge the families and patients for their participation and thank the numerous health care providers and community clinics involved for their support and participation in GALA II. The research was partially supported by the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. E.H. and E.R. were supported in part by the Israel Science Foundation (Grant 1425/13), Y.B. and E.H. by the United States-Israel Binational Science Foundation (Grant 2012304). Y.B., E.H., and E.R. were partially supported by the German-Israeli Foundation (Grant 1094-33.2/2010) and by the National Science Foundation (Grant III-1217615). E.R. was supported by Len Blavatnik and the Blavatnik Family Foundation. E.E. was supported by National Science Foundation grants 1065276, 1302448, 1320589 and 1331176, and National Institutes of Health grants R01-GM083198, R01-ES021801, R01-MH101782, R01-ES022282 and U54EB020403. This research was supported in part by the Sandler Foundation, the American Asthma Foundation, and the National Institutes of Health (R01 ES015794, R01 HL088133, M01 RR000083, R01 HL078885, R01 HL104608, P60 MD006902, U19 AI077439, M01 RR00188). N.Z. was supported in part by an NIH career development award from the NHLBI (K25HL121295). J.G. was supported in part by NIH training grants GM007546, K23HL111636, and KL2TR000143 and by the Hewett Fellowship.

Author information

E.R. and E.H. designed research, performed research, contributed analytic tools, analyzed data and wrote the paper. N.Z. and E.E. helped with experimental design, data interpretation, and drafting of the paper. Y.B. and J.Z. contributed expertise. C.E., D.H., J.G., S.O. and E.G.B. generated and contributed the data. D.H. also performed quality control analysis.

Correspondence to Eran Halperin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–16, Supplementary Tables 1–4 and Supplementary Note 1 (PDF 1529 kb)

Supplementary Text

NMETH-BC25074F.pdf (PDF 607 kb)
