Polycomb-like proteins link the PRC2 complex to CpG islands


The Polycomb repressive complex 2 (PRC2) mainly mediates transcriptional repression [1,2] and has essential roles in various biological processes including the maintenance of cell identity and proper differentiation. Polycomb-like (PCL) proteins, such as PHF1, MTF2 and PHF19, are PRC2-associated factors that form sub-complexes with PRC2 core components [3], and have been proposed to modulate the enzymatic activity of PRC2 or the recruitment of PRC2 to specific genomic loci [4–13]. Mammalian PRC2-binding sites are enriched in CG content, which correlates with CpG islands that display a low level of DNA methylation [14]. However, the mechanism of PRC2 recruitment to CpG islands is not fully understood. Here we solve the crystal structures of the N-terminal domains of PHF1 and MTF2 with bound CpG-containing DNAs in the presence of H3K36me3-containing histone peptides. We show that the extended homologous regions of both proteins fold into a winged-helix structure, which specifically binds to the unmethylated CpG motif but in a completely different manner from the canonical winged-helix DNA recognition motif. We also show that the PCL extended homologous domains are required for efficient recruitment of PRC2 to CpG island-containing promoters in mouse embryonic stem cells. Our research provides the first, to our knowledge, direct evidence to demonstrate that PCL proteins are crucial for PRC2 recruitment to CpG islands, and further clarifies the roles of these proteins in transcriptional regulation in vivo.


Figure 1: PHF1 domain architecture, its free form structure and the binding analysis with various double-stranded DNAs.
Figure 2: Structural details of PHF1 with bound DNA, mutational analysis of the PCL cassettes, and identification of DNA motifs recognized by PHF1 and MTF2 through protein-binding microarrays.
Figure 3: Binding analysis of the PHF1 and MTF2 cassettes with various histone peptides and structural details of PHF1/MTF2 cassette–H3K36me3–DNA ternary complexes.
Figure 4: The MTF2 EH domain is essential for PRC2 recruitment in mouse ES cells.

Accession codes

Primary accessions

Gene Expression Omnibus

Protein Data Bank


  1. Comet, I., Riising, E. M., Leblanc, B. & Helin, K. Maintaining cell identity: PRC2-mediated regulation of transcription and cancer. Nat. Rev. Cancer 16, 803–810 (2016)
  2. Margueron, R. & Reinberg, D. The Polycomb complex PRC2 and its mark in life. Nature 469, 343–349 (2011)
  3. Hauri, S. et al. A high-density map for navigating the human Polycomb complexome. Cell Reports 17, 583–595 (2016)
  4. Ballaré, C. et al. Phf19 links methylated Lys36 of histone H3 to regulation of Polycomb activity. Nat. Struct. Mol. Biol. 19, 1257–1265 (2012)
  5. Boulay, G., Rosnoblet, C., Guérardel, C., Angrand, P. O. & Leprince, D. Functional characterization of human Polycomb-like 3 isoforms identifies them as components of distinct EZH2 protein complexes. Biochem. J. 434, 333–342 (2011)
  6. Brien, G. L. et al. Polycomb PHF19 binds H3K36me3 and recruits PRC2 and demethylase NO66 to embryonic stem cell genes during differentiation. Nat. Struct. Mol. Biol. 19, 1273–1281 (2012)
  7. Cai, L. et al. An H3K36 methylation-engaging Tudor motif of polycomb-like proteins mediates PRC2 complex targeting. Mol. Cell 49, 571–582 (2013)
  8. Cao, R. et al. Role of hPHF1 in H3K27 methylation and Hox gene silencing. Mol. Cell. Biol. 28, 1862–1872 (2008)
  9. Casanova, M. et al. Polycomblike 2 facilitates the recruitment of PRC2 Polycomb group complexes to the inactive X chromosome and to target loci in embryonic stem cells. Development 138, 1471–1482 (2011)
  10. Hunkapiller, J. et al. Polycomb-like 3 promotes polycomb repressive complex 2 binding to CpG islands and embryonic stem cell self-renewal. PLoS Genet. 8, e1002576 (2012)
  11. Musselman, C. A. et al. Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nat. Struct. Mol. Biol. 19, 1266–1272 (2012)
  12. Walker, E. et al. Polycomb-like 2 associates with PRC2 and regulates transcriptional networks during mouse embryonic stem cell self-renewal and differentiation. Cell Stem Cell 6, 153–166 (2010)
  13. Sarma, K., Margueron, R., Ivanov, A., Pirrotta, V. & Reinberg, D. Ezh2 requires PHF1 to efficiently catalyze H3 lysine 27 trimethylation in vivo. Mol. Cell. Biol. 28, 2718–2731 (2008)
  14. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008)
  15. Kycia, I. et al. The Tudor domain of the PHD finger protein 1 is a dual reader of lysine trimethylation at lysine 36 of histone H3 and lysine 27 of histone variant H3t. J. Mol. Biol. 426, 1651–1660 (2014)
  16. Qin, S. et al. Tudor domains of the PRC2 components PHF1 and PHF19 selectively bind to histone H3K36me3. Biochem. Biophys. Res. Commun. 430, 547–553 (2013)
  17. Holm, L. & Rosenström, P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 38, W545–W549 (2010)
  18. Callebaut, I. & Mornon, J. P. The PWAPA cassette: Intimate association of a PHD-like finger and a winged-helix domain in proteins included in histone-modifying complexes. Biochimie 94, 2006–2012 (2012)
  19. Clark, K. L., Halay, E. D., Lai, E. & Burley, S. K. Co-crystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature 364, 412–420 (1993)
  20. Biggs, W. H., III, Cavenee, W. K. & Arden, K. C. Identification and characterization of members of the FKHR (FOX O) subclass of winged-helix transcription factors in the mouse. Mamm. Genome 12, 416–425 (2001)
  21. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012)
  22. Gajiwala, K. S. et al. Structure of the winged-helix protein hRFX1 reveals a new mode of DNA binding. Nature 403, 916–921 (2000)
  23. Berger, M. F. et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006)
  24. Li, X. et al. Mammalian polycomb-like Pcl2/Mtf2 is a novel regulatory component of PRC2 that can differentially modulate polycomb activity both at the Hox gene cluster and at Cdkn2a genes. Mol. Cell. Biol. 31, 351–364 (2011)
  25. Kloet, S. L. et al. The dynamic interactome and genomic targets of Polycomb complexes during stem-cell differentiation. Nat. Struct. Mol. Biol. 23, 682–690 (2016)
  26. Voo, K. S., Carlone, D. L., Jacobsen, B. M., Flodin, A. & Skalnik, D. G. Cloning of a mammalian transcriptional activator that binds unmethylated CpG motifs and shares a CXXC domain with DNA methyltransferase, human trithorax, and methyl-CpG binding domain protein 1. Mol. Cell. Biol. 20, 2108–2121 (2000)
  27. Xu, C., Bian, C., Lam, R., Dong, A. & Min, J. The structural basis for selective binding of non-methylated CpG islands by the CFP1 CXXC domain. Nat. Commun. 2, 227 (2011)
  28. Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protocols 4, 393–411 (2009)
  29. Xiao, S. et al. Comparative epigenomic annotation of regulatory DNA. Cell 149, 1381–1392 (2012)
  30. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010)
  31. Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat. Protocols 3, 1171–1179 (2008)
  32. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010)
  33. Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014)
  34. Liefke, R., Karwacki-Neisius, V. & Shi, Y. EPOP interacts with Elongin BC and USP7 to modulate the chromatin landscape. Mol. Cell 64, 659–672 (2016)
  35. Kalb, R. et al. Histone H2A monoubiquitination promotes histone H3 methylation in Polycomb repression. Nat. Struct. Mol. Biol. 21, 569–571 (2014)
  36. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012)
  37. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
  38. Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014)
  39. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 12, R83 (2011)
  40. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004)
  41. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009)



We thank the staff from the BL17U1, BL18U1 and BL19U1 beamlines of the National Facility for Protein Science in Shanghai (NFPS) at the Shanghai Synchrotron Radiation Facility (SSRF) in China for assistance during data collection, and C. Yun for help in data collection in the United States. We thank G. Jiang for help with ITC studies. We thank A. Jambhekar for critical reading of the manuscript. This work was supported by the National Natural Science Foundation of China (31370719 and 31570729), Beijing Natural Science Foundation (5152015) and the Fundamental Research Funds for the Central Universities (2014KJJCA09 and 2017EYT19) to Z.W. Early research on PCL protein expression undertaken by Z.W. in the laboratory of D.J.P. was supported by the Leukemia and Lymphoma Society and by the Memorial Sloan-Kettering Cancer Center Core Grant (P30 CA008748). Research on PCL proteins was supported by the German Research Foundation (DFG, LI 2057/1-1) to R.L., NIH/NHGRI R01 grant HG003985 to M.L.B., and the National Cancer Institute (R01 CA118487) and funds from Boston Children’s Hospital to Y.S. Y.S. is an American Cancer Society Research Professor.

Author information




H.L. performed the protein expression, purification and the crystallographic studies. R.L. performed experiments in mouse ES cells, genome-wide analyses and MTF2 complex experiments. J.J. did the ITC and EMSA assays. J.V.K. performed protein binding microarray experiments. W.T., P.D., W.Z. and Q.H. assisted in cloning and protein purification. M.L.B. supervised the protein binding microarray research. All authors analysed the data. Z.W. initiated the project, determined the crystal structures, designed the experiments with R.L. and Y.S., and wrote the paper with the help of R.L., D.J.P., J.V.K., M.L.B. and Y.S.

Corresponding author

Correspondence to Zhanxin Wang.

Ethics declarations

Competing interests

Y.S. is a co-founder of Constellation Pharmaceuticals and a member of its scientific advisory board, and a consultant for Active Motif, Inc. The remaining authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks L. Chen, L. Di Croce and R. Klose for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Sequence alignment of human PCL proteins, or the EH/WH regions from various species.

a, Sequence alignment of the N-terminal domains of human PCL proteins. Residues with high similarity are coloured in red. Key residues mentioned in the text are highlighted yellow and indicated with blue triangles at the bottom. b, Sequence alignment of the EH domains from various species of PCL proteins and two typical winged-helix motifs. Conserved IKK(K/R)K motifs within the W1 loop of various PCL proteins are indicated in a blue box. Species abbreviations: h, Homo sapiens; m, Mus musculus; dr, Danio rerio; xl, Xenopus laevis; dm, Drosophila melanogaster.

Extended Data Figure 2 Binding analysis of PCL proteins with different CpG-motif substitutions or with CpG-containing DNAs varying in their flanking sequences.

a, EMSA results of the PHF1(26–360) fragment with different DNA duplexes bearing base substitutions in the CpG motif. b, c, EMSA results of PHF1(165–360) (b) or MTF2(180–378) (c) with various NCpGN-containing DNA motifs; N denotes any DNA base. The protein-to-DNA molar ratio is shown at the top. Data are representative of at least three independent experiments. Uncropped gels are shown in Supplementary Fig. 1.

Extended Data Figure 3 Comparisons of DNA-bound PHF1 or MTF2 EH domains with two DNA-bound winged-helix motifs and a CXXC domain.

a, Electrostatic surface of the PHF1 cassette, with basic regions shown in blue and acidic regions in red. Bound DNA is shown in a cartoon representation. b, Superimposition of the PHF1-bound DNA (coloured in orange) with a canonical B-form DNA (coloured in blue; PDB code 1HQ7). c–e, Comparison of the DNA-recognizing details of the PHF1 EH domain (c) with the winged-helix motifs of HNF-3γ (d; PDB code 1VTN) and hRFX1 (e; PDB code 1DP7) when all three domains were structurally aligned. f–h, Comparison of the CpG-recognition details of the MTF2 EH domain (f) and the PHF1 EH domain (g) with that of the CFP1 CXXC (h; PDB code 3QMC). Of note, both cytosine residues of the CpG duplex form hydrogen bonds with the main-chain carbonyl oxygens, while both guanines of the CpG duplex were also recognized by forming hydrogen bonds with the side chains.

Extended Data Figure 4 Creation of MTF2-knockout mouse ES cells and qPCR experiments.

a, Representative western blot of endogenous MTF2 in mouse ES cells. Three distinct isoforms are indicated. b, Schematic overview of the three MTF2 isoforms and their corresponding translational start sites. Positions of four test CRISPR gRNA targets are shown. c, Western blot of mouse ES cells expressing a control CRISPR construct or CRISPR constructs targeting the Mtf2 gene as depicted in b. CRISPR 4 (in red) was used to obtain single cell clones. d, Sequence validation of two single cell clones. e, Western blotting of nucleoplasm and chromatin fractions from two MTF2-knockout clones and control cells. Data are representative of two independent experiments. f, g, RT–qPCR of control cells and two MTF2-knockout clones (f) or control, knockout, or MTF2-knockout cells rescued with wild-type or MTF2(Lys339Ala) (g). Data are mean ± s.d. of three biological replicates. h, ChIP–qPCR experiments in control, MTF2-knockout, and rescued cells with the antibodies shown. Data are mean ± s.d. of two biological replicates. Uncropped blots are shown in Supplementary Fig. 1.

Extended Data Figure 5 Analysis of the ChIP–seq experiments and PHF19-knockdown ChIP–seq data.

a, Comparison of normalized ChIP–seq promoter reads (as in Fig. 4f) of three biological replicates for MTF2, SUZ12 and H3K27me3. The whisker-box plots represent the lower quartile, median and upper quartile of the data with 5% and 95% whiskers. b, Comparison of MTF2 ChIP–seq data in control and MTF2-knockout cells (replicate 3) at the three promoter groups described in Fig. 4a. c, Promoter profiles of SUZ12 and H3K27me3 in control and PHF19-knockdown cells using publicly available data [10].

Extended Data Figure 6 EMSA and HMTase experiments with purified MTF2–PRC2 complex.

a, Silver staining of purified wild-type or Lys339Ala mutant human MTF2–PRC2 complexes (and mock control) from HeLa-S cells. F/H, Flag–HA-tagged. b, Western blotting of the eluates from a. c, EMSA experiment with equal volumes (0, 0.5, 1, 2, 3 μl) of the eluates using the 12-mer-CpG sequence. Data are representative of two independent experiments. d, Histone methyltransferase (HMTase) experiment using equal volumes (15 μl) of the eluates from a. Two technical replicates are shown. H3K27me3 levels were investigated by western blotting. Uncropped blots are shown in Supplementary Fig. 1.

Extended Data Table 1 X-ray statistics of the PHF1 and MTF2 Tudor–PHD1–PHD2–EH cassettes in the free or DNA- and/or histone-bound states
Extended Data Table 2 ITC-based binding affinity measurements for the PCL cassettes or their mutants with DNAs or histones
Extended Data Table 3 Names and sequences of the double-stranded DNAs used
Extended Data Table 4 Primers used for ChIP–qPCR and RT–qPCR

Supplementary information

Supplementary Figures

This file contains Supplementary Figure 1: uncropped blot and EMSA images. (PDF 4734 kb)

Reporting Summary (PDF 80 kb)


Cite this article

Li, H., Liefke, R., Jiang, J. et al. Polycomb-like proteins link the PRC2 complex to CpG islands. Nature 549, 287–291 (2017). https://doi.org/10.1038/nature23881
