Naive and primed pluripotent human embryonic stem cells bear transcriptional similarity to the pre- and post-implantation epiblast, respectively, and thus constitute a developmental model for understanding the pluripotent stages of human embryo development. To identify new transcription factors that differentially regulate these distinct pluripotent stages, we mapped open chromatin using ATAC-seq and found enrichment of the activator protein-2 (AP2) transcription factor binding motif at naive-specific open chromatin. We determined that the AP2 family member TFAP2C is upregulated during primed-to-naive reversion and binds widely at naive-specific enhancers. During the transition from primed to naive pluripotency, TFAP2C maintains pluripotency and represses neuroectodermal differentiation by facilitating the opening of enhancers proximal to pluripotency factors. Additionally, we identify a previously undiscovered naive-specific POU5F1 (OCT4) enhancer enriched for TFAP2C binding. Taken together, TFAP2C establishes and maintains naive human pluripotency and regulates OCT4 expression by mechanisms that are distinct from those in mouse.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors thank the UCLA Broad Stem Cell Research Center (BSCRC) Flow Cytometry Core and the UCLA BSCRC High Throughput Sequencing Core for technical assistance. W.A.P. was supported by the Jane Coffin Childs Memorial Fund for Medical Research and a UCLA BSCRC Postdoctoral Training Fellowship. D.C. is supported by a UCLA BSCRC Postdoctoral Training Fellowship. W.L. is supported by the Philip J. Whitcome Fellowship from the UCLA Molecular Biology Institute and a scholarship from the China Scholarship Council. Work was funded by R01 HD079546 (to A.T.C.), an NHMRC project grant APP1104560 (to J.M.P.) and a Sylvia and Charles Viertel Senior Medical Research Fellowship (to J.M.P.). All work with human pre-implantation embryos was funded by the UCLA BSCRC and not by the National Institutes of Health. S.E.J. is a fellow of the Howard Hughes Medical Institute.
Integrated supplementary information
a, Size distribution of the distance between paired-end reads in the ATAC-seq libraries used in this paper. Note the periodic spikes corresponding to the DNA helical pitch (~10.5 bp) and the peaks at nucleosome-spacing distances, consistent with successful ATAC-seq libraries. b, Metaplot of ATAC-seq read density over genes categorized by RPKM. The positive correlation between gene RPKM and ATAC-seq read density near the TSS further validates the ATAC-seq libraries. c, Location of ATAC-seq peak summits in primed and naive cells, and of primed- or naive-specific summits, defined by eightfold enrichment in one state relative to the other. The fraction in each category is shown in absolute terms and relative to the genome as a whole. Note that naive- and primed-specific peaks are less promoter-enriched than the general sets of primed and naive peaks, reflecting the fact that most promoters are open in both conditions whereas enhancer utilization changes more strongly. d, Schematic for assessing the role of a given peak in regulating adjacent genes. All interactions between the peak and the adjacent genes are assessed and binned by distance. Genes with multiple annotated TSSs (for example, Gene C) and TSSs shared by multiple genes (for example, Gene E and Gene F) are excluded from the analysis. e,f, Percentage of the time that a gene whose TSS lies a given distance from a naive-specific ATAC peak (e) or a primed-specific ATAC peak (f) is upregulated or downregulated in the naive state. The left panel shows frequencies for ATAC peaks upstream of the gene TSS and the right panel shows frequencies for ATAC peaks downstream of the gene TSS. In both cases, ATAC peaks are strongly predictive of gene upregulation.
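The peak classification in c and the distance binning in d–f can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the pipeline used for these panels: the `Gene` record, the per-gene `direction` call (up, down or unchanged in naive relative to primed), the pseudocount, the 10-kb bin width and the 100-kb cutoff are all hypothetical choices. Only the eightfold threshold and the exclusion rules come from the legend.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical record; `direction` is an assumed precomputed call of
    # "up", "down" or "unchanged" in naive relative to primed cells.
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"
    direction: str


def specific_peaks(signal, fold=8.0, pseudocount=1.0):
    """Split peaks into naive- and primed-specific sets by fold enrichment.

    `signal` maps peak id -> (naive_signal, primed_signal). The pseudocount
    (an assumption) avoids division by zero for peaks absent in one state.
    """
    naive, primed = [], []
    for pid, (n, p) in signal.items():
        ratio = (n + pseudocount) / (p + pseudocount)
        if ratio >= fold:
            naive.append(pid)
        elif ratio <= 1.0 / fold:
            primed.append(pid)
    return naive, primed


def unambiguous_genes(genes):
    """Drop genes with more than one annotated TSS and TSSs shared by several genes."""
    tss_per_gene = Counter(g.name for g in genes)
    genes_per_tss = Counter((g.chrom, g.tss) for g in genes)
    return [
        g for g in genes
        if tss_per_gene[g.name] == 1 and genes_per_tss[(g.chrom, g.tss)] == 1
    ]


def binned_regulation(summits, genes, bin_size=10_000, max_dist=100_000):
    """Fraction of peak-gene pairs per distance bin whose gene is up or down.

    Offsets are strand-aware: negative means the summit lies upstream of the
    TSS; a summit exactly at the TSS is counted as downstream. Returns
    {(side, bin_start): (fraction_up, fraction_down, n_pairs)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [up, down, total]
    for chrom, summit in summits:
        for g in genes:  # brute force over all genes; fine for a sketch
            if g.chrom != chrom:
                continue
            offset = summit - g.tss if g.strand == "+" else g.tss - summit
            if abs(offset) >= max_dist:
                continue
            side = "downstream" if offset >= 0 else "upstream"
            c = counts[(side, abs(offset) // bin_size * bin_size)]
            c[2] += 1
            if g.direction == "up":
                c[0] += 1
            elif g.direction == "down":
                c[1] += 1
    return {k: (up / n, down / n, n) for k, (up, down, n) in counts.items()}
```

Strand awareness matters here: for a minus-strand gene, a summit at a lower coordinate than the TSS lies downstream, which is why the offset sign is flipped before binning.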
All ATAC-seq and 5mC-seq data are plotted over the naive-specific peak set defined in 5iLAF (5032 peaks), a subset that contains an AP2 motif but no KLF motif (1054 peaks), a subset that contains a KLF motif but no AP2 motif (1551 peaks) and primed-specific peaks (2562 peaks). a, Blastocyst ATAC-seq reads are strongly enriched over naive-specific ATAC-seq peaks, including the AP2+KLF− and KLF+AP2− motif subsets, but are less enriched over primed-specific peaks. b, DNA methylation from primed and naive cells plotted over the different ATAC-seq peak sets. Note that cells cultured in t2iLGö show drops in methylation over all naive-specific peak sets, including the AP2+ peak set, indicating that these same regions are likely to be open chromatin in t2iLGö conditions. c, DNA methylation in oocytes and blastocysts plotted relative to the ATAC-seq peak sets. Note the sharp loss of DNA methylation in the blastocyst over the naive-specific sets. d, After fertilization, the paternal genome is demethylated while the maternal genome remains methylated, so throughout the blastocyst genome the DNA methylation level is very close to half the oocyte methylation level (ref. 12). Thus, we plotted blastocyst methylation minus 50% of oocyte methylation to identify regions that have undergone localized demethylation during embryogenesis. Sharp demethylation is observed over the naive-specific peak sets, validating their identity as enhancers.
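The quantity plotted in d can be expressed as a per-region score. The Python sketch below assumes methylation values are fractions in [0, 1] keyed by a region identifier and that a peak set is averaged with a simple mean; the function names and data layout are hypothetical, and only the "blastocyst minus half of oocyte" baseline comes from the legend.

```python
from statistics import mean


def expected_blastocyst_level(oocyte_level):
    # Baseline from the legend: the paternal genome is demethylated and the
    # maternal genome keeps its oocyte methylation, so the diploid average
    # is about half of the oocyte level.
    return 0.5 * oocyte_level


def demethylation_scores(blastocyst, oocyte):
    """Per-region blastocyst methylation minus half the oocyte methylation.

    Inputs map region id -> methylation fraction in [0, 1]; regions missing
    from either map are skipped. Negative scores suggest localized
    demethylation beyond the genome-wide expectation.
    """
    shared = blastocyst.keys() & oocyte.keys()
    return {r: blastocyst[r] - expected_blastocyst_level(oocyte[r]) for r in shared}


def mean_score(scores, peak_set):
    """Average score over a peak set (NaN if no region in the set has data)."""
    values = [scores[r] for r in peak_set if r in scores]
    return mean(values) if values else float("nan")
```

A score near zero means a region simply follows the genome-wide halving expected from paternal demethylation, so only regions that fall clearly below zero point to additional, localized demethylation.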
Supplementary Fig. 3 | TFAP2C-deficient CRISPR lines were generated and are phenotypically normal in the primed state.
a, Diagram of the mutations in the two TFAP2C−/− lines used in this paper. Note that both mutations cause frameshifts. b, Control and TFAP2C−/− lines are karyotypically normal. c–e, Expression of OCT4 (c), NANOG (d) and SOX2 (e) in control and TFAP2C−/− hESCs cultured in primed conditions. The OCT4 blot is representative of 3 independent experiments. f, TRA-1-85-positive (human) cells were gated and stained for the pluripotency markers TRA-1-61 and TRA-1-80. Data represent 1 of 3 independent experiments with similar results. g, Expression of lineage markers in primed hESCs and in embryoid bodies (EBs) formed from control and TFAP2C−/− primed hESCs (n = 4 biological replicates each for primed, WT EB and TFAP2C−/− EB). Mean ± standard error is shown. Note the loss of pluripotency markers upon EB differentiation and the gain of neural and non-neural ectoderm markers, with a neural bias in TFAP2C−/− EBs. Uncropped Western blot images are available in Supplementary Fig. 9. Source data for g are in Supplementary Table 8.
a,b, Untransfected UCLA1 hESCs, a control line nucleofected in parallel with the TFAP2C−/− lines, and TFAP2C−/− lines 1 and 2 were treated with 5iLAF medium for 3 days (a) and 5 days (b). TFAP2C−/− cells form morphologically distinct, flat colonies by 3 days. Scale bars, 100 μm. c, Top statistically enriched GO terms among genes upregulated in TFAP2C−/− cells, calculated using a hypergeometric test with adjustment for multiple hypothesis testing (ref. 33). All terms are consistent with neural differentiation of the TFAP2C−/− cells; terms specific to neural identity are colored blue. d, ATAC-seq peaks specific to WT cells relative to TFAP2C−/− cells after 5 days in 5iLAF show strong enrichment for AP2 motifs and for motifs of factors involved in pluripotency (OCT4, SOX, KLF), consistent with a loss of pluripotency in TFAP2C−/− cells. e, ATAC-seq peaks specific to TFAP2C−/− cells relative to WT cells after 5 days in 5iLAF show enrichment for motifs corresponding to neural factors such as SOX (for example, SOX1) and ZIC (for example, ZIC1). Motif enrichment was calculated using a cumulative binomial distribution (ref. 19). f, Targeting schematic for generation of Tfap2c−/− and Tfap2a−/− Tfap2c−/− mESCs. Note that all deletions either induce frameshifts or delete splice sites. g, Western blot for Tfap2c in control and Tfap2c−/− lines. Representative of 2 independent experiments. Uncropped Western blot images are available in Supplementary Fig. 9.
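The two statistical tests named in c and e can be sketched in plain Python. This is an illustration, not the published analysis: the legend does not say which multiple-testing correction was applied, so Benjamini–Hochberg is used here as an assumed stand-in, and `go_enrichment` and `motif_enrichment` are hypothetical helper names. The hypergeometric and binomial upper tails are computed exactly with `math.comb`.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k), X ~ Hypergeometric(M genes, n in the term, N genes drawn)."""
    lo, hi = max(k, 0), min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return hits / comb(M, N)


def binom_sf(k, n, p):
    """P(X >= k), X ~ Binomial(n, p): the cumulative binomial upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(max(k, 0), n + 1))


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (assumed correction), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(query, universe, annotations):
    """Hypergeometric GO-term enrichment among `query` genes, BH-adjusted.

    annotations: {term: set of genes}. Returns {term: (p_value, q_value)}.
    """
    universe = set(universe)
    query = set(query) & universe
    terms = list(annotations)
    pvals = []
    for term in terms:
        members = set(annotations[term]) & universe
        k = len(members & query)
        pvals.append(hypergeom_sf(k, len(universe), len(members), len(query)))
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, qvals)}


def motif_enrichment(hits, n_peaks, background_freq):
    """P-value that >= `hits` of `n_peaks` target peaks carry a motif whose
    frequency in background peaks is `background_freq`."""
    return binom_sf(hits, n_peaks, background_freq)
```

For genome-scale inputs, a library routine such as `scipy.stats.hypergeom.sf` is preferable for speed; the exact-sum version above simply makes each test's definition explicit.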
a, RPKM of core and naive pluripotency factors upon TFAP2C induction in primed cells. Note that overexpression of TFAP2C does not result in upregulation of naive pluripotency markers. n = 1 replicate for all samples. b, Metaplot of TFAP2C ChIP-seq data upon TFAP2C induction in the primed state. When TFAP2C is overexpressed in primed conditions, it primarily homes to regions of conserved openness in primed and naive cells (left panel) rather than to naive-specific ATAC-seq peaks (right panel). c, Ectopic expression of TFAP2C rescues the morphological abnormality of TFAP2C−/− cells upon reversion in 5iLAF. Scale bar, 100 μm. d, Thirteen days after withdrawal of TFAP2C, very few colonies are visible in the sample from which doxycycline was withdrawn compared with the sample in which it was maintained. Scale bar, 100 μm. Results are representative of two independent experiments. e, Quantification of the reduced cell number upon doxycycline withdrawal compared with the sample in which doxycycline treatment continued. f, Shift toward the primed SSEA4+ phenotype in the sample from which doxycycline had been withdrawn. Source data for a are in Supplementary Table 8.
a, Second reversion of control and TFAP2C−/− cells. Bright-field images of cells are shown. Cells were then sorted, and TRA-1-85+ (human) cells were gated into SSEA4-negative and SSEA4-positive populations. In this reversion many SSEA4− cells were apparent, but subsequent RNA-seq indicated that they showed low expression of pluripotency genes and elevated neural gene expression, indicating that they were differentiated rather than naive. Scale bar, 100 μm. b, Expression (RPKM) of pluripotency markers, naive-specific markers and primed markers in control and TFAP2C−/− cells relative to primed controls. Ratios are shown for n = 2 (control cells, TFAP2C−/− SSEA4−) or n = 3 (TFAP2C−/− SSEA4+) biologically independent replicates over n = 2 biologically independent primed control replicates. c, Principal component analysis of RNA-seq from all pluripotent datasets. Embryoid body differentiation, day 5 5iLAF TFAP2C−/− cells and 5% O2 reversion 2 TFAP2C−/− SSEA4− cells were excluded because they diverged strongly owing to loss of pluripotency. Blue dots: TFAP2C−/− cells in 5% O2 show a shift toward primed-like gene expression relative to control cells. Brown dots: no shift toward naive identity is observed in cells that overexpress TFAP2C in primed media conditions. Green dots: rescue of TFAP2C−/− cells with doxycycline-inducible TFAP2C is indicated by a partial shift toward naive identity. d,e, Average RPKM in the indicated cell type is plotted for all genes previously determined (ref. 6) to be downregulated (d) or upregulated (e) upon the transition from pre-implantation to post-implantation epiblast in primates. Boxes and whiskers are plotted by the Tukey method. f, Immunofluorescence images of KLF17 and TRA-1-60 staining in control and TFAP2C−/− cells cultured in t2iLGöY. Data represent 1 of 2 independent experiments with similar results. Scale bars, 20 μm. g, Ratio of KLF17 expression in t2iLGöY relative to primed conditions for WT control and TFAP2C−/− cells.
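The Tukey convention used for the box plots in d,e can be summarized in a few lines of Python. This sketch assumes quartiles computed with `statistics.quantiles(..., method="inclusive")`; plotting packages define quartiles in slightly different ways, so box edges may not match the figure exactly. Whiskers extend to the most extreme observations within 1.5 × IQR of the box, and points beyond are reported as outliers. The function name `tukey_box` is hypothetical.

```python
from statistics import quantiles


def tukey_box(values, k=1.5):
    """Box-plot summary with Tukey whiskers (needs at least two values).

    The box spans Q1..Q3; whiskers reach the most extreme observations lying
    within k * IQR of the box, and anything beyond is listed as an outlier.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }
```

For example, with the values 1 to 9 plus a single value of 100, the upper whisker stops at 9 and 100 is drawn as an outlier, so one extreme gene does not stretch the whisker.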
Data are for n = 2 control and n = 3 TFAP2C−/− biologically independent replicates in both naive and primed conditions. Source data for b is in Supplementary Table 8.
a, Frequency with which a gene whose TSS lies a given distance from a TFAP2C ChIP-seq peak summit overlapping a region of conserved openness in naive and primed cells is positively or negatively regulated by TFAP2C. Note that for this subset of TFAP2C ChIP peaks there is almost no predictive effect, indicating that TFAP2C primarily exerts its regulatory role at TFAP2C-dependent open chromatin sites. b, Distance of the sites in a to the nearest TSS. Note that many are promoter-associated. c–e, ATAC-seq and ChIP-seq data are shown in the vicinity of KLF5 (c), NANOG (d) and FGF4 (e). The control low-O2 track is the SSEA4− population; the TFAP2C−/− low-O2 track is the SSEA4+ population.
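The nearest-TSS distance in b can be computed with a binary search over sorted TSS coordinates on each chromosome. The Python sketch below is one way to do this; the function names and the input layout (a list of `(chrom, summit)` pairs and a dict of TSS positions per chromosome) are hypothetical, and tools such as `bedtools closest` perform the same task on BED files.

```python
from bisect import bisect_left


def nearest_tss_distance(summit, sorted_tss):
    """Absolute distance (bp) from a summit to the closest TSS.

    `sorted_tss` holds TSS positions on the summit's chromosome, sorted
    ascending and non-empty.
    """
    i = bisect_left(sorted_tss, summit)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - summit)  # closest TSS at or after summit
    if i > 0:
        candidates.append(summit - sorted_tss[i - 1])  # closest TSS before summit
    return min(candidates)


def distance_table(summits, tss_by_chrom):
    """Map each (chrom, summit) to its nearest-TSS distance.

    Chromosomes without any annotated TSS are skipped.
    """
    sorted_index = {c: sorted(p) for c, p in tss_by_chrom.items() if p}
    return {
        (c, s): nearest_tss_distance(s, sorted_index[c])
        for c, s in summits
        if c in sorted_index
    }
```

Sorting each chromosome once and then bisecting keeps every lookup logarithmic in the number of TSSs, rather than scanning all genes for each peak.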
a, Regulatory elements in the proximity of the OCT4 locus. The conservation track shows one dot for each base, with the degree of conservation in placental mammals indicated on the y axis. Note the higher conservation over regulatory elements and coding sequence. b, Sashimi plot showing splice events over the POU5F1 locus in naive hESCs. Some RNA-seq reads are observed over POU5F1 Intron Element 1, possibly enhancer RNA or intronic RNA, but no splice events link these to OCT4 transcripts, indicating that Intron Element 1 is not an alternative TSS. c, Normal karyotype in intron enhancer deletant and WT control cells. d, Western blot for OCT4 in control and ΔIntron enhancer 1 cells in primed conditions. Uncropped Western blot images are available in Supplementary Fig. 9.
Uncropped images of all Western blots are shown, together with the approximate extent of the cropped regions. Note that because our procedure was to stain the target and loading control simultaneously, cross-reactive secondary antibody sometimes causes an H3 band to appear in the target blot.
Supplementary Figs 1–9 and Supplementary Table legends
Description of all cell populations used for next generation sequencing libraries
Human ATAC and ChIP peak sets used in analysis
Description of embryos used for ATAC-seq
Murine ATAC peak sets used in analysis
RPKM of all RNA-seq samples and differentially regulated gene sets
Genes within 50 kb of TFAP2C-dependent regulatory element
GREAT analysis of TFAP2C-dependent regulatory elements
Statistical source data