The transcriptional regulator ZNF398 mediates pluripotency and epithelial character downstream of TGF-beta in human PSCs

Human pluripotent stem cells (hPSCs) have the capacity to give rise to all differentiated cells of the adult. TGF-beta is used routinely for expansion of conventional hPSCs as flat epithelial colonies expressing the transcription factors POU5F1/OCT4, NANOG, SOX2. Here we report a global analysis of the transcriptional programme controlled by TGF-beta followed by an unbiased gain-of-function screening in multiple hPSC lines to identify factors mediating TGF-beta activity. We identify a quartet of transcriptional regulators promoting hPSC self-renewal including ZNF398, a human-specific mediator of pluripotency and epithelial character in hPSCs. Mechanistically, ZNF398 binds active promoters and enhancers together with SMAD3 and the histone acetyltransferase EP300, enabling transcription of TGF-beta targets. In the context of somatic cell reprogramming, inhibition of ZNF398 abolishes activation of pluripotency and epithelial genes and colony formation. Our findings have clear implications for the generation of bona fide hPSCs for regenerative medicine. The downstream pathway regulating how TGF-beta affects pluripotency of human PSCs is unclear. Here, the authors find that transcription factor ZNF398 binds active promoters/enhancers together with the histone acetyltransferase EP300 and SMAD3, enabling expression of pluripotency and epithelial genes.

H uman pluripotent stem cells (hPSCs) have been derived from human blastocysts as human embryonic stem cells (hESCs 1 ) or from somatic cells via transcription factormediated reprogramming as induced pluripotent stem cells (hiPSCs 2,3 ). Initially hPSCs were cultured on layers of inactivated fibroblasts (feeder cells), which produce several adhesion and signalling molecules. Alternatively, the medium was conditioned, or enriched by unknown secreted factors, by fibroblasts before their use with hPSCs. Such poorly defined culture systems represented a hurdle to the identification of key signals regulating pluripotency. Importantly, chemically defined conditions for the expansion of hPSCs have been reported [4][5][6][7] . Despite variations in the media composition, ligands of the TGF-beta family are invariably added or produced by feeder cells 8,9 . Indeed, TGF-beta signalling has been shown to be critical for the maintenance of pluripotency in hPSCs 10,11 . However, the mechanisms of action of the TGF-beta signal remain poorly characterised.
TGF-beta ligands such as TGF-beta1/2/3 (TGFB1/2/3), Nodal and Activin A bind a dimer of type II serine/threonine kinase receptors, which in turn phosphorylate and activate two type I receptors, leading to the formation of a hetero-tetrameric receptor complex. Activation of the receptor complex leads to phosphorylation of SMAD2 and SMAD3, the receptor-SMADs (R-SMADs). Phosphorylated R-SMADs form heteromeric complexes with SMAD4 and translocate into the nucleus, where they bind target genes.
SMAD3 binds the DNA directly, while SMAD2 needs SMAD4 to do so [12][13][14] , in combination with the histone acetyltransferase EP300, ultimately leading to activation of target genes. R-SMADs are known to interact with additional transcription factors that may vary between different cell types 15 , resulting in activation of cell type-specific transcriptional programmes (see Supplementary  Fig. 1a for a diagram of the TGF-beta pathway) 12,13 . In order to understand how TGF-beta signalling regulates the behaviour of hPSCs, it is critical to identify genes directly induced by R-SMADs.
Core pluripotency factors-POU5F1/OCT4, NANOG, SOX2were initially identified in murine naïve pluripotent cells [16][17][18] and were then found to be functionally relevant in hPSCs 19 . A large set of additional murine pluripotency factors have been identified 20 , the majority of which are not expressed in conventional human PSCs, potentially because of differences between species or because conventional hPSCs are in a more advanced developmental state called primed pluripotency. Although naïve hPSCs have been recently generated either directly from embryos or by reprogramming of somatic cells [21][22][23][24] , they are not the focus of this study and, for clarity, we should stress that the acronyms hPSCs, hESCs and hiPSCs indicate only human conventional pluripotent cells in a primed state.
Here, we study conventional hPSCs with the aim of isolating human-specific pluripotency regulators that could reveal differences between PSCs of different species, or could play a critical role for induction of human pluripotency.
In this study, we characterise the transcriptional programme activated by TGF-beta/SMAD3 signalling in hPSCs. We identify several potential downstream mediators and test them using a gainof-function approach. TGF-beta appears to maintain pluripotency via induction of four factors. Among them, we extensively characterise a transcriptional regulator, called ZNF398, which induces genes associated with pluripotency and epithelial character in collaboration with SMAD3 and the histone acetyltransferase EP300. Moreover, ZNF398 knockdown during somatic cell reprogramming causes a drastic reduction in iPSC colonies.
Human pluripotent colonies are composed of a monolayer of cells expressing epithelial markers. SB43 treatment induces also a morphological change, with loss of cell-cell contact, reduction of epithelial markers and upregulation of mesenchymal markers ( Fig. 1b and Supplementary Fig. 1c), as previously described 25 .
We decided to study how TGF-beta controls gene programmes associated with pluripotency and the epithelial character with an unbiased functional approach based on the identification of direct transcriptional targets followed by functional validation 26,27 . We reasoned that TGF-beta transcriptional targets should be bound by SMAD2/3 and either downregulated upon signal inhibition or rapidly induced upon stimulation. SMAD2 and SMAD3 can also form heterodimers 14 and have redundant functions in pluripotent cells 28 . We focused on SMAD3, given that it is more abundant than SMAD2 in hPSCs and it binds directly to the DNA 12,13 ( Supplementary Fig. 1d). We intersected SMAD3 chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) with gene expression data from cells treated with SB43 and identified 195 genes downregulated and bound by SMAD3 ( Fig. 1c and d, top panel yellow dots on the left). Moreover, by intersecting SMAD3-bound genes with genes induced after 4 h of acute stimulation, we identified 61 additional putative targets and 20 that were also among the downregulated genes ( Fig. 1c and d, bottom panel yellow dots on the right). Several known TGF-beta direct targets, such as LEFTY1/2, SKIL and SMAD7, were identified (Fig. 1d), supporting the validity of our approach.
We then refined our gene list by focusing on genes encoding for transcriptional regulators, such as transcription factors or chromatin modifiers, given that such classes of proteins have the capacity to direct transcriptional programmes. Finally, we included only genes robustly expressed (>3 RPKM in hPSCs) ( Supplementary Fig. 1e), obtaining a list of 21 candidates (for all datasets, see Supplementary Data 1).
We performed qPCR to independently validate our putative TGF-beta targets. In particular, we tested the responsiveness to both TGFB1 and Activin A, two ligands commonly used for hPSCs expansion [4][5][6][7] (Supplementary Fig. 1a and f). We also tested whether targets were responsive to the TGF-beta signal when cells were expanded either on feeders or under feeder-free conditions, given that the TGF-beta signal is active and maintains pluripotency under both conditions ( Supplementary Fig. 1g) 5,8,9 . After extensive validation, we identified eight genes (ID1, MYC, BCOR, KLF7, OTX2, ZNF398, NANOG and ETS2) as bona fide TGF-beta and Activin A transcriptional targets in hPSCs (Fig. 2a, b).
Functional identification of pluripotency regulators. If a gene is a critical downstream mediator of the TGF-beta signal in hPSCs, its forced expression should maintain pluripotency also when TGF-beta signalling is inhibited. To test this hypothesis, we stably expressed our candidates in hiPSCs and hESCs using piggyBac (PB) vectors.
First, we performed a clonal assay, which allows us to quantify the fraction of hPSCs able to self-renew, giving rise to pluripotent colonies. Cells transfected with an empty vector formed a reduced number of alkaline phosphatase (AP)-positive pluripotent colonies when treated with SB43 ( Fig. 3a and Supplementary  Fig. 2a). Only expression of NANOG, KLF7, MYC and ZNF398 resulted in full rescue in formation of AP-positive colonies in the presence of SB43, while other factors had only a partial, or no effect. Second, under forced expression of either NANOG, KLF7, MYC or ZNF398 the cells maintained a flat epithelial-like morphology, generally associated with pluripotency ( Fig. 3b), upon SB43 treatment, while other factors failed to do so. Third, NANOG, KLF7, MYC and ZNF398 were each able to maintain expression of pluripotency markers (Fig. 3c) in presence of SB43, although they displayed specificity for different targets. For instance, ZNF398 and NANOG activated robustly PRDM14 expression. Quantitative immunostaining confirmed maintenance of OCT4 and NANOG at the protein levels (Fig. 4a). Comparable results were also obtained after prolonged culture with SB43 ( Supplementary Fig. 2b-d). We confirmed our results in an additional hESC line ( Supplementary Fig. 3a-c) Fig. 1d were independently validated by qPCR. Validation experiments were performed in two different culture conditions: feeder-free or on feeders (MEF). Expression was normalised to the mean of DMSO-treated cells. Right: RNA-seq data as in bottom panel of Fig. 1d were independently validated by qPCR. Expression was normalised to the mean of SB43-treated samples. The grey box highlights the known direct targets of SMAD3 (positive controls, including SKIL, one of the 21 candidates). Transcriptional SMAD3 targets independently confirmed are highlighted in bold. For qPCR validation, five independent experiments were performed for each condition. Unpaired two-tailed t-test, p-values were not adjusted. See also Supplementary Fig. 1f, g. Source data are provided as a Source Data file. b Example of SMAD3 binding and gene expression analysis of three validated targets. Top: Gene tracks represent binding of SMAD3 (data from ref. 15 ) at the indicated gene loci. Red lines indicate the regions validated by ChIP-qPCR. Center: ChIP-qPCR on SMAD7, NANOG and ZNF398 loci was performed using anti-SMAD3 and anti-SMAD2 or a rabbit control IgG antibody in BG01V (dark grey) and H9 (light grey) cell lines. Enrichment is expressed as a percentage of the DNA inputs. Bars indicate the mean of two biological replicates shown as dots. Bottom: Gene expression analysis by qPCR. Bars indicate mean ± SEM of n = 7, 7, 5, 7, 9, 10, 8, 6 biological replicates over six independent experiments shown as dots for SMAD7, n = 7, 6, 5, 6, 5, 5, 5, 5 biological replicates over five independent experiments shown as dots for NANOG, and n = 8, 7, 5, 6, 9, 10, 8, 6 biological replicates over six independent experiments shown as dots for ZNF398. Expression was normalised to the mean of SB43 16 h samples. Unpaired two-tailed t-test. Source data are provided as a Source Data file. a c b Fig. 3 Functional identification of pluripotency regulators in hPSCs. a Left: Clonal assay quantification of hESCs (HES2, red bars) and hiPSCs (KiPS, orange bars) stably expressing an empty vector control (Empty) or eight different SMAD3 targets identified in Fig. 2a. Two thousand cells were seeded at clonal density in the presence of DMSO or SB43 and stained for alkaline phosphatase (AP) after 5 days. Bars show the mean ± SEM percentage of APpositive colonies. Dots represent independent experiments (n = 7, 10 for Empty DMSO; n = 7, 10 for Empty SB43; n = 3, 4 for NANOG SB43; n = 2, 2 for KLF7 SB43; n = 4, 3 for MYC SB43; n = 3, 2 for ZNF398 SB43; n = 3, 3 for ID1 SB43; n = 2, 1 for BCOR SB43; n = 3, 2 for ETS2 SB43; n = 3, 1 for OTX2 SB43 in HES2 and KiPS, respectively). Unpaired two-tailed Mann-Whitney U test relative to Empty SB43 samples. Right: Representative images of clonal assay performed in KiPS. See also Supplementary Fig. 3a for results obtained in H9 hESCs. Scale bars 500 µm. Source data are provided as a Source Data file. b Morphology of HES2 colonies stably expressing an empty vector (Empty) in presence of DMSO or SB43 and HES2 stably expressing the eight SMAD3 targets in presence of SB43. Representative images of three independent experiments are shown. See also Supplementary Fig. 3b for results obtained in H9. Scale bars 200 µm. c Gene expression analysis by qPCR of HES2 (light green bars) and KiPS (dark green bars) stably expressing an Empty vector or the eight SMAD3 targets and treated with or without SB43 for 5 days. Bars indicate mean ± SEM of independent experiments, shown as dots (n = 5, 5 for NANOG overexpression; n = 2, 2 for KLF7 overexpression; n = 4, 4 for MYC overexpression; n = 4, 4 for ZNF398 overexpression, n = 2, 4 for ID1 overexpression; n = 2, 2 for BCOR overexpression; n = 2, 1 for ETS2 overexpression; n = 1, 2 for OTX2 overexpression in HES2 and KiPS, respectively). Expression was normalised to the Empty DMSO samples. Unpaired two-tailed Mann-Whitney U test. Source data are provided as a Source Data file. NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16205-9 ARTICLE NATURE COMMUNICATIONS | (2020) 11:2364 | https://doi.org/10.1038/s41467-020-16205-9 | www.nature.com/naturecommunications performed transcriptome analysis and observed that MYC robustly activated UTF1, KLF7, LIN28B and DPPA2, despite the mild effect on NANOG and OCT4 (Fig. 4b). Similarly, NANOG, KLF7 and ZNF398 activated completely distinct sets of pluripotency regulators. We further extended our analysis to all genes highly expressed in hPSCs that were significantly reduced upon SB43 treatment for 5 days. We identified 538 genes downregulated, as a proxy for genes generally associated with human pluripotency (Fig. 5a, mean foldreduction relative to Empty-DMSO and Empty-SB43 = 3.75×), among them were also PRDM14 and NANOG. All of our four factors under study were able to rescue such global transcriptional effect (Fig. 4c, see data in Supplementary Data 2).
Murine epiblast stem cells (EpiSCs) are primed pluripotent cells derived from the post-implantation epiblast 35,36 . EpiSCs share several molecular features with primed hPSCs 37 , including the requirement of TGF-beta for self-renewal 10 . Therefore, we asked whether forced expression of the four factors would maintain pluripotency also in EpiSCs. We generated both GOF18 27 and OEC2 38 EpiSCs stably expressing the four transcription factors ( Supplementary Fig. 4a). TGF-beta inhibition led to a reduction of Nanog, Oct4, Otx2 and Fgf5 ( Supplementary Fig. 4b) and none of the four factors were able to maintain the expression of the markers analysed, with the exception of Otx2 maintained only by KLF7. We conclude that the ability of NANOG, KLF7, MYC and ZNF398 to maintain pluripotency is not conserved in murine EpiSCs.
In sum, our results indicate that in hPSCs, TGF-beta maintains pluripotency mainly via a quartet of transcriptional regulators, each one preferentially activating a specific subset of pluripotency factors. Among these, NANOG and MYC have - Shown data refers to KiPS transfected with the empty vector in the presence of DMSO or SB43 (n = 4, 4 independent experiments, respectively) and for KiPS stably expressing NANOG, KLF7, MYC or ZNF398 in presence of SB43 for 5 days (n = 2, 2, 2, 2 independent experiments, respectively). Average fold-change relative to Empty SB43 sample (X) is reported for each condition. Box plots indicate 25th, 50th and 75th percentile; whiskers indicate minimum and maximum. Unpaired two-tailed t-test.
been extensively investigated as pluripotency regulators 10,11,19,39 ; KLF7 is a Kruppel-like factor and other members of the same family, such as KLF2/4/5, are known regulators of pluripotency 40 . Conversely, ZNF398 has never been implicated in regulation of pluripotency, prompting us to choose it for further molecular characterisation.

ZNF398 represses differentiation and mesenchymal genes.
When TGF-beta is blocked hPSCs lose pluripotency and undergo a morphological change. After focusing on the pluripotency regulators (Figs. 3 and 4), we decided to study the global effect of TGF-beta on hPSC function. Thus, we performed an unbiased transcriptional analysis and observed that upon SB43 treatment   (Fig. 5a). Gene Ontology (GO) enrichment analysis identified several categories associated with cell adhesion, epithelial to mesenchymal transition and organisation of the extracellular matrix, in agreement with the observed morphological change (Fig. 5b). Among them we identified a subset of genes specifically associated with epithelial character, that were downregulated by SB43 (Fig. 5c, epithelial). Moreover, we observed several gene categories associated with formation and function of neural cells (Fig. 5b), corresponding to a set of genes upregulated by SB43 (Fig. 5c, neuroectodermal). Upregulation of neuroectodermal genes was expected from studies performed in different model systems showing that TGF-beta, Activin A and Nodal block neuroectoderm formation 41,42 . Indeed, inhibition of TGF-beta is commonly used for neuroectodermal differentiation protocols 43 .
Next, we asked whether the forced expression of ZNF398 was able to counteract such transcriptional changes and observed a reduction in neuroectodermal genes and boosted expression of epithelial genes (Fig. 5c). We conclude that ZNF398 is activated by TGF-beta to maintain the correct expression of neuroectodermal and epithelial genes in hPSCs.
Next, we asked whether ZNF398 would be required to control TGF-beta-dependent transcriptional programmes. We performed siRNA-mediated knockdown of ZNF398 and observed no effect on self-renewal ( Supplementary Fig. 5a, b), as expected from the presence of four factors that are individually able to maintain pluripotency downstream of TGF-beta. However, ZNF398 knockdown during the early phases (5 days) of differentiation resulted in further reduction of pluripotency and epithelial markers, and enhanced induction of neuroectodermal genes (Fig. 5d, see also Supplementary Fig. 5c) to an extent comparable or greater than NANOG knockdown.
To further investigate the capacity of ZNF398 to regulate pluripotency and the epithelial character of hPSCs in an independent assay, we performed embryoid bodies (EBs) differentiation. Forced expression of ZNF398 was able to activate expression of pluripotency and epithelial markers, while repressing mesenchymal and germ layer markers relative to control cells (Fig. 5e). Collectively, these results indicate that ZNF398 promotes the expression of pluripotency and epithelial markers and represses genes associated with differentiation of hPSCs.
ZNF398 activates transcription in concert with SMAD3. We next sought to understand the molecular mechanism by which ZNF398 promotes pluripotency and epithelial character. ZNF398 contains several zinc-finger domains and it has been shown to recognise specific DNA sequences in COS-1 cells 44 . Therefore, we performed ChIP-seq for ZNF398 in two different hESCs lines and identified genomic regions bound by it containing a DNA motif similar to other ZNF factors ( Supplementary Fig. 6a). Cooperative binding among transcription factors has been reported in several stem cell systems, thus we asked how similar the genome-widebinding profile of ZNF398 is to those of other transcriptional regulators (data from CODEX 45 ). Surprisingly, we found that ZNF398 clustered more closely with SMAD3 and the histone acetyl-transferase EP300, compared to the core pluripotency factors OCT4/NANOG or the Polycomb components (Fig. 6a, top  panel). A similar analysis conducted on histone modificationsassociated ZNF398 with regions decorated by acetylation of histone 3 on lysine 9 or 27 (Fig. 6a, bottom panel). Clustering results are confirmed by strong colocalisation of ZNF398, SMAD3, EP300 and H3K27ac (Fig. 6b). Histone acetylation is associated with both active promoters and enhancers, so we looked at the distribution of mono-methylation and tri-methylation of histone 3 on lysine 4, associated with active enhancers and promoters, respectively. We looked at ZNF398 peaks and found that 3595 ZNF398 peaks out of 5771 appeared as active enhancers (high levels of H3K4me1 and low H3K4me3), while the remaining 2176 peaks as active promoters (high H3K4me3). We conclude that ZNF398 preferentially colocalises with SMAD3 and EP300 at active enhancers and promoters in hPSCs.
The frequent colocalisation may be due to binding to neighbouring DNA regions or to physical interaction. A coimmunoprecipitation (Co-IP) assay indicates that SMAD3 and ZNF398 form a complex in hPSCs (Fig. 6c).
We observed that ZNF398 bound and activated the pluripotency factor LIN28B and the epithelial master regulator epithelial splicing regulatory protein 1 (ESRP1 46 ) (Fig. 7a), matching the pro-pluripotency and pro-epithelial activity of ZNF398. Interestingly, we also observed that LEFTY1, a known TGF-beta direct target, was also co-bound by ZNF398 (Fig. 7a). We therefore hypothesised that ZNF398 might potentiate the transcription of TGF-beta targets by binding SMAD3 targets. We functionally tested this hypothesis by comparing hPSCs expressing ZNF398 against control hPSCs. ZNF398 boosted the basal expression of LEFTY1 by >10 fold (i.e. in the presence of TGF-beta), and was even able to maintain residual LEFTY1 expression in the absence of TGF-beta signalling (Fig. 7b). We extended our analysis to all SMAD3 direct target genes (Fig. 1c) and observed that 23 out of 81 were also co-bound by ZNF398 (Fig. 7c, enrichment of 3.67 fold over those expected by chance, p-value = 3.49e−12, Chisquared test). Importantly, the entire set of SMAD3-ZNF398 cobound genes were significantly upregulated in cells expressing ZNF398 (Fig. 7d), further indicating a functional role of ZNF398 as activator of SMAD3 co-bound targets. This activity is specific Fig. 5 ZNF398 represses differentiation and mesenchymal genes. a Transcriptome analysis of KiPS stably expressing an empty vector and treated with SB43 for 5 days. DOWN-regulated (Log2 fold-change < −1 and p-value < 0.01) and UP-regulated (Log2 fold-change > 1 and p-value < 0.01) genes are indicated in blue and orange, respectively. Known TGF-beta targets (LEFTY1, LEFTY2) serve as controls. Not adjusted p-values were calculated with Wald Test. b GO term analysis for biological processes of DOWN-regulated genes (left, blue bars) and UP-regulated genes (right, orange bars) revealed a statistically significant enrichment (p-value < 0.05) for genes involved in cell adhesion, epithelial to mesenchymal transition, organisation of extracellular matrix (highlighted in red) and neural development (highlighted in green). p-values were calculated by Fisher Exact test using DAVID database 63 . c Heatmap for markers of neuroectodermal and epithelial character. RNA-seq data derived from KiPS stably expressing an empty vector (Empty) or ZNF398 and treated with DMSO or SB43 for 5 days. Z-scores of row-scaled expression values (TPM) are shown. Orange and blue indicate high and low expression, respectively. Markers of epithelial (EPCAM, ESRP1, EZR) and neuroectodermal identity (SIX3, ENC1, PAX3) are highlighted. d Gene expression analysis by qPCR of H9 transfected with the indicated siRNAs (a non-targeting siRNA (siCONTROL) or a pool of two validated siRNAs (siNANOG and siZNF398)) and treated with or without SB43 for 5 days. See also "Methods" section. Bars indicate the mean of two independent experiments shown as dots. Expression was normalised to the mean of siCONTROL treated with SB43 samples. See also Supplementary Fig. 5 for siRNA validation and for additional markers. Source data are provided as a Source Data file. e Gene expression analysis by qPCR of EBs differentiation of KiPS stably expressing an empty vector (Empty, light orange) or ZNF398 (dark orange) analysed at three different time points (Day 0, Day 5 and Day 15) of differentiation. Bars indicate the mean ± SEM of five independent experiments, shown as dots. Expression was normalised to the mean of Empty Day 0 sample. Unpaired twotailed Mann-Whitney U test. Source data are provided as a Source Data file.
for ZNF398, given that NANOG expression had no discernible effect on SMAD3-ZNF398 targets. Among the genes upregulated in hPSCs expressing ZNF398, we observed strong induction of several established direct targets of TGF-beta signal, such as LEFTY1/2, CER1, TGFB1 and NODAL (Fig. 7e). We also analysed the dynamics of R-SMADs nuclear entry upon TGF-beta stimulation. Sixty minutes of treatment were sufficient to induce phosphorylation of SMAD3 and translocation from the cytoplasm to the nucleus ( Supplementary Fig. 6b-d). Ectopic ZNF398 expression led to accelerated and enhanced nuclear translocation.
In sum, we conclude that ZNF398 colocalises with SMAD3 at active enhancers and promoters, activating the transcription of TGF-beta targets in hPSCs.
ZNF398 could be either a hPSC-specific or a general activator of the TGF-beta signal. We identified only two human cell lines expressing ZNF398 comparably to hPSCs (Supplementary Fig. 7a) and performed ZNF398 downregulation or over-expression, observing no differences in the induction of TGF-beta direct targets ( Supplementary Fig. 7b, c). In two EpiSCs lines, stable expression of Zfp398-the ZNF398 mouse orthologue-we also observed no effect on the levels of TGF-beta targets ( Supplementary Fig. 7d), in stark contrast with what we observed in hPSCs expressing ZNF398. We conclude that, among all the cell types we tested, ZNF398 activates the TGF-beta signal only in hPSCs.
ZNF398 is required for somatic cell reprogramming. So far, our results indicate that ZNF398 promotes the pluripotency and the epithelial character programmes in hPSCs. We decided to test the function of ZNF398 in an orthogonal system, the induction of pluripotency from somatic cells. Reprogramming from somatic cells, such as fibroblasts, requires an early mesenchymal to epithelial transition (MET) followed by the activation of endogenous pluripotency factors 23,47,48 . We noticed that ZNF398 is expressed in human fibroblasts ( Supplementary Fig. 8a), raising the possibility that ZNF398 promotes acquisition of epithelial character and pluripotency from early stages of reprogramming. We reprogrammed human fibroblasts by delivery of mRNAs encoding for either OSKMNL 23,47,48 (OCT4, SOX2, KLF4, MYC, NANOG, LIN28A) or OSKM, in combination with siRNAs, allowing to test the requirement of endogenous ZNF398 for reprogramming (Fig. 8a). By day 6 of reprogramming, fibroblasts transfected with control siRNAs formed clusters of epithelial cells, indicative of MET. This effect was clearly reduced upon ZNF398 knockdown ( Fig. 8b and Supplementary Fig. 8b). Around day 10 small colonies emerged and were stabilised over the following 6 days. Upon Control siRNA and OSKMNL transfection we obtained 0.9% of reprogramming efficiency, which was reduced to 0.15% by ZNF398 knockdown. In the case of Control siRNA and OSKM the efficiency was 0.5% and ZNF398 knockdown almost completely ablated formation of NANOG and OCT4-expressing colonies (Fig. 8c, d).
Transcriptional analysis indicates a failure to activate a large panel of pluripotency and epithelial markers upon ZNF398 knockdown ( Fig. 8e and Supplementary Fig. 8c). Immunostaining confirmed membrane localisation of E-cadherin only in NANOG-positive reprogrammed colonies (Fig. 8f), accompanied with loss of actin stress fibres, clearly visible in fibroblasts that failed to reprogramme.
We also asked whether ZNF398 might be required for the proliferation of fibroblasts, rather than for acquisition of pluripotency and epithelial character. However, we could not observe a reduction in cell number after 6 days of siZNF398 transfection and levels of proliferation regulators were also unchanged (Supplementary Fig. 8d, e).
Thus, ZNF398 is required for efficient induction of epithelial character and pluripotency from fibroblasts.

Discussion
TGF-beta signalling is critical for hPSC self-renewal [5][6][7] . The transcription factor NANOG was first identified in murine ESCs for its capacity to maintain pluripotency in the absence of exogenous signals 16 . Such activity was also found conserved in hPSCs and it was shown that TGF-beta directly induces NANOG expression in hPSCs 10,11 . However, an unbiased and systematic analysis of TGF-beta functional mediators in hPSCs was still missing. For this reason, we performed a transcriptome-level analysis of TGF-beta targets followed by a gain-of-function screening to identify uncharacterised pluripotency regulators. Loss-of-function screenings have been performed in hPSCs, whereby genes were inactivated by RNA interference or using the CRISPR system 30,31,49 . Such studies identified some critical pluripotency regulators, such as PRDM14 or BCOR. However, loss-offunction approaches might fail to identify critical regulators because of functional redundancy with other factors. For example, a CRISPR screening in murine ESCs failed to identify the majority of known pluripotency factors 50 , likely because the pluripotency network is highly redundant and robust to inactivation of single factors 20,40 . For this reason, we chose a gain-of-function screening approach, whereby individual putative pluripotency regulators are exogenously expressed in hPSCs and their capacity to maintain pluripotency is tested. Such an approach allowed the identification of several critical murine pluripotency regulators 20 . We identified a quartet of transcription factors, NANOG, MYC, KLF7 and ZNF398, which individually promote hPSC selfrenewal. Interestingly, each of these four factors activates a specific subset of human pluripotency regulators 3,19,[29][30][31][32][33][34] , indicating that the human pluripotency network is flexible and can be maintained under different configurations. Among them, ZNF398 controls both pluripotency and epithelial genes downstream of TGF-beta (Fig. 8g).
Our analyses identified an extended set of functional human pluripotency regulators beyond the core factors OCT4, SOX2 and NANOG (Fig. 4b). It will be interesting to apply computational modelling 20 to reconstruct the network of interactions among such factors in order to study how such a network reconfigures itself after perturbations or during reprogramming.
Interestingly, only a fraction of human pluripotency regulators are robustly expressed in murine ESCs (data from ref. 21 ). This observation is in part attributable to differences in the developmental stage, as conventional hPSCs are in a pluripotent stage primed for differentiation, whereas murine ESCs are in a more primitive, naïve state of pluripotency 20,37 .
However, naïve hPSCs have been recently obtained 21-24 and we observed that ZNF398 and KLF7 are robustly expressed in hPSCs regardless of their pluripotency state. Moreover, forced expression of both genes could not maintain pluripotency in primed EpiSCs (Supplementary Fig. 4) and Klf7 expression in murine ESCs had no effect 40 , indicating that the two factors are human-specific pluripotency regulators.
It will be interesting to test whether the functions of TGF-beta and its direct targets are conserved or divergent in naïve and primed hPSCs.
Inhibitors of differentiation (ID) genes, such as ID1, block neural differentiation in the developing mouse embryo 51 and in murine pluripotent stem cells 52 . ID1 is induced by BMP and by TGF-beta 52,53 , also shown in our experiments. ID1 expression had a mild yet reproducible effect on AP-positive colony formation and maintenance of PRDM14 (Fig. 3a), and in the future it will be interesting to study whether ID1 inhibits neural differentiation also in hPSCs.
ZNF398 is a member of the Krüppel-associated box domain zinc finger proteins (KZFPs), the largest family of transcriptional regulators found in higher vertebrates. The majority of the 350 KZFPs identified in humans have been found to be associated with repression of transposable elements 54 , playing key roles during early embryogenesis. Interestingly, roughly one-third of KZFPs, were found to be associated with gene promoters, as in the case of ZNF398.
We are tempted to speculate that some members of such a large family might have acquired new roles, beyond silencing of transposable elements and in so doing, contributed to the evolution of gene-regulatory networks.
ZNF398, also known as ZER6, has never been implicated in regulation of pluripotency. Previous studies reported that ZNF398 directly activates transcription 44 and is regulated by Oestrogen Receptor Alpha 44,55 . Two isoforms of ZNF398 have been described 44,55,56 , called p71 and p52. The shorter isoform (p52), lacks a N-terminal domain and promotes proliferation of cancer cells by ubiquitination of p53 56 . In hPSCs the longer isoform (p71) is predominant and has been used in all our experiments. It will be interesting to test whether p52, which lacks the N-terminal domain, regulates pluripotency in hPSCs.
Our results have also potential implications for reprogramming: ZNF398 knockdown strongly reduced reprogramming efficiency, indicating a critical role during establishment of pluripotency.
In particular, we observed reduced morphological conversion from mesenchymal to epithelial-like cells and reduced expression of epithelial markers and pluripotency markers, further indicating that ZNF398 promotes both pluripotency and epithelial character.
It will be interesting to see if ZNF398, or other members of the extended set of human pluripotency regulators, can be used to generate iPSCs at higher efficiency or to identify fully reprogrammed cells.
EpiSC lines (GOF18 27 8 ZNF398 is required for somatic cells reprogramming. a Experimental strategy for reprogramming by delivery of OSKM (OCT4, SOX2, KLF4, MYC) mRNAs in combination with siRNAs in order to test the requirement of ZNF398 for reprogramming. Scale bars 50 µm. See also "Methods" section for details. b Cell morphology during OSKM reprogramming. Representative images of two independent experiments are shown. See also Supplementary  Fig. 8b. c Number of iPSC colonies obtained from 100 cells seeded at day 14 in OSKMNL reprogramming and at day 16 in OSKM reprogramming under the indicated conditions. Bars indicate the mean ± SEM of independent experiments. p-values: unpaired two-tailed Mann-Whitney U test. OSKMNL reprogramming: dots indicate biological replicates (n = 23, 24 and 23 replicates for no siRNA, siCONTROL and siZNF398, respectively) from four independent experiments shown in different shades of colours. OSKM reprogramming: dots indicate biological replicates (n = 15 for replicates scored based on morphology or NANOG signal, n = 8 for replicates scored based on OCT4 signal) from two independent experiments shown in different shades of colours. Source data are provided as a Source Data file. d Immunostaining for OCT4 and NANOG of fibroblasts transfected with OSKM mRNAs and siCONTROL or siZNF398 at day 16. Representative images of two independent experiments are shown. Scale bars 150 µm. e Gene expression analysis by qPCR of hiPSCs (KiPS) (light grey) and fibroblasts (dark grey) serving as controls, and fibroblasts transfected with OSKMNL mRNAs and siCONTROL (light orange) or siZNF398 (dark orange) at day 6 and day 14. Bars indicate the mean of two independent experiments shown as dots. Expression was normalised to the mean of KiPS samples. See also Supplementary Fig. 8a, c. Source data are provided as a Source Data file. f Immunostaining for NANOG and E-Cadherin followed by Phalloidin staining of fibroblasts transfected with OSKM mRNAs and siCONTROL or siZNF398 at day 16. Representative images of two independent experiments are shown. Scale bars 150 µm. g Diagram representing the transcription factors induced by TGF-beta, among which ZNF398 is crucial for the maintenance of pluripotency and the epithelial character of hPSCs.
For DNA transfection, 250,000 hPSCs were dissociated as single cells with TrypLE (Gibco 12563-029) and were co-transfected with PB constructs (550 ng) and pBase plasmid (550 ng) using FuGENE HD Transfection (Promega E2311), following the protocol for reverse transfection. For one well of a 12-well plate, we used 3.9 μl of transfection reagent, 1 μg of plasmid DNA, and 250,000 cells in 1 ml of E8 medium with 10 µM Y27632 (ROCKi, Rho-associated kinase (ROCK) inhibitor, Axon Medchem 1683). The medium was changed after overnight incubation and Hygromycin B (200 μg/ml; Invitrogen 10687010) was added after 48 h. For the overexpression experiments, hPSCs stably expressing an empty vector or the candidates were plated. The next day, cells were treated with DMSO or 10 µM SB43 for 5 days and then analysed as indicated in Supplementary Fig. 2a.
Murine EpiSCs experiments. For generation of stable transgenic lines overexpressing candidate genes, EpiSCs were reverse-transfected with 3 µl of Lipofectamine 2000 (Invitrogen 11668-019) using 500 ng of PB transposon plasmid harbouring the indicated factor and 500 ng of transposase in 200 µl of Opti-MEM (Gibco 51985-026). 1.2 × 10 5 cells in 800 µl of N2B27 with FGF2 (12 ng/ml) and Activin A (20 ng/ml) and 10 µM ROCKi were added to the transfection mix and plated in serum-coated 12-well plates. The next day the medium was changed and Hygromycin B selection was applied for 5 days. To test the effect of TGF-beta inhibitors, 1/20 of a confluent well was plated on serum-coated 12-well in N2B27 medium with FGF2 and Activin A. The next day, the medium was changed to N2B27 with FGF2 and 1 µM SB43 or FGF2 and 1 µM A83 (Axon Medchem 1421). After 48 h, cells were harvested for expression analysis.
siRNA and DNA transfection in HEK293T and HaCaT cell lines. Cells were plated at 20% confluence on a 24-well plate the day before transfection. For transfection with siRNAs, each individual well was transfected with Lipofectamine RNAiMAX reagent (ThermoFisher 13778075) following the manufacturer protocol (0.2 µl of 100 µM siRNA with 1 µl of transfection reagent per well). For transfection with DNA, each individual well was transfected with a mix of: 2.25 µl of polyethylenimine (PEI, Polysciences 23966), 750 ng DNA in 100 µl Opti-MEM. In cases of treatment with TGF-beta, cells were starved in medium without serum (+10 µM SB43, for SB43 samples only), 24 h after transfection. After overnight incubation, the medium was replaced with DMEM without serum and with 10 µM SB43 or 5 ng/ml TGFB1 for 6 h.
siRNA transfection in hPSCs. For siRNA transfection, hPSCs were plated on Matrigel-coated 24-well plate as clusters (2500-5000 clusters for one well of a 24well plate) in E8 medium with 10 µM ROCKi. After 4 h, siRNAs were transfected at a final concentration of 20 nM using Stemfect TM RNA Transfection Kit (STEM-GENT 00-0069), following the protocol for forward transfection.
For a 24-well plate (2 cm 2 ), we used 0.52 µl of transfection reagent, 2 µl of 10 µM siRNA solution and 25 µl of transfection buffer. After waiting 20 min, we mixed the transfection mix with 1 ml of E8 medium. The medium was changed after overnight incubation. See Supplementary Table 1 for sequences of the siRNAs used.
EBs differentiation assay. KiPS stably expressing an empty vector or ZNF398 were detached as clumps with EDTA and plated on ultra low attachment surface plates (CORNING 3473) in E8 medium with 10 µM ROCKi. After 2 days, E8 medium was substituted with DMEM, 20% FBS, 2 mM L-glutamine, 1% NEAA and 0.1 mM 2-mercaptoethanol. Medium was changed every 2 days.
Reprogramming. All reprogramming experiments were performed in microfluidics in hypoxia conditions (37°C, 5% CO 2 , 5% O 2 ) 48 . The protocol for reprogramming experiments was optimised to transfect siRNA in order to test the requirement of ZNF398 for reprogramming.
In the case of OSKM reprogramming, individual modified mRNAs (OCT4, SOX2, KLF4 and MYC) were made in-house by in vitro transcription using mRNA synthesis with HiScribe™ T7 ARCA mRNA Kit (NEB E2060S) according to the manufacturer's instructions. On day 0, fibroblasts were seeded at 15 cells/mm 2 in DMEM/10% FBS. On day 1, 9 h before the first mRNAs transfection, we applied E6 medium including 100 ng/ml FGF2, 5 µM ROCKi, 0.1 μM LSD1i, 1% KSR and 200 ng/ml B18R (Invitrogen 34-8185-81). The B18R protein was added to the medium to reduce the interferon response. The transfection mix was prepared according to the StemMACS™ mRNA Transfection Kit and using OSKM mRNAs made inhouse and NM-microRNAs (Stemgent StemRNA-NM Reprogramming Kit). Cells were transfected daily at 6 p.m. and fresh medium was given daily at 9 a.m. siRNAs were transfected at a final concentration of 20 nM at day 1, day 3 and day 5 (see Supplementary Table 1 for sequences of the siRNAs used) together with mRNAs. The dose of mRNAs transfected was gradually increased according to cell proliferation rate and transfection-induced cell mortality 48 .
Immunofluorescence and stainings. Immunofluorescence analysis was performed on 1% Matrigel-coated glass coverslip in wells or in situ in microfluidic channels with the same protocol. Cells were fixed in 4% formaldehyde (Sigma-Aldrich 78775) in PBS for 10 min at RT, washed in PBS, permeabilized for 1 h in PBS + 0.3% Triton X-100 (PBST) at RT, and blocked in PBST + 5% of HS (ThermoFisher 16050-122) for 5 h at RT. Cells were incubated overnight at 4°C with primary antibodies (see Supplementary Table 2) in PBST + 3% of HS. After washing with PBS, cells were incubated with secondary antibodies (Alexa, Life Technologies) (Supplementary Table 2) for 45 min at RT. Nuclei were stained with either DAPI (4′,6-diamidino-2-phenylindole, Sigma-Aldrich F6057) or Hoechst 33342 (ThermoFisher 62249). In the case of Phalloidin staining (see Fig. 8f), Alexa Fluor 488 Phalloidin and Hoechst were added with secondary antibodies. Images were acquired with a Zeiss LSN700 or a Leica SP5 confocal microscope using ZEN 2012 or Leica TCS SP5 LAS AF (v2.7.3.9723) software, respectively.
For alkaline phosphatase staining, cells were fixed with a citrate-acetone-formaldehyde solution and stained using an alkaline phosphatase detection kit (Sigma-Aldrich 86R-1KT). Plates were scanned using an Epson scanner and scored manually.
Western blotting. To monitor endogenous protein levels, cells were detached, medium removed and frozen at −80°C prior to processing. Pellets were then thawed and resuspended in 10 ml/cm 2 HPO buffer (50 mM Hepes pH 7.5, 100 mM NaCl, 50 mM KCl, 1% triton X-100, 0.5% NP-40, 5% glycerol, 2 mM MgCl 2 ) freshly supplemented with 1 mM DTT, protease inhibitors (Roche 39802300) and phosphatase inhibitors (Sigma-Aldrich P5726). Western blotting was performed as in ref. 59 . Quantitative PCR. Total RNA was isolated using Total RNA Purification Kit (Norgen Biotek 37500), and complementary DNA (cDNA) was made from 500 ng using M-MLV reverse transcriptase (Invitrogen 28025-013) and dN6 primers. For real-time PCR SYBR Green Master mix (Bioline BIO-94020) was used. Primers are detailed in Supplementary Table 3. Three technical replicates were carried out for all quantitative PCR. GAPDH was used as endogenous control to normalise expression. qPCR data were acquired with QuantStudio™ 6&7 Flex Software 1.0.
RNA sequencing. For induction experiments (Fig. 1d), poly(A) mRNA was purified from total RNA using the Dynabeads mRNA direct kit (ThermoFisher, 61011). Quantity and quality of the starting mRNA were checked by Qubit and Agilent Bioanalyzer 2100 RNA pico chip. The template library was prepared using the Ion Total RNA-Seq Kit v2 (ThermoFisher, 4475936). Quantity and size distribution of the library were analysed using the Agilent Bioanalyzer 2100 DNA HS chip. Emulsion PCR using 10 ml of 100 pM library was performed using a One-Touch 2 instrument (ThermoFisher, 4474778) with an Ion PI Template OT2 200 kit following the manufacturer's instructions (ThermoFisher, 4488318). The enrichment of the template library was achieved using the Ion OneTouch ES enrichment system (ThermoFisher). Ion Proton sequencer and IPv2 chip were prepared according to the manufacturer's recommendations. Raw reads were aligned in two steps: first reads were aligned on genome build GRCh37.p13 with STAR (v2.4), reads that were not aligned in this step were realigned with bowtie2 (v2.2.4). Raw counts over the ensembl annotation release 75 were obtained with htseq-count (v0.6.0). Normalisation and differential analysis were carried out using edgeR package (v3.4.2) 60 and R (version 3.5.2, R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/). EdgeR fits genewise negative binomial generalised linear models and conducts likelihood ratio test. Raw counts were normalised to obtain counts per million-mapped reads (CPM) and reads per kilobase per million mapped reads (RPKM). Only genes with a CPM >1 in at least two samples were retained for differential analysis. Differences between batches were adjusted using an additive model. Genes were considered significantly upregulated with a p-value ≤ 0.05 and a fold-change ≥1.5.
For overexpression experiments (Figs. 4b, c and 5a),~2 μg of total RNA were subjected to poly(A) selection, and libraries were prepared using the TruSeq RNA Sample Prep Kit (Illumina) following the manufacturer's instructions. Sequencing was performed on the Illumina NextSeq 500 platform. Reads were mapped to the Homo sapiens hg19 reference assembly using TopHat (v2.1.1), and gene counts were computed using htseq-count (v0.6.1p1) 61 . Differential expression analysis was performed using DESeq v2 62 . Genes with abs (Log2 fold-change) ≥ 1 and p-value < 0.01 were considered significant and defined as differentially expressed (differentially expressed genes (DEG)).
Microarray analysis. Public gene expression data of hESCs treated with SB43 were downloaded from ArrayExpress (E-MEXP-1741). Differentially expressed genes were identified applying limma (v3.18.13) 64 on the RMA normalised gene expression matrix. Limma fits a linear model for each gene and calculates moderated t-statistics and p-values with an empirical Bayes moderation approach.
To identify genes associated with TGF-beta inhibition, we compared the expression levels of hESCs treated with SB43 or control cells and selected those probe sets with a fold change lower than or equal to −2 and an FDR lower than or equal to 0.05. Microarray analyses were performed in R (version 3.5.2).
Protein coimmunoprecipitation. To detect the protein interaction, nuclei were isolated from H9 cells expressing Avi-Tag-ZNF398 which were induced with 25 ng/ ml Activin A for 1 h. Cells were lysed with Isotonic buffer supplemented with 1% NP-40. The nuclei pellets were resuspended in IP buffer (50 mM Tris-HCl pH 8.0, 100 mM NaCl, 200 mM sucrose, 0.5 mM MgCl 2 , 5 mM CaCl 2 , 5 μM ZnCl 2 ) and were treated with micrococcal nuclease at 30°C for 10 min. Nuclear proteins were incubated with 2 μg of indicated antibodies (Supplementary Table 2) overnight at 4°C. The immunoprecipitated complexes were incubated with Protein G magnetic beads (Invitrogen) for 2 h at 4°C and then were washed three times with IP buffer plus 0.5% NP-40. The precipitated proteins were eluted by incubating with 0.5 M NaCl TE buffer and were further analysed by western blotting.
Statistics and reproducibility. For each dataset, sample size n refers to the number of independent experiments or biological replicates, shown as dots, as stated in the figure legends. A Gaussian distribution was not assumed and p-values were calculated using the non-parametric unpaired two-tailed Mann-Whitney U test with the exception of induction experiments (Fig. 2a, b) for which we used the unpaired two-tailed t-test. p-values were not calculated for datasets with n < 3.
p-values are reported in the plots or figure legends. R software (v3.5.2) was used for statistical analysis.
All error bars indicate the standard error of the mean (SEM). All key experiments were repeated between two and five times independently, as indicated. Experiments of candidate's functional validation were repeated using three different hPSC lines. All qPCR experiments were performed with three technical replicates.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.