Human transcription factors responsive to initial reprogramming predominantly undergo legitimate reprogramming during fibroblast conversion to iPSCs

The four transcription factors OCT4, SOX2, KLF4, and MYC (OSKM) together can convert human fibroblasts to induced pluripotent stem cells (iPSCs). It is, however, perplexing that they can do so only for a rare population of the starting cells and with a long latency. Transcription factors (TFs) define the identities of both the starting fibroblasts and the end product, iPSCs, and are also of paramount importance for the reprogramming process. It is critical to upregulate or activate the iPSC-enriched TFs while downregulating or silencing the fibroblast-enriched TFs. This report explores the initial TF responses to OSKM as the molecular underpinnings for both the potency and the limitations of OSKM reprogramming. The authors first defined the TF reprogramome, i.e., the full complement of TFs to be reprogrammed. Most TFs were resistant to OSKM reprogramming at the initial stages, an observation consistent with the inefficiency and long latency of iPSC reprogramming. Surprisingly, the current analyses also revealed that most of the TFs (at least 83 genes) that did respond to OSKM induction underwent legitimate reprogramming. The initial legitimate transcriptional responses of TFs to OSKM reprogramming were also observed in reprogramming fibroblasts from a different individual. Such early, biased legitimate reprogramming of the responsive TFs aligns well with the robustness of the otherwise inefficient and stochastic OSKM reprogramming.


Results
Defining the set of transcription factors to be reprogrammed, the TF reprogramome. Transcription factors (TFs) are critical in defining any cell type [11][12][13] . Human pluripotent stem cells (PSCs) should have a defined set of TFs, as should fibroblasts, the starting cells for iPSC reprogramming. In order to convert human fibroblasts to iPSCs, it is of paramount importance to reprogram the TFs from their expression levels in fibroblasts to those of the pluripotent state. It is not clear what the full TF differences are between the pluripotent cells and the starting somatic fibroblasts, i.e., the TF reprogramome. To find the TF transcriptional differences between PSCs and fibroblasts, this study compared the expression of the entire set of human TFs based on RNA-seq data we recently published 9,10 . Several groups have attempted to define the repertoire of human TFs [14][15][16] . The latest revised version by Lambert et al. was used in this study 17 . The RNA-seq data were extracted for the Lambert set of 1639 human TFs, but the current report concerns only 1636 TFs because ZNF788 is a pseudogene, and DUX1 and DUX3 were not annotated in the Ensembl database (Table S1). Of the 1636 TFs, 315 were not expressed in either ESCs or fibroblasts, while 442 TFs were expressed in both cell types at similar levels (Fig. 1A). Two hundred and seventy-nine (279) TFs were enriched by at least 2-fold (q < 0.01) in fibroblasts compared to human ESCs, and they constitute the TF downreprogramome (Table S2). There are 110 zinc finger, 18 HOX, 11 forkhead box, and 7 T-box TF genes in the downreprogramome. Within the TF downreprogramome, 93 TFs were expressed in fibroblasts only, constituting the TF erasome (Table S3, Fig. 1D). Of note, all of the 18 HOX TF genes are in the erasome, and there are 32 zinc finger TF genes in the erasome. Excluding the erasome, an additional 71 TFs of the downreprogramome were highly enriched (by > 5-fold) (Fig. 1D).
In the entire downreprogramome of 279 TFs, 217 were enriched by at least 3-fold.
Three hundred and ten (310) TFs are enriched in hESCs by at least 2-fold, and constitute the TF upreprogramome (Table S4, Fig. 1A-C, and Fig. S2). As expected, the established pluripotent TFs are in the upreprogramome, including POU5F1, NANOG, SOX2, ZFP42, ZSCAN10, FOXD3, PRDM14, ZIC3, SALL4, and others. Interestingly, there are 198 zinc finger TF genes of different types (genes with the designations of ZNF, ZFP, ZIC, ZSCAN, GATA, KLF, SALL, ZBTB, and others) in the upreprogramome. Therefore, zinc finger TFs represent 63.9% of the TF upreprogramome. In contrast to the downreprogramome, there is not a single HOX gene in the upreprogramome. In fact, all 39 HOX genes of the human genome are silenced in hESCs. The upreprogramome contains 8 SOX, 4 POU, and 4 SALL genes, but the downreprogramome includes no SALL or SOX genes and only one POU gene. Within the TF upreprogramome, 70 TFs were expressed in hESCs only, constituting the TF activatome (Table S5, Fig. 1B). Excluding the activatome, 80 additional TFs are highly enriched in hESCs (by > 5-fold) (Fig. 1B). There were 28 zinc finger TFs in the activatome. Two hundred and ninety of the human TFs could not be classified into the above categories because they fell into the marginal areas based on the selection criteria used here (Fig. 1A). Combining the TF downreprogramome and upreprogramome, the TF reprogramome includes 589 TFs that should be reprogrammed by at least 2-fold. Interestingly, the downreprogramome is characterized by HOX genes, while the upreprogramome features SOX, SALL, and POU TFs and is dominated by zinc finger TFs of different types.
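The category counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python that uses only the numbers stated in the text; the dictionary keys and variable names are illustrative labels, not identifiers from the original analysis.

```python
# Counts reported in the text for the 1636 annotated human TFs.
# The key names are illustrative labels chosen for this sketch.
categories = {
    "inactive_in_both": 315,   # not expressed in ESCs or fibroblasts
    "similar_in_both": 442,    # expressed at similar levels in both
    "downreprogramome": 279,   # fibroblast-enriched (>= 2-fold, q < 0.01)
    "upreprogramome": 310,     # hESC-enriched (>= 2-fold)
    "unclassified": 290,       # marginal under the selection criteria
}

# The five categories should account for every TF that was analyzed.
total_tfs = sum(categories.values())

# The TF reprogramome is the union of the down- and upreprogramome.
reprogramome = categories["downreprogramome"] + categories["upreprogramome"]

# Share of zinc finger genes (198 reported) in the upreprogramome.
zinc_finger_up = 198
zinc_finger_share = 100 * zinc_finger_up / categories["upreprogramome"]

print(total_tfs)                    # 1636
print(reprogramome)                 # 589
print(round(zinc_finger_share, 1))  # 63.9
```

The five categories add up to the 1636 TFs analyzed, and the two reprogramome subsets give the stated 589 TFs.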
GO analyses of the upreprogramome and downreprogramome resulted in very different pictures. The majority of the pluripotent TF GO terms were generic while the fibroblast ones were very promiscuous. At the FDR < 0.01 level, the TF downreprogramome was overrepresented by 428 different GO terms (Table S6), whereas the TF upreprogramome was overrepresented by 103 GO terms (Table S7). Of these 103 pluripotent TF GO terms, 90 were shared with those of fibroblasts. Among the 13 unique GO terms for ESCs were "chromatin organization" (26 genes), "stem cell population maintenance" (15), "regulation of cell cycle arrest" (9), and "maintenance of cell number" (15) (Fig. S3). As expected, ESC TFs were overrepresented in "reactome pathways" analyses by the GO term of "transcriptional regulation of pluripotent stem cells" (11 genes), but "generic transcription pathway" (116) and "RNA polymerase II transcription" (119) were also overrepresented. Among the GO terms unique to fibroblasts were "response to chemical" (99 genes), "response to stress" (79), "regulation of programmed cell death" (57), "limb development" (26), "muscle tissue development" (21), "brain development" (27), and "response to mechanical stimulus" (12) (Fig. S4). One interesting GO term in "reactome pathway" analyses of fibroblast-enriched genes was "transcriptional regulation of white adipocyte differentiation" (11 genes). In sum, the enriched GO terms for fibroblast TFs appear to be more tissue-specific and promiscuous, and those for pluripotent TFs are more generic and stem cell-related.
A portion of the TF reprogramome is resistant to reprogramming. Yamanaka reprogramming is very inefficient, slow and stochastic. Previously, we reported that 953 genes are resistant to iPSC reprogramming at the initial stages 10 . Considering the low efficiency and long latency of iPSC reprogramming, we hypothesized that a portion of the transcription factors in the reprogramome is among those genes irresponsive to OSKM reprogramming. To test this, the TF reprogramome was examined for the reprogramming statuses of its member genes upon OSKM reprogramming.  Table S9). Clustering analyses indicated that these irresponsive TFs remained similar to those in the starting naïve fibroblasts and in the fibroblasts transduced with GFP. In the upreprogramome, 124 zinc finger TFs were irresponsive to OSKM reprogramming, including 4 ZSCAN zinc finger genes (ZSCAN2, ZSCAN10, ZSCAN16, and ZSCAN31). In the downreprogramome, 13 HOX genes were resistant to OSKM reprogramming. As expected, 7 ESC-enriched genes with the "reactome pathway" GO term of "transcriptional regulation of pluripotent stem cells" were among the TF genes resistant to reprogramming. These include NANOG, LIN28A, ZSCAN10, FOXD3, PRDM14, ZIC3, and HIF3A. Interestingly, 66 ESC-enriched genes of the "reactome pathway" GO term of "generic transcription pathway" were also resistant to OSKM reprogramming.  Table S10). As expected, the four reprogramming factors OCT4 (POU5F1), SOX2, KLF4, and MYC were among this list. Therefore, only 49 TFs were significantly upregulated by at least 2-fold at the initial stages. The fold upregulation ranged from 2- to around 17-fold, and up to de novo activation of 12 TF genes. On the other hand, 70 TFs were downregulated by OSKM by at least 2-fold (q < 0.01) compared to the naïve fibroblasts and the fibroblasts transduced with GFP viruses (Fig. S8 and Table S11). The fold downregulation ranged from 2- to 14-fold, and up to silencing of 5 TF genes.

Most of the downregulation of TFs by OSKM is legitimate reprogramming.
The legitimacy of a transcriptional change of a TF induced by OSKM should be evaluated by the expression level of the individual TF in PSCs relative to that in fibroblasts 10 (Fig. 3A). Upregulation of a gene is legitimate if its expression is higher in PSCs, and illegitimate if its expression is lower in PSCs. Conversely, downregulation of a gene is legitimate if its expression is lower in PSCs, and illegitimate if its expression is higher in PSCs. If the expression level of a gene is similar in both cell types, both up- and downregulation by the OSKM reprogramming factors are illegitimate and constitute aberrant reprogramming 10 .
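The decision rule described above can be written down compactly. The sketch below is only an illustration of that logic in Python; the function name, the string labels, and the "irresponsive" return value for genes that do not respond (a term used in the Discussion) are assumptions of this sketch, not part of the original analysis pipeline.

```python
def classify_legitimacy(response: str, psc_vs_fibroblast: str) -> str:
    """Classify an OSKM-induced change as legitimate or illegitimate.

    response: 'up', 'down' or 'none' (change induced by OSKM).
    psc_vs_fibroblast: 'higher', 'lower' or 'similar' (expression in
    PSCs relative to fibroblasts).
    All labels are hypothetical names chosen for this sketch.
    """
    if response not in {"up", "down", "none"}:
        raise ValueError(f"unknown response: {response!r}")
    if psc_vs_fibroblast not in {"higher", "lower", "similar"}:
        raise ValueError(f"unknown comparison: {psc_vs_fibroblast!r}")

    if response == "none":
        return "irresponsive"
    # Any change of a gene with similar expression in both cell types
    # is aberrant reprogramming.
    if psc_vs_fibroblast == "similar":
        return "illegitimate"
    # A change is legitimate only when it moves towards the PSC level.
    moves_towards_psc = (
        (response == "up" and psc_vs_fibroblast == "higher")
        or (response == "down" and psc_vs_fibroblast == "lower")
    )
    return "legitimate" if moves_towards_psc else "illegitimate"


print(classify_legitimacy("down", "lower"))   # legitimate
print(classify_legitimacy("up", "lower"))     # illegitimate
print(classify_legitimacy("up", "similar"))   # illegitimate
```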
Using this logic, we evaluated the 70 TFs downregulated by OSKM for their reprogramming legitimacy. Surprisingly, only one TF was enriched in ESCs by at least 2-fold, whereas 49 were enriched in fibroblasts by at least 2-fold (Table S12). Because of this biased enrichment of the downregulated TFs for fibroblasts, the criteria were then loosened to a significance level of q < 0.05 for significant differences at any level. Fifty-six of the 70 genes downregulated by OSKM were expressed significantly higher in fibroblasts by at least 1.46-fold, indicating legitimate downreprogramming of these 56 TFs (Fig. 3B). Scrutiny of the 56 genes indicated that 21 were properly reprogrammed and became clustered with ESCs and away from fibroblasts and the fibroblasts transduced with GFP viruses (Figs. 3C,D, S9). Thirty-three of those 56 TFs were downregulated significantly towards the pluripotency levels although the downreprogramming was insufficient (Figs. 3C,E, S10), indicating a positive drive to the pluripotency state with some deficiency. In summary, 80% of the TFs downregulated by OSKM underwent legitimate downreprogramming. ). However, 7 of the fibroblast-enriched genes upregulated by OSKM were wrongly upreprogrammed and became a separate independent group in clustering analyses (Fig. 4A,E). Of the 9 TFs with similar expression in both cell types, three can be considered as unwanted activation because they are not expressed in either cell type, while four are unwanted upreprogramming (data not shown). In summary, legitimate reprogramming is predominant among the 49 upregulated TFs (67.5%).    Fig. S13A,B). Out of the remaining 87 TFs not shared by BJ, 59 were still expressed significantly higher in ESCs than in BJ when the sorting criteria were loosened (p < 0.05 with significant differences at any level) (Fig. S13C).
At the same time, out of the 47 ESC-enriched TFs in the BJ list but not in the CRL list, 28 were still enriched in ESCs compared to CRL when the selection criteria were loosened (Fig. S13D). Notably, in only one rare case did we see conflicting results between the two fibroblasts: DBP expression was significantly higher (2.1×) in ESCs compared to one fibroblast (CRL) but significantly lower (−1.6×) in ESCs compared to the other fibroblast (BJ). Similarly, the majority of the TF downreprogramomes were shared by the two fibroblasts (Supplementary Fig. S14 and Supplementary Table S14).

Independent human fibroblasts from a different individual have a similar
The initial TF responses to Yamanaka factors in an independent human fibroblast are predominantly legitimate reprogramming. Next, we investigated whether the initial legitimate reprogramming of transcription factors can be observed in an independent fibroblast cell line. For this purpose, we sequenced RNA from the fibroblast CRL undergoing early reprogramming. As in BJ cells, OSKM were all well overexpressed in all CRL samples (Supplementary Fig. S15). OSKM upregulated 219 TFs at both 48 and 72 h post factor transduction (Fig. 5A and Supplementary Table S15). As seen with the BJ cells, these upregulated TFs are predominantly ESC-enriched (129 out of 219) (Fig. 5B). Classification into "insufficient up", "proper up" and "over up" eliminates some genes with legitimate reprogramming because of the stringent sorting criteria applied. In fact, these 129 TFs can be considered as legitimate reprogramming (insufficient up, proper up and over up) (Fig. 3A), since overexpression of pluripotency factors might be beneficial to reprogramming, as we have seen with the OCT4 and SOX2 reprogramming factors. We also examined the situation of these 129 TFs in BJ cells. None of these 129 TFs has significantly higher expression in BJ cells than in ESCs, and the majority of them (116) are expressed significantly higher in ESCs than in BJ (Supplementary Fig. S15 and data not shown). None   Fig. S16). However, 65 TFs were upregulated by OSKM in CRL cells when they should not be, and 25 TFs were upregulated when they should be downregulated (Supplementary Table S15). Nevertheless, the majority (59%) underwent legitimate upreprogramming and were largely conserved between the two fibroblast types. Table S16). Out of the 118 genes, 89 were enriched by at least 2-fold in the fibroblast CRL, indicating legitimate downreprogramming (Fig. 5C,D).
Importantly, none of these 89 TFs exhibited significantly higher expression in ESCs than in the other fibroblast BJ, and the majority of them (79 TFs) displayed significantly higher expression in BJ than in ESCs (Supplementary Fig. S17 and data not shown). Furthermore, none of the 89 genes was significantly upregulated and the majority (71 TFs) of those were significantly downregulated in BJ cells by OSKM at 96 h of reprogramming. Like in CRL, these 89 genes became clustered with ESCs in the reprogramming CRL cells at 96 h ( Supplementary Fig. S17), indicating conserved legitimate downreprogramming between the two different fibroblasts.
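The proportions of legitimate responses quoted for the two fibroblast lines follow directly from the reported counts. The sketch below recomputes them in Python; the helper name `percent` and the variable names are illustrative, and the 75.4% figure for CRL downregulation is derived here from the stated counts (89 of 118) rather than quoted from the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


# BJ: 56 of the 70 TFs downregulated by OSKM were legitimate.
bj_down = percent(56, 70)

# CRL: the 219 upregulated TFs split into 129 ESC-enriched (legitimate),
# 65 that should not have changed and 25 that should have gone down.
crl_up_counts = {"legitimate": 129, "unwanted": 65, "wrong": 25}
crl_up_total = sum(crl_up_counts.values())
crl_up = percent(crl_up_counts["legitimate"], crl_up_total)

# CRL: 89 of the 118 downregulated TFs were fibroblast-enriched
# (percentage derived in this sketch, not stated in the text).
crl_down = percent(89, 118)

print(round(bj_down, 1))   # 80.0
print(crl_up_total)        # 219
print(round(crl_up, 1))    # 58.9
print(round(crl_down, 1))  # 75.4
```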

Discussion
OSKM can convert a rare population of fibroblasts into the pluripotent state with a long latency 4,5 . This is in sharp contrast to oocyte reprogramming, which is authentic and fast 6 . OSKM cannot activate the master transcriptional network of pluripotency directly. The key pluripotent factors, OCT4, SOX2, NANOG and others, are activated very late in the reprogramming process. Previous efforts have tried to identify molecular events underlying OSKM reprogramming of fibroblasts into pluripotency. However, those studies ignored the fact that 99% of the cells do not move towards pluripotency and instead represent noise in the data. Those authors implicitly treated all the transcriptional responses to OSKM induction as positive reprogramming. To mitigate this limitation, the authors developed the concepts of reprogramome and reprogramming legitimacy 9,10 . Using these concepts, a transcriptional response to OSKM reprogramming can be evaluated as positive, negative (aberrant reprogramming), or irresponsive, i.e., as a legitimate response, an illegitimate response, or no response, respectively. In previous reports, the authors evaluated the transcriptional responses of all human genes without specific examination of the transcription factors. In this report, the authors evaluated the reprogramming legitimacy of the transcriptional responses of the entire set of human transcription factors to OSKM reprogramming at the initial stages (48, 72, and 96 h). In agreement with the inefficiency, long latency, and stochastic nature of OSKM reprogramming, it was found here that the majority of human transcription factors (296 TFs) were irresponsive to OSKM induction. This report also identified some transcription factors that underwent aberrant reprogramming, such as wrong and unwanted reprogramming. These data provide a molecular interpretation for the inefficiency and stochastic nature of OSKM reprogramming.
When we specifically analyzed the reprogramming legitimacy of TFs in this report, a surprising discovery was that the majority of the transcription factors that did respond to OSKM induction underwent legitimate reprogramming. This phenomenon was also observed in an independent human fibroblast cell line from a different individual. The population of transcription factors undergoing legitimate reprogramming is not small. Eighteen PSC-enriched TFs were properly upreprogrammed to the levels found in PSCs, while 21 somatic TFs were properly downreprogrammed to the levels of pluripotency. Additionally, 11 PSC-enriched TFs were significantly, albeit insufficiently, upreprogrammed towards the pluripotent levels, while 33 somatic TFs were significantly, albeit insufficiently, downreprogrammed towards the pluripotent levels. These observations may underestimate the number of legitimately reprogrammed TFs, since classification of legitimate reprogramming into proper, insufficient and over reprogramming usually eliminates some genes undergoing legitimate reprogramming as outlined in Fig. 3A. In fact, we observed many more instances of legitimate TF reprogramming simply using the rationale in Fig. 3A (Fig. 5).
Transcription factors are critical in defining the transcriptional programs and identities of any cell type [11][12][13] . TFs are also critical in cellular reprogramming. In fact, all the four conventional pluripotency reprogramming factors are transcription factors 21 . Lineage-specific transcription factors can reprogram fibroblasts into the corresponding functional somatic cells [22][23][24] . Here, the legitimate reprogramming of a large set of transcription factors at the initial stages provides molecular underpinnings for the ability of OSKM to push some reprogramming fibroblasts to the pluripotent state. At the same time, the inability of OSKM to incite the required transcriptional changes of the transcription factors in the TF reprogramome at the early stages explains in part why OSKM is very inefficient, slow and stochastic.

Methods
Cell lines and cultures. The NIH-registered human embryonic stem cell (ESC) lines H1 (WiCell, Madison, WI) and H9 (WiCell, Madison, WI) were cultured in chemically defined media as described before 2,3,9,10 . Briefly, hESCs were cultured on Matrigel-coated vessels with E8 media 25 , and passaged using EDTA-mediated dissociation when they reached 80% confluency.

Lentivirus vector production. Lentivirus vectors were generated using PEI-mediated transfection of Lenti-X 293T cells (Takara, Cat. 632180) with the reprogramming plasmids. Briefly, 2 × 10^7 Lenti-X 293T cells were seeded into one 150-mm dish and cultured in expansion medium: DMEM-F12 (Gibco, Cat. 12400-024) supplemented with 10% FBS (Gibco, Cat. 10437-028). Twenty-four hours post seeding and at least 2 h before transfection, the spent medium was replaced with 24 mL of fresh expansion medium. The envelope, packaging and transfer plasmids were mixed at a ratio of 1:3:4 (60 µg of plasmids in total) in 3 mL of DMEM-F12 medium, and the plasmid solution was then mixed with 3 mL of "PEI solution" containing 60 µg/mL of Polyethylenimine "Max" (PEI, Polysciences Inc., Cat. 24765-2). The 6 mL of transfection mix was incubated for 15 min at room temperature, and the resulting DNA complex was then added dropwise to the cell cultures. The cultures were incubated for 16 h at 37 °C, 5% CO2. After the 16-h transfection, the transfection medium was replaced carefully with 20 mL of fresh complete expansion medium, and the cells were incubated for an additional 72 h. Medium containing lentiviral particles was then harvested and filtered using Stericup PVDF membrane filters of 0.45-µm pore size (Millipore). The transduced cells were cultured in fibroblast medium until RNA harvest at 48, 72, and 96 h post transduction. We have consistently reached > 90% transduction efficiency over the years using these constructs, as judged by GFP expression and flow cytometry 3 (Supplementary Fig. S15). Efficient overexpression of all four reprogramming factors in all nine of our reprogramming RNA-seq samples was indicated by the elevated normalized read counts of the transgenes in each sample (Supplementary Fig. S15) 10 .
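The plasmid and PEI quantities in the transfection mix follow from the stated ratio and concentrations. The sketch below works them out in Python, assuming that the 1:3:4 ratio is a mass ratio applied in the order envelope:packaging:transfer; the variable names are illustrative, and the resulting PEI-to-DNA mass ratio is derived here rather than stated in the text.

```python
# Assumption of this sketch: the 1:3:4 ratio is by mass and follows the
# order in which the plasmids are named (envelope, packaging, transfer).
ratio = {"envelope": 1, "packaging": 3, "transfer": 4}
total_dna_ug = 60.0

parts = sum(ratio.values())
plasmid_ug = {name: total_dna_ug * share / parts for name, share in ratio.items()}

# 3 mL of PEI solution at 60 µg/mL.
pei_volume_ml = 3.0
pei_conc_ug_per_ml = 60.0
pei_ug = pei_volume_ml * pei_conc_ug_per_ml

# Mass ratio of PEI to DNA in the 6 mL transfection mix (derived value).
pei_to_dna = pei_ug / total_dna_ug

print(plasmid_ug)   # {'envelope': 7.5, 'packaging': 22.5, 'transfer': 30.0}
print(pei_ug)       # 180.0
print(pei_to_dna)   # 3.0
```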
RNA preparation for RNA-seq. RNA
Criteria for an active gene with similar expression in both cell types. A gene is considered active in both cell types with equal expression when the following conditions are met: (1) all replicates have a normalized read count greater than 50; (2) the q value is greater than 0.01; (3) the fold difference between the two cell types is less than 2 regardless of the q value. In this group, all normalized individual read counts are > 50.
Defining the inactive gene set for both cell types. All normalized read counts are < 50 including all replicates for both cell types. The rationale of this value as the threshold of an active gene has been described before 9 .
Defining activatome (hPSC-specific gene set) and erasome (fibroblast-specific gene set). A gene is a member of the activatome or erasome when the following conditions are met: (1) the active cell type, e.g. hPSCs for the activatome, has a normalized read count greater than 50 for all individual replicates, while the silent cell type has a normalized read count less than 50 for all individual replicates; (2) the fold difference is greater than 2; (3) the q value is less than 0.01.
Defining upreprogramome (hPSC-enriched gene set) and downreprogramome (fibroblast-enriched gene set). An enriched gene in any cell type should meet the following criteria: (1) the normalized read count should be greater than 50 for all replicates; (2) the enrichment should be at least 2-fold; (3) the q value should be < 0.01. The rationale for these criteria has been described before 10 .
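The gene-set definitions above can be combined into one classification routine. The following Python sketch is an illustration only: the function name, argument names and returned labels are hypothetical, the fold change and q value are assumed to be supplied by an upstream differential-expression analysis, and the read-count requirement for an enriched gene is assumed to apply to the replicates of the enriched cell type. A gene that meets the activatome or erasome criteria is reported under that more specific label, although it also belongs to the upreprogramome or downreprogramome.

```python
import math

READ_COUNT_THRESHOLD = 50   # normalized read count separating active/inactive
MIN_FOLD = 2.0
Q_CUTOFF = 0.01


def classify_gene(psc_counts, fib_counts, fold_psc_over_fib, q_value):
    """Assign a gene to one of the sets defined in the Methods.

    psc_counts / fib_counts: normalized read counts of all replicates.
    fold_psc_over_fib: precomputed fold change (PSC relative to fibroblast).
    q_value: adjusted p value from the differential-expression test.
    """
    psc_active = all(c > READ_COUNT_THRESHOLD for c in psc_counts)
    fib_active = all(c > READ_COUNT_THRESHOLD for c in fib_counts)
    psc_silent = all(c < READ_COUNT_THRESHOLD for c in psc_counts)
    fib_silent = all(c < READ_COUNT_THRESHOLD for c in fib_counts)

    if psc_silent and fib_silent:
        return "inactive"

    up_fold = fold_psc_over_fib
    down_fold = 1 / fold_psc_over_fib if fold_psc_over_fib > 0 else math.inf
    significant = q_value < Q_CUTOFF

    if significant and psc_active and up_fold >= MIN_FOLD:
        # Activatome: silent in fibroblasts and more than 2-fold different.
        if fib_silent and up_fold > MIN_FOLD:
            return "activatome"
        return "upreprogramome"
    if significant and fib_active and down_fold >= MIN_FOLD:
        # Erasome: silent in PSCs and more than 2-fold different.
        if psc_silent and down_fold > MIN_FOLD:
            return "erasome"
        return "downreprogramome"
    if (psc_active and fib_active and q_value > Q_CUTOFF
            and up_fold < MIN_FOLD and down_fold < MIN_FOLD):
        return "similar"
    return "unclassified"


print(classify_gene([900, 1000, 1100], [0, 2, 1], 500.0, 1e-10))    # activatome
print(classify_gene([200, 210, 190], [205, 200, 195], 1.0, 0.8))    # similar
```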
Defining the irresponsive TF genes to OSKM induction. For the TF downreprogramome and upreprogramome, an irresponsive TF gene to OSKM induction should meet the following criteria: (1) the fold change should be less than 2-fold (upregulation or downregulation), and any gene with a fold change of > 2-fold is removed from this list regardless of the significance level; (2) any gene with a significance level of q < 0.01 is removed from the list regardless of the level of fold change, i.e., even though the fold change is 1.5-fold, the gene will be removed from the list of irresponsive TF genes if the change is significant; (3) any gene for which the difference between OSKM reprogramming cells and the ESCs becomes less than 2-fold is excluded regardless of the significance status. For the upreprogramome, the following additional criterion was applied: if the gene remains inactive after OSKM induction (normalized read count < 50), it is considered irresponsive even though the fold change is > 2 and is statistically significant.
Scientific Reports | (2020) 10:19710 | https://doi.org/10.1038/s41598-020-76705-y
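The criteria for an irresponsive TF gene can be sketched as a single predicate. The Python function below is an illustration under stated assumptions: the function and argument names are hypothetical, fold changes are treated symmetrically (a 0.4-fold change counts as a 2.5-fold downregulation), a fold change of exactly 2 is treated as responsive, and "remains inactive" is interpreted as all replicates of the OSKM-transduced cells having a normalized read count below 50.

```python
def symmetric_fold(fold: float) -> float:
    """Express a fold change as a magnitude >= 1 in either direction."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return max(fold, 1 / fold)


def is_irresponsive(fold_oskm_vs_control, q_value, fold_oskm_vs_esc,
                    in_upreprogramome=False, oskm_counts=()):
    """Return True if a reprogramome TF counts as irresponsive to OSKM.

    fold_oskm_vs_control: fold change of OSKM cells versus control fibroblasts.
    q_value: adjusted p value of that change.
    fold_oskm_vs_esc: remaining fold difference between OSKM cells and ESCs.
    oskm_counts: normalized read counts of the OSKM replicates (optional).
    """
    # Additional rule for the upreprogramome: a gene that stays inactive
    # is irresponsive even if its change is > 2-fold and significant.
    if in_upreprogramome and oskm_counts and all(c < 50 for c in oskm_counts):
        return True
    # (1) Changes of 2-fold or more are responses, whatever the q value.
    if symmetric_fold(fold_oskm_vs_control) >= 2:
        return False
    # (2) Significant changes are responses, whatever the fold change.
    if q_value < 0.01:
        return False
    # (3) Genes already within 2-fold of the ESC level are excluded.
    if symmetric_fold(fold_oskm_vs_esc) < 2:
        return False
    return True


print(is_irresponsive(1.2, 0.5, 8.0))     # True
print(is_irresponsive(1.5, 0.001, 8.0))   # False
```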

Data visualization.
Heat maps were prepared using the R package pheatmap (Version 1.0.12) in RStudio (Version 1.3.1073) (https://rstudio.com/) on a desktop iMac (macOS Version 10.15.6) as described 35 . Boxplots were generated using the generic R function boxplot() in RStudio as described recently 36 . Both heat maps and boxplots were prepared using the log2-transformed read counts. Ladder plots were generated using the R package plotrix (Version 3.7-8) in RStudio.