Generation and gene expression profiling of 48 transcription-factor-inducible mouse embryonic stem cell lines

Mouse embryonic stem cells (ESCs) can differentiate into a wide range of cell types in vitro, and possibly all of them, and thus provide an ideal platform for systematically studying the action of transcription factors (TFs) in cell differentiation. Previously, we generated and analyzed 137 TF-inducible mouse ESC lines. As an extension of this “NIA Mouse ESC Bank,” we generated and characterized 48 additional mouse ESC lines, in each of which a single TF can be induced in a doxycycline-controllable manner. Together with the previous ESC lines, the bank now comprises 185 TF-manipulable ESC lines (>10% of all mouse TFs). Global gene expression (transcriptome) profiling revealed that the induction of individual TFs in mouse ESCs for 48 hours shifts their transcriptomes toward specific differentiation fates (e.g., neural lineages by Myt1, Isl1, and St18; mesodermal lineages by Pitx1, Pitx2, Barhl2, and Lmx1a; white blood cells by Myb, Etv2, and Tbx6; and ovary by Pitx1, Pitx2, and Dmrtc2). These data also provide lists of inferred target genes of each TF and possible functions of these TFs. The results demonstrate the utility of mouse ESC lines and their transcriptome data for understanding the mechanism of cell differentiation and the function of TFs.

To identify the effect of each TF on the transcriptome of ESCs, we used microarrays for gene expression profiling after 48 hours of culturing cells without Dox. Cells cultured in the presence of Dox were used as a control (Fig. 1c). The 48-hour time point was selected based on time-course experiments with multiple TFs 2,8 . This interval is long enough to observe expression changes in a large set of downstream genes, but short enough to retain a substantial enrichment of direct targets among the responding genes. An example of a scatterplot with color-coded upregulated and downregulated genes after induction of Dlx2 is shown in Fig. 1d. Principal Component Analysis indicated that the new set of tested TFs has, in general, weaker effects on the ESC transcriptome than previously tested TFs such as Gata2, Gata3, Cdx2, Nrip1, Dlx3, Ascl1, Gbx2, and Klf4 (Fig. 1e).
Association of downstream genes of TFs with tissue-specific expression, gene ontology, and phenotypes. To explore the changes in the expression of downstream genes, we compared our microarray data with three databases: (1) GNF database ver. 3 on tissue-specific gene expression 9,10 ; (2) Gene Ontology (GO) annotations 11 ; and (3) Genetic Association Database (GAD) on gene sets associated with mouse phenotypes 12 . Because the GNF database is quantitative and the other two are qualitative, we used different methods to quantify association: (1) correlation of median-subtracted log-transformed gene expression values 3 , and (2) parametric analysis of gene set enrichment, PAGE 13 (see Methods).
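As an illustration of the second method, the sketch below computes a PAGE-style Z-score for one gene set in its commonly cited form, Z = (S_m − μ) × √m / σ, where μ and σ are the mean and standard deviation of the logratios of all genes and S_m is the mean logratio of the m genes in the set. It is a minimal sketch, not the ExAtlas implementation used in this study; the function names and the two-sided normal p-value are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def page_z_score(logratios, gene_set):
    """PAGE-style Z-score for one gene set (illustrative, not the ExAtlas code).

    logratios: dict mapping gene symbol -> logratio (e.g., Dox- vs. Dox+).
    gene_set:  iterable of gene symbols belonging to the annotated set.
    """
    values = list(logratios.values())
    mu = mean(values)        # mean logratio over all genes
    sigma = pstdev(values)   # standard deviation over all genes
    members = [logratios[g] for g in gene_set if g in logratios]
    m = len(members)
    if m == 0 or sigma == 0:
        raise ValueError("gene set is empty or logratios have zero variance")
    return (mean(members) - mu) * math.sqrt(m) / sigma


def two_sided_p(z):
    """Two-sided p-value for a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A positive Z-score indicates that the genes in the set tend to have higher logratios than the average gene. The procedure described in Methods applies PAGE separately to the top and bottom 25% of genes, which this sketch does not reproduce.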
Comparison with the GNF database showed that the induction of individual TFs shifted the transcriptome toward specific differentiation fates. For example, gene expression change toward neural tissues was observed after induction of Myt1, St18, and Isl1; toward mesodermal lineages after induction of Pitx1, Pitx2, Barhl2, and Lmx1a; toward white blood cells after induction of Etv2, Myb, and Tbx6; and toward ovary after induction of Pitx1, Pitx2, and Dmrtc2 (Fig. 2). TFs associated positively with transcriptome changes toward specific lineages often showed a negative association with those toward different cell lineages. For example, effects of Myt1 correlated positively with neural tissues but negatively with blood lineages (Fig. 2). Validation of the cell-differentiation potential of each TF is beyond the scope of this paper because it requires longer experiments (6-14 days) and is specific for each cell lineage 4 . As an example, however, here we provide information on the capacity of three TFs (Myt1, Isl1, and St18) to facilitate ESC differentiation toward a neural fate. ESC clones with transgenic TFs were cultured in Dox− and Dox+ medium (3 days in αMEM and then 3 days in NeuroCult), and then the proportion of cells with the neural progenitor marker PSA-NCAM was quantified by FACS (Canto II, Becton Dickinson). Induction of two TFs, Myt1 and Isl1 (in the Dox− condition), resulted in a substantial increase in the proportion of PSA-NCAM(+) cells as compared to control (Dox+ condition) (Fig. 3a,b), which confirms that these TFs facilitate neural differentiation. The effect of St18 induction was too weak to score positively; it was somewhat higher than in controls (Dox+) for the same clone, but did not differ from controls in the other two clones.

Predicting direct targets regulated by TFs from gene expression change and TF binding.
We tested whether genes upregulated after induction of TFs were enriched in the binding of TFs to promoters and/or enhancers, if such information on genome-wide binding (ChIP-seq) was available in the GEO database. Statistically significant enrichment (PAGE method) was detected for four TFs: Etv2, Pitx1, Isl1, and Dlx2, out of ten tested TFs (Fig. 3c). We used ChIP-seq data from mouse Etv2 14 , Pitx1 15 , Isl1 16 , and human DLX2 17 , because there was no ChIP-seq data on mouse Dlx2. The other six TFs tested (Fezf2, Hoxc9, Msx1, Myb, Pitx2, and Prdm1) did not show significant enrichment.

Discussion
Systematic induction of individual TFs in undifferentiated ESCs followed by global gene expression profiling yields a useful resource for cell and molecular biology. It can identify TFs functioning upstream of any given gene, predict functional roles of TFs in cell differentiation, and select genes for potential application in gene therapy and regenerative medicine 2,3 . Correlation matrices of gene expression profiles between TF-induced ESCs and various tissues/organs can also provide candidate TFs, whose overexpression can induce the differentiation of ESCs into specific cell types, as we have shown in a proof of concept 4 . Further mining of the microarray results reported here as well as additional experiments with the ES cell lines and their derivatives could yield further insight into gene regulatory networks.
Previously published research provides a positive control for our bioinformatics-based functional analysis of gene expression change after induction of 48 transgenic TFs. For example, functions of Myt1, St18, and Isl1 in neural tissues have been described [19][20][21] . TFs Pitx1 and Pitx2 are known to be involved in limb development 22 , consistent with our analysis of their downstream effects associated with mesoderm lineages. The role of Etv2 in angiogenesis is consistent between our analysis and published research 14,23 . Association of Myb with thymocytes has also been described 24 . By contrast, the effects of some TFs were not anticipated. For example, Barhl2 is known to function in the brain and spinal cord 25,26 , but in our data, the induction of Barhl2 in mouse ES cells gave non-neural effects similar to those of Pitx1 and Pitx2. As another example, induction of Tbx6, which is known to determine neural and cardiac cell fate 27 , instead resulted in gene expression profiles trending toward macrophages (although GO annotations confirmed a cardiac tendency as well). These discrepancies may point to additional unexplored functions of the TFs studied. Alternatively, or in addition, some effects observed in our experiments could be artifacts associated with the ectopic induction of TFs in the unusual context of ESC cultures in the medium employed. Thus, the unexpected results are both a caveat and a possible indication of new information.
Enrichment of TF binding in genes upregulated after the induction of Etv2, Pitx1, Isl1, and Dlx2 is in accord with the expectation that downstream effects of TFs are likely to be mediated by TF binding to promoters and enhancers of their targets, which is the primary mechanism of their regulatory function. However, we cannot rule out additional effects of TFs, such as binding to other signaling molecules, protein modification, remodeling of chromatin, or indirect effects caused by an initial rapid activation of another TF(s) followed by a cascade of further gene activation.
In general, the preliminary analyses reported here indicate that this collection of mouse ES cell lines can be a starting point for more extensive attempts to form lineages and even tissues in vitro. As an example, we confirmed the capacity of Myt1 and Isl1 to enhance neural differentiation of ESCs. All transgenic ESC lines are freely available to the research community as a resource. Similar experiments for more regulatory genes (ideally for all TFs, signaling proteins, and non-coding RNAs) should give increasingly complete information about selective gene regulation in mammalian systems. The approach can be further expanded by altering culture conditions, possibly including growth factors, or even by activating multiple TFs simultaneously.

Experimental Procedures
Cell culture and microarray hybridization. ESC lines carrying a tetracycline-regulatable TF were derived from the MC1 (129.3) cell line, which was obtained from the expanded frozen stock at Johns Hopkins University, as described previously 2,3 . ESCs of passage 25 were cultured in the standard LIF+ medium with added Dox (Dox+) on a gelatin-coated dish throughout the experiments. Cells from each cell line were split into six wells and the medium was changed 24 hours after cell plating: three wells with Dox+ medium, and three wells with Dox− medium to induce transgenic TFs. Dox was removed by washing three times with PBS at three-hour intervals. The proportion of Venus-positive cells was evaluated by FACS (Canto II, Becton Dickinson). Total RNA was isolated with TRIzol (Invitrogen) after 48 hours, and two replicates were used for microarray hybridization. Total RNA samples were labeled using the Low RNA Input Fluorescent Linear Amplification Kit (Agilent). We hybridized the Cy3-CTP-labeled sample from Dox− medium together with the Cy5-CTP-labeled sample from Dox+ medium (i.e., control) to the NIA Mouse 44K Microarray v3.0 (Agilent, design ID 015087) 28 . Slides were scanned with an Agilent DNA Microarray Scanner. All DNA microarray data are available in Table S2, at GEO/NCBI (http://www.ncbi.nlm.nih.gov/geo; GSE72350), and at NIA Array Analysis, http://lgsun.grc.nia.nih.gov/ANOVA 29 .
Neural differentiation of ESCs. For neural differentiation we used αMEM medium for 3 days followed by NeuroCult medium for 3 days. [...] 30 . The response of genes to the induction of TFs was measured as a logratio (i.e., the difference between means of log-transformed intensities) between manipulated (Dox−) and control (Dox+) cells. We considered a gene expression change significant if the logratio was significantly different from zero (FDR ≤ 0.05) and the change of expression was ≥ 1.5 fold. Correlation of gene expression changes induced by TF manipulation (i.e., logratio of Dox− vs. Dox+) versus tissue-specific gene expression in the GNF database (i.e., logratio of each tissue vs. median) was evaluated using ExAtlas 30 . The correlation analysis was done using 15,709 genes that were significant in both data sets. Criteria of significance for the GNF database were FDR ≤ 0.05 and change ≥ 2 fold, which is higher than the 1.5 fold threshold used for our data on TF manipulation because the magnitude of gene expression difference between adult tissues was much greater than the magnitude of gene expression change after the induction of TFs. The correlation matrix was sorted first by hierarchical clustering and then manually.
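The significance criteria above can be sketched as follows. This is a minimal illustration, not the ExAtlas code: it assumes log10-transformed intensities, assumes that per-gene p-values are already available from a separate test, and uses the Benjamini-Hochberg procedure as one common way to obtain FDR values (the FDR method is not specified in the text). All function names are hypothetical.

```python
import math
from statistics import mean


def logratio(induced, control):
    """Difference between the means of log10-transformed intensities.

    induced: replicate intensities from Dox- (TF-induced) cells.
    control: replicate intensities from Dox+ (control) cells.
    """
    return (mean(math.log10(x) for x in induced)
            - mean(math.log10(x) for x in control))


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: BH is used here only as a common FDR procedure.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(lr, fdr, fold=1.5, fdr_max=0.05):
    """Apply both criteria: FDR <= fdr_max and change >= fold (either direction)."""
    return fdr <= fdr_max and abs(lr) >= math.log10(fold)
```

For the GNF tissue data, the same check would be called with `fold=2.0`, matching the stricter threshold described above.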
Comparison of gene expression changes induced by TF manipulation with functionally annotated gene sets (i.e., GO, GAD, and sets of TF targets) was done using Parametric Analysis of Gene set Enrichment, PAGE 13 , implemented in ExAtlas 30 . PAGE was applied separately to upregulated genes (25% top genes sorted by logratio of Dox− vs. Dox+ ) and downregulated genes (25% bottom genes sorted by logratio). Sets of genes bound by TFs were identified from published ChIP-seq data for Etv2 (GSM1436364, GSM1436365, GSM1436367, GSM1436367); Pitx1 (GSM1019784, GSM1019786); Isl1 (GSM782848, GSM928985, GSM928986); Fezf2 (GSM1135048-GSM113504); Hoxc9 (GSM766060, GSM766061); Msx1 (GSM657516); Myb (GSM912903); Pitx2 (GSM1162577); Prdm1 (GSM1616574, GSM1616575); and Dlx2 (GSM1208724). Peak coordinates were downloaded from the GEO database or from supplements to publications 31,32 . For some TFs, we filtered out peaks with low scores (< 100 for Hoxc9, <60 for Pitx1, <8 for Pitx2). If multiple samples were available, we used only matching peaks in at least three samples for Etv2 or two samples for other TFs. ChIP-seq peaks were annotated based on genomic coordinates of RefSeq and ENSEMBL transcripts downloaded from the UCSC database (http://genome.ucsc.edu). Transcripts were scored based on gene symbol (valid symbols were assigned a score of 3, whereas clones and predicted genes were assigned a score of 1) divided by distance from the peak to the transcription start site, TSS (distances < 1Kb were counted as 1Kb). Each peak was associated with one or two highest-score transcripts, and the second transcript was included if its score was >25% of the highest score. TF binding within 0.5 Kb from TSS was classified as a promoter, and binding within 0.5-50 Kb from TSS was classified as an enhancer.
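The peak annotation rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and the input layout (transcript name, symbol validity flag, distance to TSS in base pairs) are hypothetical, and the treatment of distances exactly at the 0.5 Kb and 50 Kb boundaries is an assumption.

```python
def transcript_score(has_valid_symbol, distance_bp):
    """Symbol score (3 for valid symbols, 1 for clones and predicted genes)
    divided by the distance to the TSS in Kb, with distances < 1 Kb counted as 1 Kb."""
    symbol_score = 3.0 if has_valid_symbol else 1.0
    distance_kb = max(abs(distance_bp) / 1000.0, 1.0)
    return symbol_score / distance_kb


def assign_transcripts(candidates):
    """Return the one or two highest-scoring transcripts for a peak.

    candidates: list of (name, has_valid_symbol, distance_bp) tuples.
    The second transcript is kept only if its score is > 25% of the top score.
    """
    scored = sorted(
        ((transcript_score(valid, dist), name) for name, valid, dist in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored:
        return []
    chosen = [scored[0][1]]
    if len(scored) > 1 and scored[1][0] > 0.25 * scored[0][0]:
        chosen.append(scored[1][1])
    return chosen


def classify_binding(distance_bp):
    """Classify a peak by its distance to the TSS.

    Within 0.5 Kb: promoter; 0.5-50 Kb: enhancer; farther: unclassified (None).
    Boundary handling (<=) is an assumption.
    """
    distance_kb = abs(distance_bp) / 1000.0
    if distance_kb <= 0.5:
        return "promoter"
    if distance_kb <= 50.0:
        return "enhancer"
    return None
```

For example, a peak 500 bp from a transcript with a valid symbol (score 3.0) and 3 Kb from another (score 1.0) would be assigned to both, because the second score exceeds 25% of the first.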