Characterization and target genes of nine human PRD-like homeobox domain genes expressed exclusively in early embryos

PAIRED (PRD)-like homeobox genes belong to a class of predicted transcription factor genes. Several of these PRD-like homeobox genes have been predicted in silico from genomic sequence but until recently had no evidence of transcript expression. We recently found that nine PRD-like homeobox genes, ARGFX, CPHX1, CPHX2, DPRX, DUXA, DUXB, NOBOX, TPRX1 and TPRX2, were expressed in human preimplantation embryos. In the current study we characterized these PRD-like homeobox genes in depth and studied their functions as transcription factors. We cloned multiple transcript variants from human embryos and showed that the expression of these genes is specific to embryos and pluripotent stem cells. Overexpression of the genes in human embryonic stem cells confirmed their roles as transcription factors, acting either as activators (CPHX1, CPHX2, ARGFX) or as repressors (DPRX, DUXA, TPRX2), with distinct targets that could be explained by the amino acid sequence of the homeodomain. Some PRD-like homeodomain transcription factors had high concordance of target genes and showed enrichment for both developmentally important gene sets and a 36 bp DNA recognition motif implicated in Embryo Genome Activation (EGA). Our data implicate these previously uncharacterized PRD-like homeodomain proteins in the regulation of human embryo genome activation and preimplantation embryo development.


Supporting Information
Supplemental Text S1. Predicted sequences for PRD-like gene cloning primer design
Supplemental Text S2. UCSC genome browser tracks for novel PRD-like homeobox genes
Supplemental Table S1. PRD-like homeobox gene expression in the FANTOM5 comprehensive promoter database. The number of predicted promoters and the normalized tag clusters in the predicted transcript or in the whole region of the gene are shown. Values near or below 1 may be considered noise, explaining the lack of predicted promoters for very low expression values. Counts of tag clusters from the FANTOM5 promoter database correlate with the expression of the genes.
Supplemental Table S2. The up- and downregulated gene sets and corresponding TFEs and genomic annotations. All the detected up- and downregulated target genes are shown on separate sheets for each homeodomain transcription factor. The downregulated and upregulated target genes are sorted according to the differential expression score from lowest to highest. TFEs (TFE), gene names (Gene.Name), differential expression score (Score.d.), q-value (q.value...), fold change (Fold.Change), TFE genomic annotation location (annotation), and differential expression direction (regulated) are shown. The column Expressed.in.preimplantation shows whether the gene was detected (average RPKM over oocytes, zygotes, 2-, 4- and 8-cell embryos, morulae and blastocysts > 1) or not detected (average RPKM < 1) in the Yan et al., 2013 dataset.
Supplemental Figure S4. Similarities in protein sequence inside and outside of the homeodomain. Alignment of a) CPHX1 and CPHX2 and b) TPRX1 and TPRX2 shows protein sequence similarities outside of the highly conserved homeodomain.
Supplemental Figure S5. Chromosome ideogram visualizing the genomic locations of the PRD-like homeobox genes. The clustering of DUXA, DPRX, TPRX2 and TPRX1 on chromosome 19 is shown, with the position of CRX given as well. There is a second cluster of genes on chromosome 16, namely CPHX1, DUXB and CPHX2, whereas TPRXL, ARGFX, NOBOX and OTX2 are located outside of these two clusters, on chromosomes 3, 7 and 14.
Supplemental Figure S6. Expression of homeobox genes in hESCs and 8-cell embryos. PCR was performed on three different hESC lines (HS401, H9 and HS980) and on 8-cell blastomeres. Expression values were obtained by normalizing to GAPDH expression with the formula y = (Ct(GAPDH) - Ct(gene)) + 25. The average of three replicates is shown, with the standard deviation represented by error bars. A Ct value of 40 was used for samples in which the gene was not detected.
Supplemental Figure S7. The methylation patterns of the PRD-like homeobox genes in human sperm, preimplantation embryo, ESC and embryonic tissue cells. The 100 bp methylation value percentages were plotted for sites within 500 bp up- and downstream of the TFE for the specific homeobox gene. Only sites with at least 5x coverage were used. Circles indicate each data point, and nothing is plotted when data are missing. The color of the boxplot indicates the sample group: red, sperm; green, 8-cell preimplantation embryo (8-cell); dark blue, inner cell mass of the preimplantation embryo (ICM); light blue, trophectoderm of the preimplantation embryo (TE); purple, blastocyst of the preimplantation embryo (Blast, BlastSingle); yellow, embryonic stem cells; grey, fetal heart and lung tissues.
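The GAPDH normalization stated in the Figure S6 caption, y = (Ct(GAPDH) - Ct(gene)) + 25, can be illustrated with a minimal Python sketch. The function names, the use of `None` to mark a non-detected sample, and the replicate-wise pairing of gene and GAPDH Ct values are assumptions made for illustration; the caption does not specify how the replicates were paired.

```python
from statistics import mean, stdev

NOT_DETECTED_CT = 40.0  # Ct assigned to non-detected samples (from the caption)
OFFSET = 25.0           # constant offset in the caption's formula


def expression_value(ct_gene, ct_gapdh):
    """Return y = (Ct(GAPDH) - Ct(gene)) + 25 for one replicate.

    ct_gene may be None (hypothetical convention) when the gene was not
    detected; it is then replaced by a Ct of 40.
    """
    if ct_gene is None:
        ct_gene = NOT_DETECTED_CT
    return (ct_gapdh - ct_gene) + OFFSET


def summarize(ct_gene_replicates, ct_gapdh_replicates):
    """Mean and sample standard deviation over paired replicates.

    Pairing gene and GAPDH Ct values replicate by replicate is an
    assumption of this sketch.
    """
    values = [
        expression_value(gene, gapdh)
        for gene, gapdh in zip(ct_gene_replicates, ct_gapdh_replicates)
    ]
    return mean(values), stdev(values)
```

With this formula a higher y corresponds to higher expression, and a non-detected gene measured against a GAPDH Ct of 20 yields y = 5.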

Supplemental Figure S8. The methylation patterns of the PRD-like homeobox genes in blood cells. The M-values for CpG sites surrounding the homeobox genes ARGFX, DPRX, DUXA, NOBOX and the control gene OTX2 are shown for human blood cells from the Infinium Human Methylation 450K bead chip array. Values below 0 indicate hypomethylation; values above 0 indicate hypermethylation. Values are arranged by physical distance from the TSS of each gene, from up to 1500 bp upstream (TSS1500) through the 5' UTR, 1st Exon, Gene Body and 3' UTR.
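For readers unfamiliar with M-values, the sketch below shows the commonly used logit transform from a beta value (the fraction of methylated signal, between 0 and 1) to an M-value, together with the sign convention used in the Figure S8 caption. The caption does not state how the M-values were computed, so the transform, the clamping constant `eps`, and the function names are assumptions for illustration only.

```python
import math


def m_value(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M-value, log2(beta / (1 - beta)).

    eps is a hypothetical clamp that keeps the logarithm finite at 0 and 1.
    """
    beta = min(max(beta, eps), 1.0 - eps)
    return math.log2(beta / (1.0 - beta))


def classify(m):
    """Apply the caption's sign convention to an M-value."""
    if m > 0:
        return "hypermethylated"
    if m < 0:
        return "hypomethylated"
    return "intermediate"
```

Under this transform a beta value of 0.5 maps to an M-value of 0, which is why 0 serves as the dividing line between hypo- and hypermethylation.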
Supplemental Figure S9. The pFastBac vector construct is functional in hESCs. mCherry red fluorescent protein was cloned into the pFastBac vector in the same manner as the homeobox genes. Both the microscopy images (a), with bright field in the top panel followed by the mCherry and eGFP channels, and the FACS analysis (b) show simultaneous expression of eGFP and mCherry proteins in the same cell (boxed). Multiple sequence reads from GFP-positive samples align to the backbone sequence of the overexpression vector pFastBac, while almost no reads from the GFP-negative samples align to the backbone in library 4 (c).
Supplemental Figure S11. PRD-like homeodomain transcription factors have common target genes. Similarity of the homeobox target genes was assessed by performing a chi-squared test on all pairwise comparisons for both up- and downregulated genes. The number of intersecting genes is shown, as well as the multiple-testing-corrected p-value associated with the chi-squared test. p < 10^-2 (*), p < 5 × 10^-5 (**), p < 5 × 10^-8 (***).
Supplemental Figure S12. Hierarchical clustering and outlier detection. Hierarchical clustering reveals outliers in library 3 (a): C5 and E8. There are no evident outliers in library 4 (b).
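The pairwise target-gene comparison described for Figure S11 can be sketched as a chi-squared test on a 2x2 contingency table for each pair of factors. This is a minimal illustration, not the authors' implementation: the size of the background gene universe, the Bonferroni correction, and all function names are assumptions, since the caption names neither the background nor the correction method. Note that a chi-squared test flags any departure from independence and does not by itself distinguish enrichment from depletion of the overlap.

```python
import math
from itertools import combinations


def chi2_overlap(set_a, set_b, universe_size):
    """Chi-squared test (1 df, no continuity correction) on the 2x2 table
    of membership in set_a versus set_b. Returns (overlap, p_value)."""
    a = len(set_a & set_b)           # in both target sets
    b = len(set_a) - a               # only in set_a
    c = len(set_b) - a               # only in set_b
    d = universe_size - a - b - c    # in neither set
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                   # degenerate table: no evidence either way
        return a, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return a, p_value


def pairwise_overlaps(targets, universe_size):
    """Test every pair of factors; Bonferroni-correct over the pairs
    (the correction method is an assumption of this sketch)."""
    pairs = list(combinations(sorted(targets), 2))
    results = {}
    for x, y in pairs:
        overlap, p = chi2_overlap(targets[x], targets[y], universe_size)
        results[(x, y)] = (overlap, min(1.0, p * len(pairs)))
    return results
```

In use, `targets` would map each factor name to the set of its up- or downregulated genes, and the function would be called once for each direction of regulation.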

Predicted sequences for PRD-like gene cloning primer design
Coordinates refer to genome assembly GRCh37/hg19. Forward primers were designed within the predicted 5' UTRs, and reverse primers within the predicted 3' UTRs.