Pleomorphic Adenoma Gene 1 Is Needed For Timely Zygotic Genome Activation and Early Embryo Development

Pleomorphic adenoma gene 1 (PLAG1) is a transcription factor involved in cancer and growth. We discovered a de novo DNA motif containing a PLAG1 binding site in the promoters of genes activated during zygotic genome activation (ZGA) in human embryos. This motif was located within an Alu element in a region that was conserved in the murine B1 element. We show that maternally provided Plag1 is needed for timely mouse preimplantation embryo development. Heterozygous mouse embryos lacking maternal Plag1 showed disrupted regulation of 1,089 genes, spent significantly longer time in the 2-cell stage, and started expressing Plag1 ectopically from the paternal allele. The de novo PLAG1 motif was enriched in the promoters of the genes whose activation was delayed in the absence of Plag1. Further, these mouse genes showed a significant overlap with genes upregulated during human ZGA that also contain the motif. By gene ontology, the mouse and human ZGA genes with de novo PLAG1 motifs were involved in ribosome biogenesis and protein synthesis. Collectively, our data suggest that PLAG1 affects embryo development in mice and humans through a conserved DNA motif within Alu/B1 elements located in the promoters of a subset of ZGA genes.


INDEX
Supplemental Materials and Methods
Video S1 (a separate attachment), legend: Time-lapse video of WT embryo development
Video S2 (a separate attachment), legend: Time-lapse video of matPlag1KO embryo development
File S1 (a separate attachment), legend: de novo PLAG1 motifs and repetitive elements in human ZGA genes
File S2 (a separate attachment), legend: GO clusters associated with delayed-activation and delayed-degradation genes
File S3 (a separate attachment), legend: GO clusters associated with up- and downregulated genes during ZGA in WT mice
File S4 (a separate attachment), legend: de novo PLAG1 motifs in mouse ZGA genes
Figure S1: Comparison of human PLAG1 and mouse Plag1 protein sequence
Figure S2: Plag1 deficiency does not have significant effects on ovaries and uterus
Figure S3: Size of the mRNA and spike-in reference RNA libraries
Figure S4: Expression of PLAG1 in human embryos
Figure S5: Expression of Plag1 transcripts in mouse ovary as shown by X-gal staining
Figure S6: Gene set comparison of mouse and human ZGA genes
Figure S7: Frequency of de novo PLAG1 motifs and B1 elements in delayed-activation and delayed-degradation gene promoters
References

described previously 2 with the following modifications: barcoded 10 μM template-switching oligonucleotides were added prior to reverse transcriptase, ERCC spike-in Mix A was diluted 15,300-fold with clean water, and 1 μl was taken per library reverse transcriptase master mix. Twenty cycles of PCR were used for the first round of amplification and ten additional cycles for the second round to introduce Illumina-compatible universal sequences. Both libraries contained all developmental stages and genotypes.

RNA-seq data analysis
The reads were filtered, samples de-multiplexed, UMIs joined, and reads trimmed and mapped to the mouse reference genome mm9 by TopHat 3. The resulting BAM files were converted to tag directories with Homer 4 and were subsequently used to estimate the reads in all annotated genes. Annotations in GTF format were retrieved from UCSC and concatenated with the ERCC annotations into a single GTF file. Gene counts were then imported into R 5, and libraries with a median gene expression below 0 log2 counts per million (cpm) were excluded from further analysis. Cell libraries, excluding the ERCC spike-in counts, were normalized with edgeR 6 using the TMM normalization method. The ERCC counts were used for normalization between the various embryonic cell stages by scaling the library sizes. edgeR was also used for the subsequent differential gene expression analysis, which was performed on genes with at least 1 cpm in five or more samples; all other genes were filtered out. After removal of low-expressed and unexpressed genes, the gene counts were renormalized. Principal component analysis was performed in R using the genes that were significant in any of the comparisons between the genotypes. Heatmaps were plotted from TMM-normalized counts exported from edgeR, and gene expression was standardized across all samples (mean = 0 and SD = 1). Samples and genes were clustered using hierarchical clustering in R and plotted with the ComplexHeatmap library 7. The same gene set was used for the cell trajectory (pseudotime) analysis with the monocle package 8. Gene ontology analysis was carried out in R with the topGO library. To identify enriched GO terms, the classic algorithm and Fisher statistic were used, and the analysis was carried out on up- and downregulated genes separately. Semantic similarity between the GO terms was calculated using the Wang algorithm in the GOSemSim Bioconductor package 9.
The result is given as the best-match average (BMA) score, which ranges from 0 to 1. Gene set enrichment analysis was conducted to test whether the mouse genes homologous to the human genes regulated between the 4c and 8c stages were also regulated in mouse development. To calculate the p-values, the geneSetTest function from the limma package 10 was used. The moving average of the enrichment was calculated with the tricubeMovingAverage function and plotted with ggplot2. Homologene from NCBI was used to convert the human genes to the homologous mouse genes. The significance of the overlap between the human genes and the mouse genes regulated in the KO at the 2c stage was calculated with Fisher's exact test in R. Genes were also converted to protein families using the Bioconductor libraries for genome-wide annotation for human and mouse (org.Hs.eg.db/org.Mm.eg.db). For genes annotated with more than one protein family, only one of the families was used so as not to inflate the number of overlapping or non-overlapping families between the different gene groups.
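As a rough illustration of the library filter, the gene filter, and the per-gene standardization described above, the following Python sketch restates the three rules in plain code. It is not the authors' pipeline, which used edgeR in R. The function names, the pseudocount `prior`, and the use of raw rather than TMM-scaled library sizes are assumptions made only for this example.

```python
import math
from statistics import mean, median, stdev


def cpm(counts):
    """Counts per million for one library (raw library size, no TMM factor)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1e6 / total for c in counts]


def keep_library(counts, prior=0.5):
    """Keep a library unless its median log2(cpm + prior) is below 0.

    `prior` is an assumed pseudocount that keeps log2 defined for zero counts.
    """
    log_values = [math.log2(v + prior) for v in cpm(counts)]
    return median(log_values) >= 0


def keep_gene(cpm_across_samples, min_cpm=1.0, min_samples=5):
    """Keep a gene that reaches `min_cpm` in at least `min_samples` samples."""
    return sum(v >= min_cpm for v in cpm_across_samples) >= min_samples


def standardize(values):
    """Scale one gene's values across samples to mean 0 and SD 1."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        return [0.0 for _ in values]  # constant gene: no variation to scale
    return [(v - m) / sd for v in values]
```

For example, `keep_gene([1.2, 3.0, 1.0, 5.5, 2.1, 0.0])` keeps the gene because five of the six samples reach 1 cpm, whereas a gene reaching 1 cpm in only four samples would be dropped.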

Promoter analyses
Human embryo promoter analysis was performed as previously described 11. Briefly, the de novo motif was compared with known motifs by TomTom 12. We applied MEME 13 for motif analysis within the upregulated promoters, and identified sequences similar to the PLAG1 motif (MA0163.1 in JASPAR) 14,15 using MAST 16. The location of Alu elements within the promoters was based on the RepeatMasker track in the UCSC Genome Browser. Human and mouse SINE repetitive elements DF0000002 (AluY), DF0000051 (AluSz), DF0000034 (AluJo), DF0000144 (FLAM_C), DF0000016 (7SLRNA), DF0003101 (PB1) and DF0001733 (B1_Mm) were retrieved from the Dfam database 17, aligned, and highlighted according to percent identity in Jalview 2. Mouse embryo promoter analysis was carried out with Homer 4. The hits of the repetitive elements and the motifs were imported into R. Repetitive elements that fell within the promoter regions (TSS -2,000 bp to +500 bp) were kept. The distance to the nearest TSS was then calculated for the repeats and the de novo PLAG1 motifs and plotted using ggplot2. Enrichment was analyzed using Fisher's exact test.
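The promoter window and the distance to the nearest TSS described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. The function names, the tuple layout `(gene, tss, strand)`, and the example coordinates are hypothetical. The sketch also assumes that distances are measured along the direction of transcription, so that negative values are upstream of the TSS.

```python
def signed_distance(feature_pos, tss, strand):
    """Distance from the TSS along the direction of transcription.

    Negative values are upstream of the TSS, positive values downstream.
    """
    offset = feature_pos - tss
    return offset if strand == "+" else -offset


def in_promoter(feature_pos, tss, strand, upstream=2000, downstream=500):
    """True if the feature lies within TSS -upstream to +downstream bp."""
    d = signed_distance(feature_pos, tss, strand)
    return -upstream <= d <= downstream


def nearest_tss(feature_pos, tss_list):
    """Return (gene, signed distance) for the TSS closest to the feature.

    `tss_list` holds hypothetical (gene, tss, strand) tuples.
    """
    gene, tss, strand = min(tss_list, key=lambda t: abs(feature_pos - t[1]))
    return gene, signed_distance(feature_pos, tss, strand)


# Hypothetical example: two genes on opposite strands.
example_tss = [("geneA", 10000, "+"), ("geneB", 13000, "-")]
gene, distance = nearest_tss(12800, example_tss)
```

Here a motif at position 12,800 is assigned to `geneB`. Because that gene lies on the minus strand, the motif sits 200 bp downstream of its TSS and therefore inside the promoter window.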
Video S1. Sheet AluJSY Position of Alu elements in the human ZGA TFEs within -2,000 bp upstream to 500 bp downstream from their transcription start site (TSS). AluS/J/Y elements were extracted from the RepeatMasker track of the UCSC Genome Browser, and the columns about TFE were joined. The type of element, its position ("left" and "right" within the promoter 0-2,500), TFE region, promoter region, TFE and promoter strand, associated gene, and TFE position within the gene are shown.
File S2. Gene ontology (GO) clusters associated with delayed-activation and delayed-degradation genes. Genes affected by maternal Plag1 deficiency in 2-cell mouse embryos were assigned to GO categories using the topGO library in R 5 with the classic algorithm and Fisher statistics. The top 150 GO terms by p-value were then clustered based on their semantic similarity. Clusters 1-8 (delayed up) and 1-6 (delayed down) are shown.
File S3. Gene ontology (GO) clusters associated with up- and downregulated genes during zygotic genome activation (ZGA) in wild-type (WT) mice. Genes up- and downregulated during major ZGA in wild-type mouse embryos (2-cell to 8-cell transition) were assigned to GO categories using the topGO library in R 5 with the classic algorithm and Fisher statistics. The top 150 GO terms by p-value were then clustered based on their semantic similarity. Clusters 1-9 are shown.
File S4. PLAG1 de novo motifs in mouse zygotic genome activation (ZGA) genes. All annotated mouse promoters were scanned for the presence of the de novo PLAG1 motif from -2,000 bp upstream to 500 bp downstream of transcription start sites (TSSs) using Homer 4. Delayed-activation genes that have de novo PLAG1 motifs in their promoters are shown (symbol, entrez, refSeq) together with the motif sequence, strand, and genomic coordinates of the motif, as well as the location of the TSS and the distance relative to the direction of the gene. The last column indicates whether the gene is also present among the human ZGA genes 11.

Figure S1
Comparison of amino acid sequences between human PLAG1 (RefSeq NP_002646.2) and mouse PLAG1 (NP_064353.2) by NCBI blastp 18, indicating 94% similarity. The seven C2H2 zinc-finger domains are labeled (+---+) based on the UniProt protein knowledge base 19 (entry Q6DJT9). Domains 6 and 7 bind to the "core" of the PLAG1 consensus sequence, while domain 3 binds to the "cluster" [Figure 1c (v) and 20]. The amino acid sequences of these three domains are identical between mice and humans except for position 191 (red) within domain 6. However, human PLAGL1 has a glutamic acid residue (E) at the position corresponding to the PLAG1 D191E substitution, and the binding preference for the G-rich "core" is highly conserved in both PLAG1 and PLAGL1 21. Moreover, the C2H2 domains of mouse PLAG1 are more similar to those of human PLAG1 than are those of human PLAGL1. Therefore, although mouse PLAG1 differs at one residue within a C2H2 domain involved in binding, the binding-site sequence preference is expected to be identical between human and mouse PLAG1.

Figure S7
Frequency of de novo PLAG1 motifs and B1 elements in delayed-activation and delayed-degradation gene promoters. (a) Location of B1 repetitive elements (black lines) and de novo PLAG1 motifs (red dots) along the promoters of delayed-activation and delayed-degradation genes from -2,000 bp to +500 bp around the transcription start site (TSS). Enrichment of the sites is shown as color-coded lines above the graphs. (b) Empirical cumulative distribution function showing the distribution of de novo PLAG1 motifs along the promoters of delayed-activation and delayed-degradation genes.
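
For readers unfamiliar with the empirical cumulative distribution function in panel (b), the short Python sketch below shows how such a curve can be computed from motif positions relative to the TSS. The function names and the example positions are hypothetical and are not taken from the study's data.

```python
from bisect import bisect_right


def ecdf_at(positions, x):
    """Fraction of positions that are less than or equal to x."""
    ordered = sorted(positions)
    return bisect_right(ordered, x) / len(ordered)


def ecdf_steps(positions):
    """(position, cumulative fraction) pairs, one per distinct position."""
    ordered = sorted(positions)
    n = len(ordered)
    # bisect_right counts all tied values, so ties share one step.
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


# Hypothetical motif positions in bp relative to the TSS.
motif_positions = [-1500, -200, -200, 300]
steps = ecdf_steps(motif_positions)
```

With these example positions, three of the four motifs lie at or upstream of -200 bp, so the curve reaches 0.75 at that point and 1.0 at +300 bp.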