TBPL2/TFIIA complex establishes the maternal transcriptome by an oocyte-specific promoter usage

During oocyte growth, transcription is required to create RNA and protein reserves to achieve maternal competence. During this period, the general transcription factor TATA binding protein (TBP) is replaced by its paralogue, TBPL2 (TBP2 or TRF3), which is essential for RNA polymerase II transcription. We show that in oocytes TBPL2 does not assemble into a canonical TFIID complex. Our transcript analyses demonstrate that TBPL2 mediates transcription of oocyte-expressed genes, including mRNA surveillance genes, as well as specific endogenous retroviral elements. Transcription start site (TSS) mapping indicates that TBPL2 has a strong preference for a TATA-like motif in core promoters, driving sharp TSS selection, in contrast with canonical TBP/TFIID-driven TATA-less promoters, which have a broader TSS architecture. Thus, we show a role for the TBPL2/TFIIA complex in the establishment of the oocyte transcriptome through a specific TSS recognition code.


Introduction
Regulation of transcription initiation by RNA Polymerase II (Pol II) is central to all developmental processes. Pol II transcription requires the stepwise assembly of multi-protein complexes called general transcription factors (GTFs) and Pol II [1]. The evolutionarily conserved TFIID complex plays a major role in transcription initiation, as it is the first GTF to initiate the assembly of the pre-initiation complex (PIC) by recognizing the core promoter [2]. TFIID is a large multi-protein complex composed of the TATA box-binding protein (TBP) and 13 TBP-associated factors (TAFs) in metazoa [3]. The model suggesting that transcription is always regulated by the same transcription complexes has been challenged in metazoans by the discovery of cell-type specific complexes containing specialized GTF, TBP or TAF paralogs [4]. Two TBP paralogues have been described in vertebrates: TBPL1 (TBP-like factor; TLF, also known as TRF2) has been identified in all metazoan species [5-10], while TBPL2 (also known as TRF3 or TBP2) has only been described in vertebrates [11,12]. Remarkably, while Tbpl1 and Tbpl2 mutants display embryonic phenotypes in non-mammalian species [7-10,12,13], Tbpl1 and Tbpl2 loss of function in mouse results in male and female sterility, respectively [14-16], suggesting that in mammals these two TBP-like proteins are involved in cell-specific transcription. While TBPL2 shares a high degree of identity (92%) with the conserved saddle-shaped C-terminal DNA binding core domain of TBP [17], the C-terminus of TBPL1 is more distant, with only 42% identity [12]. As a consequence, TBPL2, but not TBPL1, is able to bind canonical TATA-box sequences in vitro [5,12,18]. The N-terminal domains of the three vertebrate TBP-related factors do not show any conservation.
All three vertebrate TBP-related factors can interact with the GTFs TFIIA and TFIIB, and can mediate Pol II transcription initiation in vitro [12,13,18-20]. However, how alternative initiation complexes form, how they […] oocyte is the very high expression of retrotransposons driven by Pol II transcription. These elements are interspersed repetitive elements that can be mobile in the genome. One of the three major classes of retrotransposons in mammals is the long terminal repeat (LTR) retrotransposons derived from retroviruses, also known as endogenous retroviruses (ERVs), which are subdivided into three sub-classes: ERV1, ERVK and endogenous retrovirus-like ERVL-MaLR (mammalian apparent LTR retrotransposons) (reviewed in [26]). Transcription of mobile elements in specific cell types depends on the presence of a competent promoter recognition transcription machinery and/or the epigenetic status of the loci where these elements have been incorporated. Remarkably, MaLRs encode no known proteins, but MaLR-dependent transcription is key in initiating synchronous, developmentally regulated transcription to reprogram the oocyte genome during growth [27].
Remarkably, during oocyte growth TBP is absent and replaced by TBPL2 [28]. Indeed, TBP is only expressed up to the primordial follicular oocyte stage and becomes undetectable at all subsequent stages of oocyte growth. In contrast, TBPL2 is highly expressed in growing oocytes, suggesting that TBPL2 replaces TBP for its transcription initiating functions during folliculogenesis [28]. In agreement with its oocyte-specific expression, a crucial role of TBPL2 in oogenesis was demonstrated in Tbpl2-/- females, which are sterile due to a defect in secondary follicle production [16,29].
In the absence of TBPL2, immunofluorescent staining experiments showed that elongating Pol II and histone H3K4me3 methylation signals were abolished between the primary and secondary follicle stage oocytes, suggesting that Pol II transcription was impaired [16]. TBPL2/TRF3 was initially suggested to be expressed during muscle differentiation [30], but this observation was later invalidated [16,29]. Altogether, the available data suggested that TBPL2 plays a specialized role during mouse oocyte development. However, how TBPL2 regulates oocyte-specific transcription, and the composition of its associated transcription machinery, remained unknown.
Here we demonstrate that in oocytes TBPL2 does not assemble into a canonical TFIID complex, while it stably associates with TFIIA. The observation that the oocyte-specific deletion of Taf7, a TFIID-specific TAF, does not influence oocyte growth and maturation corroborates the lack of TFIID in growing oocytes. Our transcriptomics analyses in wild-type and Tbpl2-/- oocytes show that TBPL2 mediates transcription of oocyte-expressed genes, including mRNA destabilisation factor genes, as well as MaLR ERVs. Our transcription start site (TSS) mapping from wild-type and Tbpl2-/- growing oocytes demonstrates that TBPL2 has a strong preference for a TATA-like motif in gene core promoters, driving specific sharp TSS selection. This is in marked contrast with the TBP/TFIID-driven TATA-less gene promoters of preceding stages, which have a broad TSS architecture. Our results show a role for the TBPL2-TFIIA transcription machinery in a major transition of the oocyte transcriptome, mirroring the maternal to zygotic transition that occurs after fertilization and completing a full germ line cycle.
To characterize TBPL2-containing transcription complexes, we prepared whole cell extracts (WCE) from growing 14 days post-natal (P14) mouse ovaries and analysed TBPL2-associated proteins by anti-mTBPL2 immunoprecipitation (IP) coupled to label-free mass spectrometry (Fig. 1a, Supplementary Fig. 1a, b). To estimate the stoichiometry of the immunoprecipitated complexes, normalized spectral abundance factor (NSAF) values were calculated [31]. In the anti-TBPL2 IPs, we identified TFIIA-αβ and TFIIA-γ subunits as the only GTF subunits associated with TBPL2 (Fig. 1a, Supplementary Data 1). As ovaries contain many non-oocyte cell types that express TBP, we carried out an anti-TBP IP in parallel from the same extracts. The mass spectrometry of the anti-TBP IP indicated that TBP assembles into the canonical TFIID complex in non-oocyte cells (Fig. 1b, Supplementary Data 2). As growing oocytes represent only a tiny minority of ovary cells, we further tested the TBPL2-TFIIA interaction by a triple IP strategy (Fig. 1c): first, we depleted TAF7-containing TFIID complexes with an anti-TAF7 IP; second, the remaining TFIID and SAGA complexes, which also contain shared TAFs [32], were depleted with an anti-TAF10 IP using the anti-TAF7 IP flow-through as input; third, we performed an anti-TBPL2 IP on the anti-TAF7/anti-TAF10 flow-through fraction (Fig. 1d-f, Supplementary Data 3). The analysis of this third consecutive IP further demonstrated that TBPL2 forms a unique complex with TFIIA-αβ and TFIIA-γ, but without any TFIID subunits.
To further analyse the requirement of TFIID during oocyte growth, we carried out a conditional deletion of the TFIID-specific Taf7 gene during oocyte growth using the Zp3-Cre transgenic line [33] (Supplementary Fig. 1c-g). Remarkably, TAF7 is only detected in the cytoplasm of growing oocytes (Supplementary Fig. 1c).
The oocyte-specific deletion of Taf7 did not affect the presence of secondary and antral follicles or the number of mature oocytes collected after superovulation (Fig. 1g, h, Supplementary Fig. 1f). The lack of phenotype is not due to an inefficient deletion of Taf7, as TAF7 immunolocalization is impaired (Supplementary Fig. 1d, e) and oocyte-specific Taf7 mutant females are severely hypofertile (Supplementary Fig. 1g). The observations that TBP is not expressed in growing oocytes, and that the oocyte-specific deletion of Taf7 abolishes the cytoplasmic localization of TAF7 but does not influence oocyte growth, show that canonical TFIID does not assemble in the nuclei of growing oocytes. Thus, our results together demonstrate that during oocyte growth a stable TBPL2-TFIIA complex forms, which may function differently from TBP/TFIID.
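The NSAF values used in these IP analyses to estimate stoichiometry are usually defined as NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j), where SpC_i is the spectral count of protein i and L_i its length. The sketch below assumes that standard definition; the function name, protein labels, counts and lengths are hypothetical and are not data from this study.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for every protein in one IP sample.

    Assumes the standard definition: spectral count divided by protein length
    (SAF), then divided by the sum of SAFs over all proteins in the sample.
    """
    # Length-corrected spectral counts (SAF), so long proteins are not favoured
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that the NSAF values of one sample sum to 1
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical bait and prey values, for illustration only
    values = nsaf({"bait": 10, "prey": 20}, {"bait": 100, "prey": 400})
    # Prey/bait NSAF ratio as a rough relative stoichiometry estimate
    print(round(values["prey"] / values["bait"], 3))  # prints 0.5
```

Reading NSAF ratios as stoichiometry assumes that the compared proteins are detected by mass spectrometry with similar efficiency, so such ratios are best treated as estimates.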
To further characterize the composition of the TBPL2-TFIIA complex, we took advantage of NIH3T3 cells artificially overexpressing TBPL2 (NIH3T3-II10 cells [28]). In this context, where TBP and TAFs are present, TFIID is efficiently pulled down by an anti-TBP IP, but no interaction with TFIIA could be detected (Fig. 2a). Interestingly, the anti-TBPL2 IP showed that the artificially expressed TBPL2 can incorporate into TFIID-like complexes, as TAFs were co-IP-ed (Fig. 2a), although with much lower stoichiometry (NSAF values) than with TBP (Fig. 2b). In contrast, strong interactions with TFIIA-αβ and TFIIA-γ were detected, suggesting that the TBPL2-TFIIA complex can form in NIH3T3-II10 cells and that TBPL2, unlike TBP, has the intrinsic ability to interact with TFIIA. Remarkably, in spite of the high similarity between the core domains of TBP and TBPL2, no interaction with the SL1 (TAF1A-D) and TFIIIB (BRF1) complexes, which are involved in Pol I and Pol III transcription, respectively [34], could be detected in the anti-TBPL2 IP, in contrast to the anti-TBP IP in NIH3T3-II10 cells (Fig. 2a, b).
To analyze whether TBPL2 associates with TFIID TAFs and TFIIA in the same complex, we performed a gel filtration analysis of NIH3T3-II10 WCE. The profile indicated that most of the TBPL2 and TFIIA could be found in the same fractions (22-26), eluting around 150-200 kDa, while TBPL2 protein was below the detection threshold of the western blot assay in the TAF6-containing fractions 9-15 (Fig. 2c). To verify that TBPL2 and TFIIA are part of the same complex in fractions 22-26, we IP-ed TBPL2 from these pooled fractions and subjected the immunoprecipitate to mass spectrometric analysis.
Our data confirmed that in these fractions, eluting around 170 kDa, TBPL2 and TFIIA form a stable complex that does not contain any […] and that the main explanation for the variance is the genotype, and then the stage (Supplementary Fig. 2b). Comparison of the RNA level fold changes between mutant and WT oocytes showed a massive down-regulation of the most highly expressed transcripts in Tbpl2-/- oocytes, both at P7 and P14 (Supplementary Fig. 2c). The Pearson correlation between the P7 and P14 fold change data sets for transcripts expressed above 100 normalized reads was close to 0.8 (Supplementary Fig. 2c), indicating that Tbpl2 loss of function similarly altered RNA levels at the P7 and P14 stages. We therefore focused on the P14 stage for the rest of the study.
In WT P14 oocytes, transcripts corresponding to 10791 genes were detected. Importantly, many of these detected transcripts were transcribed at earlier stages and are stored in growing oocytes [37]. As there is no Pol II transcription in Tbpl2-/- growing oocytes [16], RNAs detected in the Tbpl2-/- mutant oocytes represent mRNAs transcribed by a TBP/TFIID-dependent mechanism and deposited into the growing oocytes independently of TBPL2 activity at earlier stages, i.e. at the primordial follicular stage, where TBP is still expressed. The […] Fig. 2d, e) strongly supports the latter hypothesis (but see also the next paragraph).
Nevertheless, we detected 1802 significantly downregulated transcripts in Tbpl2-/- oocytes (Fig. 3c). Key genes known to be expressed during oocyte growth, such as Bmp15, Eloc, Fgf8, Gdf9 and Zar1 [35,36,39], were confirmed by RT-qPCR to be down-regulated (Supplementary Fig. 2f, g). These results suggest that TBPL2 has an important role in gene expression in growing oocytes.
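The P7 versus P14 comparison reduces to a Pearson correlation between two vectors of mutant-versus-WT log2 fold changes, restricted to well-expressed transcripts. The sketch below assumes that the expression filter is a mean normalized count above 100 at both stages (the exact way the threshold was applied is not stated here); the function names and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fold_change_correlation(
    mean_p7: Sequence[float], lfc_p7: Sequence[float],
    mean_p14: Sequence[float], lfc_p14: Sequence[float],
    min_reads: float = 100.0,
) -> Tuple[float, int]:
    """Correlate mutant-vs-WT log2 fold changes at P7 and P14.

    Assumption: a transcript is kept only if its mean normalized count
    exceeds min_reads at both stages. Returns (r, number of transcripts kept).
    """
    keep: List[int] = [
        i for i in range(len(lfc_p7))
        if mean_p7[i] > min_reads and mean_p14[i] > min_reads
    ]
    r = pearson([lfc_p7[i] for i in keep], [lfc_p14[i] for i in keep])
    return r, len(keep)
```

A value of r close to 1 would mean that the same transcripts change by proportional amounts at both stages; the reported value of about 0.8 indicates broadly, but not perfectly, concordant changes.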
Gene Ontology (GO) analyses of biological processes among the down-regulated genes (Supplementary Data 6) indicated that many genes involved in meiosis II and distinct cell cycle processes were significantly down-regulated (Supplementary Fig. 2h). The most enriched molecular function GO category was "poly(A)-specific ribonuclease activity", containing many genes coding for factors or subunits of complexes contributing to deadenylation/decapping/decay activity in eukaryotes (Fig. 3d) (e.g. CCR4-NOT, PAN2/PAN3 [40]; DCP1A/DCP2 [41]; or BTG4 [39]). In good agreement with the transcriptome analyses, transcripts coding for these "poly(A)-specific ribonuclease activity" factors were significantly down-regulated in Tbpl2-/- mutant P14 oocytes when tested by RT-qPCR (Fig. 3e, Supplementary Fig. 2i). Thus, in P14 oocytes TBPL2 regulates the transcription of many genes coding for factors that are in turn crucial in regulating the stability and translation of the mRNA stock deposited during early oogenesis, as well as the transcription of meiosis II- and cell cycle-related genes that prepare the growing oocytes for the upcoming meiotic cell division.
A remarkable feature of the oocyte is the very high expression of retrotransposons driven by Pol II transcription (see Introduction). As expected, in WT P7 and P14 oocytes ERV expression was found to be the most abundant [27,42] (Supplementary Fig. 3a-c). Importantly, the transcription of the vast majority of MaLR elements was the most strongly affected in Tbpl2-/- mutant oocytes at P7 and P14 (Fig. 4). Among them, three highly expressed members, MT-int, MTA_Mm and MTA_Mm-int, were dramatically down-regulated in P7 and P14 Tbpl2-/- mutant oocytes (Supplementary Fig. 3d, e).
As TBPL2 depletion in P14 oocytes reduces transcription from MaLR ERVs more than 4-fold, and these elements often serve as promoters for neighbouring genes [27,42], TBPL2 loss could seriously deregulate oocyte-specific transcription and the consequent genome activation.
Therefore, this is the first demonstration that TBPL2 orchestrates the de novo restructuring of the maternal transcriptome and that TBPL2 is crucial for indirectly silencing the translation of the earlier deposited TBP-dependent transcripts.
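The GO enrichment test used for the down-regulated gene set is not specified in this section. A common choice is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) per GO category, followed by Benjamini-Hochberg correction across categories. The sketch below shows that approach under this assumption; the function names and toy numbers are hypothetical.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enrichment_p(universe: int, in_term: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) when `selected` genes are drawn from `universe` genes,
    `in_term` of which are annotated with the GO category being tested."""
    upper = min(in_term, selected)
    total = comb(universe, selected)
    return sum(
        comb(in_term, k) * comb(universe - in_term, selected - k)
        for k in range(overlap, upper + 1)
    ) / total


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a category such as "poly(A)-specific ribonuclease activity", `universe` would be the expressed genes, `in_term` the expressed genes carrying that annotation, `selected` the down-regulated genes, and `overlap` the down-regulated genes carrying the annotation.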

TBPL2-driven promoters contain TATA box and are sharp
The promoter usage changes during the zebrafish maternal to zygotic transition revealed different rules of transcriptional initiation in oocyte and embryo, driven by independent and often overlapping sets of promoter "codes" [23]. Importantly, this switch has not yet been demonstrated in mammals, and the role of TBPL2 in this switch during oogenesis remained to be investigated. To this end, we mapped TSS usage by carrying out super-low input carrier-CAGE (SLIC-CAGE) [43] on WT and Tbpl2-/- P14 oocytes. To characterize only the TBPL2-driven promoters, we removed the CAGE tags present in the Tbpl2-/- dataset from the WT P14 dataset, to eliminate transcripts that had been deposited at earlier stages (hereafter called "TBPL2-dependent"). Conversely, the Tbpl2-/- dataset corresponds to the TBP/TFIID-dependent, or TBPL2-independent, TSSs (hereafter called "TBPL2-independent").
Next, we analysed the genome-wide enrichment of T- and/or A-rich (WW) dinucleotide motifs within the -250/+250 region centred on the dominant TSSs of the TBPL2-dependent-only and TBPL2-independent-only oocyte TSS clusters (Fig. 5a, b). TBPL2-dependent TSS clusters are strongly enriched in a well-defined WW motif around their -30 bp region (Fig. 5a, red arrowhead) [44]. In contrast, only about one third of the TBPL2-independent TSS clusters contained WW-enriched motifs at a similar position (Fig. 5b, red arrowhead), as would be expected from promoters that lack maternal promoter code determinants [23,44]. As canonical TATA boxes are often associated with tissue-specific gene promoters, we investigated whether the above observed WW motif densities correspond to TATA boxes using the TBP […] analysis observations (Fig. 3).
Importantly, TSS architecture analyses of the TBPL2-dependent MaLR ERV TSSs indicated that the majority of MaLR core promoters contain a high-quality TATA box motif (median TATA box PWM match of 85%, Fig. 5h-j). Together, these observations demonstrate that the TBPL2/TFIIA complex drives transcription initiation primarily from core promoters that contain a TATA box-like motif and directs sharp transcription initiation from the corresponding promoter regions to overhaul the growing oocyte transcriptome.
In addition, we observed that TSS usage can shift within the promoter of individual genes depending on the genetic background (Supplementary Fig. 4b). To gain more insight into these promoter architecture differences, we identified 6429 shifting promoters genome-wide by comparing TBPL2-dependent to TBPL2-independent TSS data. These results are consistent with TSS shifts between somatic and maternal promoter codes occurring in either the 5' or the 3' direction (Fig. 6a, Supplementary Fig. 4q) [44]. WW motif analysis indicated that on each shifting promoter, TBPL2-dependent dominant TSSs are associated with WW motifs, while TBPL2-independent dominant TSSs are not (Fig. 6b). In addition, the TATA box PWM match analyses indicated that these WW motifs are enriched in TATA box-like elements compared to the corresponding TBPL2-independent shifting TSSs (Fig. 6c). Thus, our experiments provide a direct demonstration that the TBP/TFIID and TBPL2/TFIIA machineries recognize two distinct sequences co-existing in the promoters of the same genes, with TBPL2 directing a stronger WW/TATA box-dependent sharp TSS selection.
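The two core promoter sequence analyses above (WW dinucleotide density around dominant TSSs and TATA box PWM matching) can be sketched as follows. This is a simplified illustration, not the pipeline used in the study. It assumes upper-case sequences containing only A/C/G/T, all of equal length, with the dominant TSS at index 250 of a 501 nt window (so position -30 is index 220), a uniform 0.25 background, and a hypothetical toy PWM built from made-up sites rather than the actual TBP matrix. The percentage match is taken here as a min-max scaled score; that scaling is an assumption, since tools differ in how they define a percentage match.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"
Pwm = List[Dict[str, float]]


def ww_profile(seqs: Sequence[str]) -> List[float]:
    """Fraction of sequences with a WW dinucleotide (A/T then A/T) starting at each position."""
    width = len(seqs[0])
    return [
        sum(1 for s in seqs if s[i] in "AT" and s[i + 1] in "AT") / len(seqs)
        for i in range(width - 1)
    ]


def log_odds_pwm(sites: Sequence[str], pseudocount: float = 0.5) -> Pwm:
    """Log2-odds PWM against a uniform background, built from aligned example sites."""
    pwm: Pwm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm


def relative_score(window: str, pwm: Pwm) -> float:
    """PWM score of one window, scaled to [0, 1] between worst and best possible scores."""
    raw = sum(col[b] for col, b in zip(pwm, window))
    worst = sum(min(col.values()) for col in pwm)
    best = sum(max(col.values()) for col in pwm)
    return (raw - worst) / (best - worst)


def best_match(seq: str, pwm: Pwm, start: int, end: int) -> float:
    """Best relative score over windows starting in [start, end), e.g. a -40..-20 region."""
    w = len(pwm)
    return max(
        relative_score(seq[i:i + w], pwm)
        for i in range(start, end)
        if i + w <= len(seq)
    )
```

With real data, `ww_profile` would be run on the TBPL2-dependent and TBPL2-independent windows separately, and a peak near index 220 would correspond to the -30 bp WW enrichment described above.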

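The sharp versus broad distinction and the shifting promoters described above rest on two per-cluster quantities: the interquantile width (distance between the positions holding the 10th and 90th percentiles of the cumulative CAGE signal) and the position of the dominant TSS in each condition. The sketch below is a simplified illustration under those assumptions, with hypothetical function names and example signals. Dedicated tools (for example the Bioconductor package CAGEr) instead score shifts by comparing the cumulative tag distributions of the two conditions rather than single dominant positions.

```python
from typing import Dict

Signal = Dict[int, float]  # genomic position -> CAGE tag count within one cluster


def quantile_position(signal: Signal, q: float) -> int:
    """First position (left to right) where the cumulative signal reaches fraction q of the total."""
    positions = sorted(signal)
    threshold = q * sum(signal.values())
    cumulative = 0.0
    for pos in positions:
        cumulative += signal[pos]
        if cumulative >= threshold:
            return pos
    return positions[-1]


def interquantile_width(signal: Signal, lower: float = 0.1, upper: float = 0.9) -> int:
    """Width in bp between the lower and upper quantile positions (inclusive)."""
    return quantile_position(signal, upper) - quantile_position(signal, lower) + 1


def dominant_tss(signal: Signal) -> int:
    """Position with the most tags; ties go to the leftmost position."""
    return max(signal, key=lambda pos: (signal[pos], -pos))


def dominant_shift(reference: Signal, other: Signal, strand: str = "+") -> int:
    """Signed shift of the dominant TSS from `reference` to `other`; positive means downstream (3')."""
    shift = dominant_tss(other) - dominant_tss(reference)
    return shift if strand == "+" else -shift
```

A cluster could then be labelled sharp when its interquantile width falls below a chosen cutoff; that cutoff is an analysis choice and is not given in this section.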
Discussion
In this study, we showed that a unique basal transcription machinery composed of TBPL2 associated with TFIIA controls transcription initiation during oocyte growth, orchestrating a transcriptome change prior to fertilization through an oocyte-specific TSS usage.
TBPL2 expression in mouse is limited to oocytes, and in its absence oocytes fail to grow and Tbpl2-/- females are sterile [16,28]. In a mirroring situation, TBPL1 (TRF2) expression is enriched during spermatogenesis, and male germ cells lacking TBPL1 are blocked at the transition from late round spermatids to early elongating spermatids [14,15]. An interesting parallel between TBPL2 and TBPL1 is that both TBP-type factors form […] TBP are co-expressed in spermatids [46,47], and it has been suggested that TBPL1 is a testis-specific subunit of TFIIA that is recruited to PICs containing TFIID and might not primarily act independently of TFIID/TBP to control gene expression in round spermatids [48] […] result the canonical TFIID, or its building blocks, cannot be assembled, and as a result canonical TFIID is not present in the nuclei of growing oocytes. Another reason why TBPL2 does not interact with TAFs, or ALF, but rather interacts with TFIIA could be its N-terminal domain, which is very different from that of TBP (only 23% identity [51]).
TBPL2 proteins from different vertebrates show a high degree of similarity in their C-terminal core domains, but display very little conservation in their N-terminal domains [12]. It is interesting to note that TBPL2 deficiency leads to embryonic phenotypes in Xenopus [13] and zebrafish [12] because, contrary to the mouse, TBPL2 is still present in the embryo after fertilization and thus may act in parallel with TBP in the transcription of specific embryonic genes [10,54].
The molecular mechanism by which TBPL2 controls the transcription of these specific sets of genes in frogs and fish has not been studied. In contrast, in mammals TBPL2 is only expressed in growing oocytes, and the only phenotype that can be observed is female sterility [16,29].
LTR retrotransposons, also known as ERVs, constitute ~10% of the mouse genome […] TBPL2 is regulating the activity of several deadenylation/decapping/decay complexes, and in the absence of TBPL2 we observed an apparent stabilization of a significant number of transcripts, suggesting that in wild-type oocytes TBPL2 indirectly inhibits the translation of mRNAs, and/or induces the degradation of mRNAs, previously deposited by TFIID/TBP in the primordial follicular oocytes (Fig. 7). To put in place the growing oocyte-specific maternal transcriptome, TBPL2 controls the production of new mRNAs using a maternal-specific TSS grammar, as most of these transcripts will remain in the oocyte after transcriptional quiescence. Remarkably, as TBPL2 does not interact with the Pol I and Pol III transcription machineries in the growing oocytes, this strongly suggests that rRNA and tRNA are deposited very early during oogenesis in amounts sufficient for the initiation of development.
Therefore, it seems that TBPL2 contributes to establishing a novel TBPL2-dependent growing oocyte transcriptome and the consequent proteome required for further development and oocyte competence for fertilization (Fig. 7). The indirect regulation of previously deposited […]
Reads were preprocessed in order to remove adapter, polyA and low-quality sequences (Phred quality score below 20). After this preprocessing, reads shorter than 40 bases were discarded from further analysis. These preprocessing steps were performed using cutadapt version 1.10 [61].
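The quality-trimming and length-filtering steps can be sketched as below. The sketch assumes Phred+33 quality encoding and the BWA-style 3' trimming described in the cutadapt documentation (subtract the cutoff from each quality score and cut where the partial sum towards the 3' end is minimal, implemented here as maximizing its negative). Adapter and poly(A) removal are not shown, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def quality_trim_index(qualities: str, cutoff: int = 20, base: int = 33) -> int:
    """Index at which to cut the 3' end of a read (BWA-style quality trimming)."""
    running = 0
    best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        # Grows when the base quality is below the cutoff, shrinks otherwise
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def trim_and_filter(seq: str, qual: str, cutoff: int = 20,
                    min_length: int = 40) -> Optional[Tuple[str, str]]:
    """Quality-trim the 3' end, then drop reads shorter than min_length (returns None)."""
    cut = quality_trim_index(qual, cutoff)
    seq, qual = seq[:cut], qual[:cut]
    return (seq, qual) if len(seq) >= min_length else None
```

For example, a 50 nt read whose last 5 bases have quality 2 ("#") after 45 bases of quality 40 ("I") is trimmed to 45 nt and kept, whereas a read with 20 low-quality 3' bases is trimmed to 30 nt and discarded.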
Reads were mapped to spike sequences using bowtie version 2.2.8 [62], and reads mapping to spike sequences were removed from further analysis. Reads were then mapped onto the mm10 assembly of the Mus musculus genome using STAR version 2.7.0f [63]. Gene expression quantification was performed from uniquely aligned reads using htseq-count version 0.9.1 [64], with annotations from Ensembl version 96 and "union" mode. Read counts were normalized across samples with the median-of-ratios method to make them comparable between samples, and differential gene expression analysis was performed using DESeq2 version 1.22.2 [65]. All figures were generated using R software version 3.5.3.
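The median-of-ratios normalization computes, for each sample j, a size factor s_j equal to the median over genes i of K_ij divided by the geometric mean of gene i across all samples, using only genes with non-zero counts in every sample; normalized counts are then K_ij / s_j. A minimal pure-Python sketch of that calculation, with hypothetical function names:

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios size factors; `counts` is a genes x samples matrix of raw counts."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a zero geometric mean and are skipped
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Per-gene log ratio of this sample's count to the gene's geometric mean
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]
```

In DESeq2 itself, the equivalent values are obtained with `estimateSizeFactors()` followed by `counts(dds, normalized = TRUE)`.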

RT-qPCR
Complementary DNA was prepared using random hexamer oligonucleotides and SuperScript

SLIC-CAGE analyses
Twenty-eight and 13 ng of total RNA isolated from P14 oocytes (biological replicates 1 and 2, approximately 500-1000 oocytes pooled for each replicate) and 15 ng of total RNA isolated from P14 Tbpl2-/- mutant oocytes (approximately 550 pooled oocytes) were used for SLIC-CAGE TSS mapping [43]. Briefly, 5 µg of the carrier RNA mix were added to each sample prior to reverse transcription, followed by the cap-trapping steps designed to isolate capped RNA polymerase II transcripts. The carrier was degraded from the final library prior to sequencing using homing endonucleases. The target library derived from the oocyte RNA polymerase II transcripts was PCR-amplified (15 cycles for P14 WT, 16 cycles for P14 Tbpl2-/- mutant) and purified using AMPure beads (Beckman Coulter) to remove short PCR artifacts (< 200 bp; size selection using a 0.8x AMPure beads to sample ratio). The libraries were sequenced on the HiSeq2500 Illumina platform in single-end, 50 bp mode (Genomics Facility, MRC, LMS).
Sequenced SLIC-CAGE reads were mapped to the reference M. musculus genome (mm10 assembly) using Bowtie2 [62] with parameters that allow zero mismatches per seed sequence