Intertwined canonical and non-canonical initiation in dual promoters are pervasive and differentially regulate Polymerase II transcription

The diversity and complexity of transcription start site (TSS) selection reflect variation of preinitiation complexes and divergent functions of promoter-binding proteins; they underlie transcriptional dynamics and may also impact the post-transcriptional fates of RNAs. The majority of metazoan genes are transcribed by RNA polymerase II from a canonical initiation motif with a YR dinucleotide at the TSS. In contrast, translation machinery-associated genes carry promoters with a polypyrimidine initiator (known as 5’-TOP or TCT), with cytosine replacing the R nucleotide. The functional significance of start site choice in promoter architectures is little understood. To gain insight into the developmental regulation of start site selection, we profiled 5’ ends of transcripts during zebrafish embryogenesis. We uncovered a novel class of dual-initiation (DI) promoters utilized by thousands of genes. In DI promoters, non-canonical YC-initiation representing 5’-TOP/TCT initiators is intertwined with canonical YR-initiation. During the maternal to zygotic transition, the two initiation types are divergently used in hundreds of DI promoters, demonstrating that the two initiation systems are distinctly regulated. We show via the example of snoRNA host genes and translation interference experiments that dual-initiation from shared promoters can lead to divergent spatio-temporal expression dynamics, generating distinct sets of RNAs with different post-transcriptional fates. Thus, utilization of DI promoters in a large number of genes suggests two transcription initiation mechanisms targeting these promoters. DI promoters are conserved in human and fruit fly and reflect an evolutionarily conserved mechanism for switching transcription initiation to adapt to the changing developmental context.
Thus, our findings highlight a novel level of complexity of core promoter regulation in metazoans and broaden the scope for identification and characterization of alternative RNA products generated at shared core promoters.

Transcription is a tightly regulated process initiated by RNA polymerase II (Pol II) in the core promoter region, which typically spans -40 to +40 nucleotides with respect to the transcription start site (TSS). There are no universal core promoter elements [1], as they are diverse in their sequence and functions, and the structure-function relationship of core promoters remains poorly understood. Sequencing of capped RNA 5' ends by CAGE (cap analysis of gene expression) revealed that an overwhelming majority of TSSs are anchored by a purine base at the start site (+1 position) and flanked by a pyrimidine in the upstream position (-1 position), thus defining the consensus Y-1R+1 (hereafter called YR-initiation) as the canonical initiator in mammals [2] and in teleosts (zebrafish and tetraodon) [3], suggesting generality of a conserved initiator among vertebrates. Analysis of core promoters in Drosophila melanogaster (invertebrates) revealed a related but more motif-like TC-1A+1GT initiator sequence [4,5]. In contrast, transcription initiation of translation-associated genes (ribosomal proteins, snoRNA host genes, translation initiation and elongation factors) is anchored by C+1 (cytosine) and flanked by a polypyrimidine stretch [6-11] (hereafter called YC-initiation). These non-canonical initiators have previously been termed 5'-TOP (terminal oligo-polypyrimidine) in mammalian systems or TCT initiators in Drosophila [12], and these YC initiation-dependent genes were shown to be conserved in zebrafish [3]. Drosophila ribosomal protein genes with TCT promoters are recognized by a TFIID-independent transcription initiation mechanism mediated by the TATA-binding protein (TBP) family member TBP-related factor 2 (TRF2), but not TBP [13].
These results suggest that non-canonical initiation is specialized for a subset of genes and involves formation of a non-canonical initiation complex with proteins distinct from TBP and TFIID, likely reflecting distinct regulation of transcription initiation [14]. However, it is unknown why such non-canonical initiation has evolved and has been maintained in evolutionarily distant species. Important insight into the potential functional significance of non-canonical initiation is emerging from studies investigating target genes of the mTOR pathway that are translationally regulated [15,16] and are enriched in the 5'-TOP/TCT initiator. The 5'-TOP initiator is defined by a stretch of a minimum of 4-15 pyrimidines [17]. The polypyrimidine stretch proximal to the 5' end of these genes is a target for translation regulation and has been suggested to serve as a targeting mechanism for oxidative and metabolic stress or cancer-induced differential translational regulation by the mTOR pathway [15,16,18-20]. The existence of 5'-TOP/TCT promoters raises the questions of how widespread non-canonical initiation is and what its relationship with canonical initiation is.
Nepal.et.al 4
We have generated CAGE datasets [3] in zebrafish and profiled all transcription initiators during embryogenesis from the maternal to zygotic transition (MZT) through organogenesis. We have extended the detection of YC-initiation in zebrafish to thousands of genes and observed a pervasive co-occurrence of YR-initiation and YC-initiation events in shared core promoters. We performed a comprehensive and unbiased analysis of TSSs in promoters and characterized the features and roles of non-canonical initiation by a systematic survey of the base composition at the TSSs in CAGE datasets [3].
This analysis led us to uncover non-canonical YC-initiation in thousands of genes, proximal to or intertwined with the canonical YR-initiation in the same core promoter region, thus revealing thousands of what we term dual-initiation (DI) promoter genes. We provide multiple lines of evidence for the functional relevance of dual-initiation, such as sequence composition, differential usage of initiators during development, differential response of initiators during translation inhibition and selective association with snoRNA biogenesis; snoRNAs are predicted to be processed by splicing from introns of the YC-initiation products of dual-initiation genes. We thus demonstrate that the two initiation types within dual promoters represent
To comprehensively map non-canonical initiation events at single nucleotide resolution, we analyzed the start base distribution of (m)RNA 5' ends by pooling CAGE Transcription Start Sites (CTSSs) with at least 1 tag per million (TPM) detectable across 12 stages during zebrafish embryo development [3] (Figure 1a). The majority of CTSSs (71.6%) have canonical (Y-1R+1) start sites (Figure 1a; Supplementary Figure 1a). The remaining CTSSs have been excluded from further analysis as they include RNA start sites with a well-characterized GG dinucleotide associated with post-transcriptional processing products independent from transcription initiation [3] and therefore do not reflect true transcription start sites. Furthermore, we have excluded CAGE signals which represent Drosha-processing sites on pre-miRNAs [21]
Table 1). Intersection analysis of gene promoters revealed that 50 (1.19%) genes carry only YC-initiation and 7905 (65.5%) genes have only YR-initiation, and are thus regulated by a single type of initiator. However, the majority of YC-initiation site-containing promoters (98.81%) also carry YR-initiation sites (Figure 1a; Venn diagram).
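The classification described above can be illustrated with a minimal sketch. It assumes that each CTSS is available as a tuple of gene name, the base at -1, the base at +1 (both on the transcribed strand) and a TPM value; the record layout, the function names and the example genes are hypothetical and are not taken from the study's pipeline. Only the Y-1R+1 and Y-1C+1 definitions and the 1 TPM threshold come from the text.

```python
from collections import defaultdict

PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def initiator_class(minus1: str, plus1: str) -> str:
    """Classify a CTSS from the bases at -1 and +1 on the transcribed strand."""
    minus1, plus1 = minus1.upper(), plus1.upper()
    if minus1 in PYRIMIDINES and plus1 in PURINES:
        return "YR"      # canonical initiator
    if minus1 in PYRIMIDINES and plus1 == "C":
        return "YC"      # non-canonical (5'-TOP/TCT-like) initiator
    return "other"       # e.g. GG start sites, which the text excludes


def classify_promoters(ctss_records, min_tpm=1.0):
    """Label each gene as 'YR-only', 'YC-only' or 'dual'.

    ctss_records: iterable of (gene, minus1_base, plus1_base, tpm).
    CTSSs below min_tpm or with a non-YR/YC dinucleotide are ignored.
    """
    classes_by_gene = defaultdict(set)
    for gene, minus1, plus1, tpm in ctss_records:
        if tpm < min_tpm:
            continue
        cls = initiator_class(minus1, plus1)
        if cls != "other":
            classes_by_gene[gene].add(cls)

    labels = {}
    for gene, classes in classes_by_gene.items():
        if classes == {"YR", "YC"}:
            labels[gene] = "dual"
        else:
            labels[gene] = next(iter(classes)) + "-only"
    return labels


# Hypothetical toy input, for illustration only.
example = [
    ("geneA", "C", "A", 5.0),   # YR
    ("geneA", "T", "C", 2.0),   # YC -> geneA carries both types
    ("geneB", "T", "G", 3.0),   # YR
    ("geneB", "C", "C", 0.4),   # YC, but below the TPM threshold
    ("geneC", "C", "C", 1.5),   # YC only
    ("geneD", "G", "G", 9.0),   # neither YR nor YC
]
labels = classify_promoters(example)
```

In this toy input, geneA is labelled as dual, geneB as YR-only because its YC site falls below the threshold, and geneD receives no label because its only CTSS is neither YR nor YC.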
We collectively call this novel class of promoters, which carry both YR- and YC-initiation sites, dual-initiation (DI) promoters (Figure 1b). The DI promoters identified by CAGE were also confirmed by independent analysis of capped mRNA sequencing at the prim 5 stage of development (24 h post fertilization), which, though less sensitive than CAGE, detected hundreds of dual-initiation events and showed statistically significant overlap with CAGE-detected dual-initiation promoter genes (Supplementary Figure 1c).
For all dual-initiation promoter genes, we summed the expression levels of all YR and YC components, and genes were classified as either YR-dominant or YC-dominant depending upon the TPM levels of their YR and YC components. The exemplified sumo2b gene (Figure 1b) has a higher total level of YR-initiation than YC-initiation and is thus classified as a YR-dominant gene. We then used the highest expression level of YR and YC CTSSs to determine the position of the dominantly used YR and YC TSS. The YR-dominant TSS is located 4 nucleotides downstream of the YC-dominant TSS in the exemplified sumo2b gene (Figure 1b). The distances between dominant YR-initiation and YC-initiation of all DI promoters at the prim 5 stage fall mostly within 30 bases, and there is some degree of preference for YC 1 nt upstream of YR (Figure 1c). This close proximity between the two types of initiation suggests that the initiation machinery or machineries involved in controlling transcription of these transcripts recognize the same core promoter region. Comparing the expression levels of YR and YC components revealed that the contribution of YC-initiation to the total activity of dual-initiation promoters tends to be relatively small (Figure 1d; Supplementary Figure 1d), resulting in only a small portion (8.25%; n=251) of genes being YC-dominant at the prim 5 stage (Figure 1d).
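As a rough sketch of the dominance call and of the distance between dominant TSSs described above, the following assumes that the CTSSs of one DI promoter are given as (genomic position, initiator class, TPM) tuples. The function name, the tie-breaking rule (ties count as YR-dominant, and the first of two equally expressed CTSSs is kept) and the toy numbers are assumptions made for illustration, not details reported in the text.

```python
def summarize_di_promoter(ctss, strand="+"):
    """Summarize one dual-initiation promoter.

    ctss: list of (position, cls, tpm) with cls in {"YR", "YC"}.
    Returns the summed TPM per class, the dominant class, and the offset
    of the dominant YR TSS relative to the dominant YC TSS
    (positive = YR lies downstream of YC in the direction of transcription).
    """
    totals = {"YR": 0.0, "YC": 0.0}
    top = {}  # class -> (position, tpm) of the most highly expressed CTSS
    for pos, cls, tpm in ctss:
        totals[cls] += tpm
        if cls not in top or tpm > top[cls][1]:
            top[cls] = (pos, tpm)

    if "YR" not in top or "YC" not in top:
        raise ValueError("a dual-initiation promoter needs both YR and YC CTSSs")

    # Assumption: a tie in summed TPM is called YR-dominant.
    dominant = "YR" if totals["YR"] >= totals["YC"] else "YC"

    offset = top["YR"][0] - top["YC"][0]
    if strand == "-":
        offset = -offset  # downstream means decreasing coordinates

    return {"totals": totals, "dominant": dominant, "yr_minus_yc": offset}


# Hypothetical promoter in which the strongest YR site lies 4 nt
# downstream of the strongest YC site.
toy = [(98, "YC", 1.0), (100, "YC", 3.0), (104, "YR", 20.0), (106, "YR", 5.0)]
summary = summarize_di_promoter(toy)
```

Here `summary["dominant"]` is `"YR"` and `summary["yr_minus_yc"]` is `4`, matching the kind of arrangement described for sumo2b; the expression values themselves are invented.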
The small contribution of YC-initiation may explain why non-canonical YC-initiation events have largely been missed in previous studies, which focused on single dominant TSSs. However, YC-initiation can be dominant over YR-initiation in individual genes, even at lowly expressed promoters (Figure 1d; Supplementary Figure 1d). In conclusion, we show that non-canonical YC-initiation events are pervasively intertwined with canonical YR-initiation and occur within a small physical distance within the same core promoter regions.

Features of dual-initiation gene promoters
Translation-associated genes such as ribosomal proteins, translation initiation/elongation factors and small nucleolar RNA (snoRNA) host genes are transcribed by 5'-TOP/TCT initiators; we therefore asked whether their zebrafish homologs possess single or dual-initiation. The annotation of zebrafish snoRNAs is not comprehensive, so we analyzed a size-selected RNA library [23] enriched for full-length snoRNA length (18-250 nt) and annotated 176 novel zebrafish snoRNAs (Supplementary Table 2). Intersection of the expressed genes from the above listed gene families revealed that most of these genes carry dual-initiation sites (Figure 2a). Gene ontology (GO) analysis of DI promoter genes revealed an enrichment of translation machinery components (translation, translation elongation and translation termination), co-translational protein targeting to membrane, RNA stability and nonsense-mediated decay (Figure 2b; Supplementary Table 3). Enrichment of ribosome-related functions is consistent with previous studies describing YC-initiation [17,24] associated with such genes, while our findings reveal a novel dual-initiation feature of these promoters (Figure 2a).
Excluding translation-associated genes from the query list revealed an enrichment of additional unexpected GO terms such as mRNA splicing via spliceosome, telomerase RNA localization, chromosome organization and mitotic cell cycle (Figure 2b; Supplementary Table 3). In contrast, YR-only initiator genes are enriched for GO terms related to morphogenesis, pattern specification and embryonic development (Figure 2b), which are characteristic of the prim 5 stage of development and highlight the functional distinction of core promoter architectures.
Sequence composition around (10 nucleotides) the dominant TSSs of both initiation sites revealed a higher fraction of pyrimidines adjacent to the YC-initiation (Figure 2c), predominantly with an uninterrupted stretch of at least 4 pyrimidines (Figure 2d), a characteristic feature of the 5'-TOP motif (reviewed in [17]). We find that the longer the uninterrupted pyrimidine stretch around YC-initiation, the higher the expression level of dominant YC CTSSs (Figure 2e). Translation-associated genes have a longer stretch of pyrimidines (Supplementary Figure 2a), which is in agreement with the stringent definition of translationally regulated 5'-TOP mRNAs [15]. Dual-initiation promoter genes have shorter 5'-UTRs than single initiation YR promoters (Figure 2f), which may reflect efficient translation, as transcripts with longer 5'-UTRs tend to have lower translational efficiency [25].
Next, we sought to define the promoter features of YR-components and YC-components of dual-initiation promoters. CAGE-defined TSSs have revealed three main classes of promoter shapes, namely broad peak, sharp peak and bimodal peaks [2], and 5'-TOP/TCT promoters were primarily associated with sharp peak promoters of highly expressed genes [1].
To explore features of promoter shapes of dual-initiation genes, we first calculated the number of CTSSs and observed that dual-initiation genes have a higher number of YR-initiation sites (an average of 6 CTSSs) as compared to their YC constituent (an average of 2 CTSSs) or compared to the YR-only genes (an average of 3 CTSSs) (Figure 2g). Accordingly, the YR component of dual-initiation promoters is typically defined by a broad peak, while YC-initiation events appear mostly sharp (Figure 2h). We then asked if positionally constrained motifs characteristic of known promoter architectures can be assigned to either YC- or YR-initiation events in DI promoters. We plotted YR, YY, SS, WW (Y=C/T; R=A/G; S=C/G; W=A/T) dinucleotides and positionally constrained motifs (TATA box, GC box and CCAT motif) with respect to YR and YC-initiation events at the fertilized egg and prim 5 stages. The WW dinucleotide (W-box motif) present in most promoters in zebrafish [26] is enriched in both initiators in the fertilized egg but depleted at the prim 5 stage (Supplementary Figure 2b,c). The finding that YC-initiation is associated with a positionally constrained motif previously described for YR-initiation supports YC-initiation detection as an indicator of promoter function. Moreover, we have detected a developmental utilization of sequence determinants of YC transcription start site choice similar to that previously described for YR-initiation [26]. However, the TATA box, CCAT motif and GC box were not enriched with either initiation event at either stage (Supplementary Figure 2b-c). Thus, we conclude that YR-initiation peaks of dual-initiation genes are generally broad, while YC-initiation peaks are sharp; however, these differences are not reflected in observable differences in the frequency of positionally constrained motifs.
Taken together, our results demonstrate the pervasive nature of YC-initiation in the genome, which is characteristic not only of translation-associated genes but also of previously unappreciated GO categories, and which often features TOP promoter-like pyrimidine stretches. These observations suggest that the DI promoter is a novel promoter classification category widely used in the zebrafish genome, which appears to be a composite of canonical and 5'-TOP/TCT promoter features.
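The pyrimidine-stretch measurements discussed in this section can be sketched as follows. The sketch assumes a short sequence window on the transcribed strand and the index of the +1 base within it; measuring the run that contains the +1 base is an assumption about how "around the TSS" might be operationalized, and the function names and example sequence are hypothetical. The minimum of 4 pyrimidines is the figure given in the text.

```python
def pyrimidine_run_at(seq: str, tss_index: int) -> int:
    """Length of the uninterrupted C/T run that contains the +1 base.

    Returns 0 if the +1 base itself is a purine (as in YR-initiation).
    """
    seq = seq.upper()
    if seq[tss_index] not in "CT":
        return 0
    start = tss_index
    while start > 0 and seq[start - 1] in "CT":
        start -= 1
    end = tss_index
    while end < len(seq) - 1 and seq[end + 1] in "CT":
        end += 1
    return end - start + 1


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C+T bases in a window around a TSS."""
    seq = seq.upper()
    return sum(base in "CT" for base in seq) / len(seq) if seq else 0.0


def is_top_like(seq: str, tss_index: int, min_run: int = 4) -> bool:
    """True if the +1 base sits in a run of at least min_run pyrimidines."""
    return pyrimidine_run_at(seq, tss_index) >= min_run


# Hypothetical window; index 5 is a C inside the run CTTCCTTTC (indices 2-10).
window = "GACTTCCTTTCGA"
run_length = pyrimidine_run_at(window, 5)
```

For this invented window the run containing index 5 is nine bases long, so `is_top_like(window, 5)` holds, whereas a purine at +1 (for example index 1) gives a run length of 0.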

Differential regulation of YC-initiations and YR-initiations in DI promoters during
We have previously shown that two distinct and independently regulated promoter sequence codes, such as the W-box and +1 nucleosome positioning signals, are often overlaid in individual promoters and used differentially during the maternal to zygotic transition of embryo development [26]. The existence of such overlapping sequence codes, together with the observation that TCT promoters and canonical initiators may be regulated by different initiation complexes [12,13], prompted us to hypothesize that intertwined YR-initiation and YC-initiation events may represent differential regulatory principles. Thus, divergent regulatory inputs may target dual-initiation promoters and lead to divergent transcriptional regulation during embryo development. Therefore, we asked about the relationship between the expression dynamics of YR-initiation and YC-initiation during early embryo development. We performed self-organizing map (SOM) clustering of YR and YC expression levels for each gene and observed the typical zebrafish developmental expression profiles, characterized by two opposing trends. A typical maternal dominant trend includes mRNA expression at early stages originating from the oocyte, which is removed by RNA degradation after zygotic genome activation, manifesting as loss of expression typically after the 6th to 9th stages (Supplementary Figure 3a, e.g. panels of first column). An opposite zygotic dominant trend features low or no maternal activity followed by zygotic activation, which appears as an increase in expression after the 6th to 9th stages of the 12 stages analyzed. Additional trend variations in maternal to zygotic activity of YC and YR have also been detected (Supplementary
(Figure 3d).
Genes in the top cluster predominantly use YR-initiation during maternal stages; in contrast, YC-initiation is dramatically upregulated at zygotic genome activation after the mid-blastula transition (Figure 3c,d). This trend is demonstrated by the promoter of the translation elongation factor gene eef1g (Figure 3e), the human homolog of which is transcribed by a non-canonical YC-type initiator [17]. The other negatively co-regulated cluster (bottom cluster in Figure 3c) is primarily driven by YC-initiation in maternal stages and by increased YR-initiation in zygotic stages (Figure 3d), as exemplified by the initiation profile of the psmd6 gene (Figure 3f). These results indicate that YR-initiation and YC-initiation are widely used in development and are not specific to maternal or zygotic stages. However, they are selectively used for individual genes, which suggests that these genes can respond to differential regulatory inputs. Taken together, the expression dynamics within these 312 dual-initiation promoters indicate independent regulation of the YR-initiation and YC-initiation components, which is markedly apparent during the dramatic overhaul of the transcriptome at the MZT.

YC component of dual-initiation promoter genes regulates snoRNA expression
have an additional function in maintaining nodal signaling [30]. In contrast to previous studies in mammals that described snoRNA host genes as being transcribed by YC-initiation (5'-TOP/TCT), we showed that zebrafish snoRNA host genes are characterized by dual-initiation (Figure 2a). These observations raise the question of whether the dual function of snoRNA host genes is decoupled by YR-initiation and YC-initiation and whether the two initiation events contribute selectively to distinct RNA fates. Indeed, it was previously shown that a 5'-TOP promoter element determines the specific ratio of snoRNA to mRNA production and that an artificial Pol II promoter containing canonical YR-initiation is incompatible with the efficient release of snoRNA [11]. The dramatic transition between maternal and zygotic transcriptomes and the uncovered differential regulation of YC-initiation and YR-initiation at the MZT provide an opportunity to address whether the YR and YC components of snoRNA host genes are differentially regulated. We thus hypothesized that potentially divergent expression dynamics of YR- and YC-derived transcripts during the MZT could be informative for identifying the 5' end of the source RNA for snoRNA genes embedded in dual-initiation promoter host genes. To this end, we plotted the expression levels of both YR and YC components of 97 snoRNA host genes (containing 249 snoRNAs) and the expression of snoRNAs [23] at the corresponding developmental stages (Supplementary Figure 4a). The majority of snoRNA host genes are maternally deposited, and both YR and YC activity as well as snoRNA expression generally increase after activation of zygotic transcription (Supplementary Figure 4a). Correlation of the expression levels of snoRNAs with YR and YC components revealed a stronger correlation of the YC component with the temporal dynamics of snoRNAs (Figure 4a), suggesting YC-initiation to be the likely source of snoRNA host RNA species.
To further investigate the observed correlation between snoRNA expression and YC-initiation, we selected two host genes (kansl2 and nop53) whose overall expression levels are comparable but which have varying levels of YR and YC components. The snoRNA host gene kansl2 has a dominant YR-initiation and a minor YC-initiation, while its snoRNA expression level is low throughout development (Figure 4b). On the other hand, the
The above results suggest that snoRNA host RNAs may be divergently expressed.
However, their temporal expression dynamics may not reveal the full extent of differential RNA regulation that emerges from dual-initiation promoter genes. Therefore, we investigated the spatial expression patterns of two newly annotated snoRNAs (Supplementary Table 2
that besides the expected differential subcellular localization of host gene products and embedded snoRNAs they are also activated in partially overlapping domains of the embryo, which is consistent with potential divergence in transcriptional regulation of products from the same core promoter.
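The comparison of snoRNA dynamics with the YR and YC components reported earlier in this section can be sketched as a per-gene correlation across developmental stages. The text does not state which correlation measure was used, so the Pearson coefficient below is an assumption, as are the function names and the invented expression profiles.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def compare_components(snorna, yr, yc):
    """Correlate a snoRNA time course with the YR and YC components of its host.

    Returns (r_yr, r_yc, better), where better names the component whose
    profile correlates more strongly (more positively) with the snoRNA.
    """
    r_yr = pearson(snorna, yr)
    r_yc = pearson(snorna, yc)
    better = "YC" if r_yc > r_yr else "YR"
    return r_yr, r_yc, better


# Invented expression values over four stages, for illustration only.
snorna_profile = [1.0, 2.0, 3.0, 4.0]
yr_profile = [4.0, 3.0, 2.0, 1.0]   # decreases while the snoRNA increases
yc_profile = [2.0, 4.0, 6.0, 8.0]   # increases together with the snoRNA
r_yr, r_yc, better = compare_components(snorna_profile, yr_profile, yc_profile)
```

In this toy case the snoRNA follows the YC component and opposes the YR component, so `better` is `"YC"`. Applied across many host genes, the distribution of the two coefficients would give the kind of summary the text describes, under the stated assumptions.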

Differential fates of YR-initiation and YC-initiation products during translation
SnoRNA host genes are selectively subjected to nonsense-mediated decay (NMD), as shown by blocking NMD with the translation inhibitor cycloheximide, which led to stabilization of several (UHG and GAS5) [6,32], but not all (e.g. U17HG [7], U87HG [33], rpS16 [6]), snoRNA host genes. This result suggests differential stabilization of host RNAs due to differential association of snoRNA host mRNAs with translating ribosomes [7]. We asked whether dual-initiation promoter genes are subjected to differential post-

Dual-initiation promoter genes are conserved across metazoans
Finally, we asked whether DI promoters observed in zebrafish are present among other metazoans. We first re-analyzed transcription initiation of the human snoRNA host gene GAS5 that is transcribed by a 5'-TOP promoter [6]
Table 6). Furthermore, intersection of human and zebrafish orthologous DI promoter genes revealed that 1171 (38.46%) genes share the DI promoter feature, indicating a high degree of conservation of DI promoters among vertebrates. Gene ontology analysis of DI promoter genes in human revealed enrichment for translation regulation, mRNA stability and RNA splicing (Figure 7d), similar to that in zebrafish (Figure 2b). This suggests that what were previously described as 5'-TOP/TCT promoters are better described as DI promoters in several cell types both in human and Drosophila, and argues for redefining non-canonical initiator promoters in these metazoans.
We next sought to compare sequence content, expression levels and promoter width of dual-initiation promoters in human and Drosophila. In both species, DI promoters have higher C+T content around the TSS than YR-only promoters but lower C+T content than YC-only promoters (Figure 7e), similar to observations in zebrafish (Figure 2c).
Dual-initiation promoters are highly expressed compared to YR-only and YC-only initiation promoters, which appears to be a shared feature among all three species (Figure 7f; Figure 1d). Dual-initiation promoters have a higher number of CTSSs, resulting in broad promoter shapes, whereas the YC component shows sharp peaks similar to zebrafish (Figure 7g, compare to Figure 2g). The UCSC browser view of the orthologous ribosomal protein gene RPL38 shows a similar intertwining of YR and YC-initiation events across all three species (Figure 7h). Taken together, the above results demonstrate that DI promoters are a pervasive and evolutionarily ancient phenomenon characteristic of distant clades, with highly conserved promoter architecture and expression features shared among metazoans, and highlight the importance of this novel promoter structure organization in divergent animal systems.

In this study, we demonstrate the pervasive nature of non-canonical transcription initiation intertwined with canonical initiation within the core promoter of thousands of genes in zebrafish development. Thus, YC-initiation is utilized by a much larger set of genes than previously reported, which was limited to components of the translational machinery [6,7,12,17] and characterized as 5'-TOP/TCT initiators. This dual-initiation arrangement represents a novel composite promoter architecture, which encompasses two sets of targets for transcription initiation in individual promoters. By exploiting the dramatic switch of the embryo transcriptome during the maternal to zygotic transition, we show that the two initiation types are uncoupled from each other during this transition, demonstrating their differential use and providing evidence for a lack of interdependence between them in many genes. The apparently independent regulation of initiation site selection in dual promoters during the MZT argues for two initiation mechanisms acting both in the oocyte and in the early embryo. However, their use is not selective to ontogenic state; instead, it appears to alternate among promoters. The remarkable overlap of transcription initiation mechanisms in the same promoter regions suggests that promoters of dual-initiation genes may respond in more than one way to regulatory inputs acting in different ontogenic contexts, such as the maternal to zygotic transition (Figure 8a).
We provide evidence that zebrafish snoRNA host genes are transcribed from YC-initiation, similar to other model systems [6,7]. However, we also observe that snoRNA host genes carry canonical YR-initiation, not only in zebrafish but also in mammalian cells.
Short read sequencing, as used in CAGE and RNA-seq, is not suitable to directly trace YR- and YC-specific full-length RNAs and thus cannot unequivocally uncouple the post-transcriptionally generated secondary RNA products of the two initiation sites. Nevertheless, we show an association of YC-initiation with snoRNA generation by expression correlation analysis of initiation usage. Our results are in agreement with a previous study, which demonstrated that experimentally replacing the YC-initiation (5'-TOP) snoRNA promoter with a YR-initiation site reduces snoRNA production [11]. Taken together,
applied by dual promoters, could substantially impact an as yet unexplored additional layer of diversity of RNAs produced from genes. We hypothesize that the expansion of utilization of non-canonical initiation to a wide range of genes could indicate a general transcription regulation paradigm, which represents adaptation to differential regulation of a variety of promoters [15,18]. Dual-initiation promoter genes are highly expressed compared to other genes (Figure 1d; Figure 7f), which is not specific to the contributing YC components, as the expression level of the corresponding YR component alone is also higher than that of YR-only or YC-only initiator genes. This observation suggests either that sharing two alternative initiation mechanisms boosts expression levels or that YC-initiation might have been evolutionarily co-opted in highly expressed genes. It is interesting to note that the efficiency of transcription correlates positively with translation efficiency, which raises the possibility that highly expressed DI promoters contribute to coordination between transcription and translation [36].
The enrichment of translation and RNA regulation related gene ontology terms in DI promoter genes, along with the notable absence of developmental regulator genes, raises the question of why and how this promoter architecture evolved. Important insight into the potential functional significance of non-canonical initiation comes from studies on target genes of the mTOR pathway that are translationally regulated [15,16] and are enriched in the 5'-TOP/TCT initiator. The polypyrimidine stretch proximal to the 5' end of these genes is a target for translation regulation and has been suggested to serve as a targeting mechanism for oxidative and metabolic stress or cancer-induced differential translation regulation by the mTOR pathway [15,16,18-20,37]
RNAs reflects such a dual regulatory function. Dual-initiation promoters offer the potential for linking translational regulation to transcriptional regulation in a large range of genes and thus increase the repertoire of genes that may respond to such signals. In this study we have identified many genes that carry a low level of YC-initiation events, which may reflect a non-induced ground state for YC regulation. However, there was a notable correlation between the length of the polypyrimidine stretch at the 5' end and the expression level of YC (Figure 2e). It is not yet possible to distinguish in the CAGE dataset whether this correlation reflects RNA stability or transcriptional differences.
Nevertheless, it remains an unanswered question whether the polypyrimidine stretch at the 5' end is required for selective binding of translation factors such as the eIF4F complex or also represents distinct transcription regulatory signals acting at the transcription initiation level.
The current definition of 5'-TOP mRNA includes a stretch of minimally 4 to 13 pyrimidines [17], based on observations restricted to translation-associated genes [17], which have a longer pyrimidine stretch also in zebrafish (Supplementary Figure 2d). This definition has been suggested to be potentially too stringent, as translationally regulated genes revealed by ribosome profiling are enriched in transcription initiation with "C" and carry only a short pyrimidine stretch [15,16]. We used a threshold of 1 TPM and identified thousands of YC-initiation sites, and thus expanded the pool of genes that ought to be considered when transcriptomic responses to metabolic stress, for example via the mTOR pathway, are sought. Our results argue for the need for discriminating RNAs produced

Zebrafish CAGE data after cycloheximide treatment
We generated zebrafish CAGE data for the translation inhibition experiment. Zebrafish embryos were treated with 100 µg/ml cycloheximide (Sigma-Aldrich) or 0.1% DMSO as control for 2 hours, starting at 22 hours post fertilization (hpf). Total RNA was extracted from the control and treatment groups at 24 hpf using TRIzol (Invitrogen/ThermoFisher) following the manufacturer's instructions and used for CAGE library preparation as described before [3], except for the use of oligo-dT primer instead of random primers in the first strand synthesis step. CAGE libraries were sequenced on an Illumina MiSeq system.

RNA sequencing of capped RNAs
[43]. We allowed two mismatches and only uniquely mapping reads were retained.
Mapped reads having a "G" mismatch in the first nucleotide were corrected, and the transcription start site was adjusted accordingly.

Annotation of zebrafish snoRNAs from size selected small RNA reads
Size-selected (18-350 nucleotide) zebrafish small RNA-seq data was downloaded from a public dataset [23]. Adapters were filtered, and sequence reads were mapped to the zebrafish genome (zv9) using bowtie2 [43]. Sequence reads were first mapped to ribosomal RNAs (rRNAs), and those mapping to rRNAs were excluded. Unmapped reads were then remapped to the genome, allowing up to four multi-mapping reads. To ensure that snoRNAs are annotated from mapped reads that resemble the expected full length of snoRNAs, we retained only those mapped reads that are longer than 50 nucleotides and potentially represent full-length snoRNAs rather than small RNA fragments. SnoRNAs were annotated by using four different tools, namely Infernal [44], snoReport [45], snoGPS [46]

Downstream analysis of CAGE data
Over-represented GO terms were corrected for multiple testing with the Benjamini-Hochberg false discovery rate, and statistically significant GO terms were obtained by applying a p-value cutoff of <= 0.05.

Data visualization
A genome browser view of multiple genes was downloaded from the UCSC genome browser [54]. CTSSs and other relevant data were uploaded to the UCSC Genome Browser as tracks for visualization. Screenshots of promoter regions with data tracks were downloaded from the UCSC browser. All other figures were made using R.

RNA extraction and RT-PCR amplification
Purification of total RNA was performed using the miRNeasy mini kit (Qiagen, Cat.
The probes were subsequently purified on NucAway spin columns (Ambion) and then ethanol-precipitated. Single whole-mount in situ hybridizations were performed as described previously [55]. Double fluorescent in situ hybridizations were carried out as described previously [56].
Whole mount immunofluorescence after ISH hybridization
Embryos were washed in wash buffer (PBS, 0.3% v/v Triton), incubated in blocking buffer (PBS 1x, Tween 0.1%, goat serum 4%, BSA 1%, DMSO 1%) for 3 hours and then incubated with primary antibody overnight at 4 °C (anti-Fibrillarin, Abcam 38F3, 1:10). Embryos were then washed in wash buffer and blocked for 3 hours, followed by incubation with the secondary antibody overnight at 4 °C (anti-mouse Alexa 633, 1:500).

Imaging
Microscopy images were obtained with an Olympus DP70 camera fixed on a BX60 Olympus microscope. Confocal imaging was performed using a Leica TCS SP5 inverted confocal laser microscope (Leica Microsystems, Germany). Digitized images were acquired using a 63X glycerol-immersion objective at 1024 x 1024 pixel resolution. Series of optical sections were acquired to analyse the spatial distribution of fluorescence, and for each embryo they were recorded with a Z-step ranging between 1 and 2 μm. Image processing, including background subtraction, was performed with Leica software (version 2.5). Captured images were exported as TIFF and further processed using Adobe Photoshop and Illustrator CS2 for figure mounting.

Inserts in k and l show head from dorsal view from which magnified view is cropped.