NF-Y controls fidelity of transcription initiation at gene promoters through maintenance of the nucleosome-depleted region

Faithful transcription initiation is critical for accurate gene expression, yet the mechanisms underlying specific transcription start site (TSS) selection in mammals remain unclear. Here, we show that the histone-fold domain protein NF-Y, a ubiquitously expressed transcription factor, controls the fidelity of transcription initiation at gene promoters in mouse embryonic stem cells. We report that NF-Y maintains the region upstream of TSSs in a nucleosome-depleted state while simultaneously protecting this accessible region against aberrant and/or ectopic transcription initiation. We find that loss of NF-Y binding in mammalian cells disrupts the promoter chromatin landscape, leading to nucleosomal encroachment over the canonical TSS. Importantly, this chromatin rearrangement is accompanied by upstream relocation of the transcription pre-initiation complex and ectopic transcription initiation. Further, this phenomenon generates aberrant extended transcripts that undergo translation, disrupting gene expression profiles. These results suggest NF-Y is a central player in TSS selection in metazoans and highlight the deleterious consequences of inaccurate transcription initiation.

While the sequence, structure, and binding partners of gene promoters have been intensely scrutinized for nearly half a century 1 , how the cell discerns when and where to initiate transcription is still not fully understood 2 . Recent studies have established basic rules regarding spatial arrangements of cis-regulatory elements, ordered recruitment of general transcription factors (GTFs) for transcription preinitiation complex (PIC) formation, and the role of chromatin in defining the promoter environment [3][4][5][6][7][8] . One key determinant of active gene promoters is the requirement for an accessible transcription initiation/start site (TSS), characterized by a nucleosome-depleted region (NDR) flanked by two well-positioned nucleosomes (the −1 and +1 nucleosomes) 9 .
Within the NDR, several core promoter elements such as the TATA box and the initiator (Inr) element exhibit a positional bias relative to the TSS 10,11 and play important roles in TSS selection. However, the core promoter elements vary from one promoter to the next and can be either absent or present multiple times within a single NDR, suggesting that these elements are not the sole determinants of TSS selection. Thus, how the RNA Polymerase II (Pol II) chooses one transcription initiation site over another remains unclear. In yeast, mutational studies have identified several GTFs and other accessory factors with key roles in TSS selection [12][13][14][15][16][17][18] . However, despite greater specificity in TSS selection in metazoans, such accessory factors have yet to be described in higher eukaryotes.
NF-Y, also known as the CCAAT-binding factor CBF, is a highly conserved and ubiquitously expressed heterotrimeric transcription factor (TF) composed of the NF-YA, NF-YB, and NF-YC subunits, all three of which are necessary for stable DNA binding of the complex [19][20][21][22][23] . NF-YA, which harbors both DNA-binding and transactivation domains, makes sequence-specific DNA contacts, whereas the histone-fold domain (HFD)-containing subunits NF-YB and NF-YC interact with DNA via nonspecific HFD-DNA contacts 22,23 . The structure and DNA-binding mode of NF-YB/NF-YC HFDs are similar to those of the core histones H2A/H2B, TATA-binding protein (TBP)-associated factors (TAFs), the TBP/TATA-binding negative cofactor 2 (NC2α/β), and the CHRAC15/CHRAC17 subunits of the nucleosome remodeling complex CHRAC 22 .
NF-Y has an established role in gene regulation through cell type-invariant promoter-proximal binding 24,25 . However, the mechanisms through which NF-Y influences gene expression remain unclear. Several lines of evidence have pointed toward a possible role in the recruitment of chromatin modifiers and/or the PIC to the promoters of its target genes (for a review, see Dolfini et al. 20 ). Previously, we reported a critical role for NF-Y in facilitating a permissive chromatin conformation at cell type-specific distal enhancers to enable master transcription factor binding 24 . Here, we investigate the role NF-Y plays at gene promoters, what effect it may have on chromatin accessibility and recruitment of the transcription machinery, and how this might impact gene expression.
Through genome-wide studies in mouse embryonic stem cells (ESCs), we find that NF-Y is essential for the maintenance of the NDR at gene promoters. Depletion of the NF-YA protein leads to the accumulation of ectopic nucleosomes over the TSS, reducing promoter accessibility. Interestingly, under these conditions, we find that the PIC can relocate to a previously NF-Y-occupied upstream site, from where it commences ectopic transcription initiation. Remarkably, the resulting ectopic transcript can create novel mRNA isoforms and, in a large number of cases, leads to abnormal translation. Overall, we establish NF-Y's role in TSS selection and demonstrate its importance in safeguarding the integrity of the NDR at gene promoters.

Results
NF-Y promotes chromatin accessibility at gene promoters. Our previous characterization of NF-Y-binding sites in ESCs revealed that a majority of these binding sites are located within 500 base pairs (bp) of annotated TSSs of protein-coding genes 24 . Further analysis of our NF-Y ChIP-Seq data revealed a positional bias, with nearly all of the NF-Y-binding sites located immediately upstream (median distance 94 bp) of the TSS (Fig. 1a, b) and a positive correlation between NF-YA occupancy and CCAAT motif occurrence (Fig. 1b, c; Supplementary Fig. 1a-c). To investigate whether NF-Y promotes chromatin accessibility at proximal promoters, as it does at distal enhancers 24 , we used small interfering RNA (siRNA) to knock down (KD) the DNA-binding subunit NF-YA in ESCs (Supplementary Fig. 1d, e) and assessed DNase I hypersensitivity at candidate promoters, either bound or not bound by NF-Y. Quantitative assessment of the relative "openness" of the probed regions revealed that depletion of NF-YA results in a significant reduction in DNA accessibility at promoters bound by NF-Y, but not at promoters without NF-Y binding (Fig. 1d; Supplementary Fig. 1f, g). In agreement with this, genome-wide assessment of chromatin accessibility using ATAC-Seq confirmed a loss in ATAC signal that is specific to promoters targeted by NF-Y (Fig. 1e, f; Supplementary Fig. 1h, i). These data suggest that NF-Y helps maintain accessible chromatin at promoters.

NF-Y binding protects NDRs from nucleosome encroachment.
To further explore the role of NF-Y binding at promoters, we mapped nucleosomes using micrococcal nuclease (MNase) digestion followed by high-throughput sequencing (MNase-Seq). Our data revealed that NF-Y-binding sites genome-wide are depleted of nucleosomes (Fig. 1g; Supplementary Fig. 2a). Focusing on NF-Y-bound promoter regions, we noted a strong anti-correlation between NF-Y and nucleosome occupancy (Fig. 1b, g). To determine whether NF-Y binding plays a direct role in occluding nucleosomes at its binding sites, we performed MNase-Seq in NF-YA-depleted cells. Examination of the nucleosomal landscape at candidate NF-Y target promoters in NF-Y-depleted cells revealed a striking gain of a nucleosome(s) within what was previously a well-defined NDR (Fig. 1e, h), an observation we confirmed by both MNase-qPCR (Fig. 1i) and histone H3 ChIP-qPCR analyses (Supplementary Fig. 2b). Genome-wide gain of ectopic nucleosome(s) within the NDR upon NF-YA KD is observed specifically at NF-Y-bound promoters (Fig. 1h), rather than non-NF-Y-bound promoters (Supplementary Fig. 2d), suggesting a direct effect. Interestingly, ectopic nucleosomes observed in NF-YA-depleted cells overlap the TSS of NF-Y-bound genes (Fig. 1e, h; Supplementary Fig. 2d), suggesting that NF-Y binding could protect the TSS from inhibitory nucleosome binding. Moreover, this ectopic nucleosome positioning in NF-Y-depleted cells is consistent with sequence-based predictions of nucleosome binding preference at these regions 26 (Supplementary Fig. 2e). Altogether, these data suggest a role for NF-Y binding at gene promoters in protecting the NDR and the TSS from nucleosome encroachment and that in NF-Y's absence, nucleosomes are able to bind within this region.
NF-Y binding impacts PIC positioning and TSS selection. Nucleosomes characteristically provide a refractory chromatin environment for the binding of TFs. In fact, the TATA-binding protein (TBP), and hence the general Pol II transcription machinery, is unable to bind nucleosomal DNA 27 . Consequently, upon observing the appearance of ectopic nucleosome binding within the NDR of NF-Y-bound promoters in NF-YA-depleted cells, we sought to investigate whether this outcome affects binding of the transcription machinery and thus transcription initiation. Because NF-Y is known to interact with the Pol II-recruiting TBP 28 and several TBP-associated factors (TAFs) 20 , essential components of the TFIID complex that recruits Pol II, we decided to investigate whether TBP enrichment was also affected upon NF-YA KD. Indeed, loss of NF-Y binding led to a significant diminishment and/or upstream shift of TBP's binding pattern (Supplementary Fig. 3a), a consequence that could have a dramatic impact on TFIID recruitment, PIC positioning, and thus TSS selection.
In order to obtain a high-resolution view of Pol II activity and to map TSS utilization at base-pair resolution, we performed Start-Seq 9 , a high-throughput sequencing method that captures capped RNA species from their 5′-ends. Start-Seq faithfully mapped the canonical TSSs in the control cells, confirming its utility in capturing transcription initiation sites (Fig. 2a). More importantly, consistent with altered TBP binding, analysis of Start-Seq data in NF-YA-depleted cells revealed clear upstream shifts in TSS usage at many NF-Y-bound promoters (Fig. 2a). To ensure identification of promoters exhibiting significant shifts in TSS usage, we used a stringent criterion that excludes ectopic transcription initiation events that occur within ±25 bp of the canonical TSS. Our analysis of the 3056 NF-Y-bound gene promoters, using this strategy, identified 538 genes exhibiting significant shifts in TSS location (Fig. 2b), with a vast majority exhibiting a TSS shift upstream of the canonical TSS. Ectopic TSSs are located at a median distance of 115 bp upstream of the canonical TSS (Supplementary Fig. 3b). Importantly, this upstream ectopic transcription initiation is specific to NF-Y-bound promoters, with the same directionality of transcription as that from the canonical TSS (Fig. 2c; Supplementary Fig. 3c) of annotated genes, as the sites of transcription initiation of the associated upstream antisense RNA (divergent transcription [29][30][31] ) or the downstream antisense RNA (convergent transcription) exhibit no such shift (median shift of 0 bp; Supplementary Fig. 3d).

Fig. 1 NF-Y binding at gene promoters protects the nucleosome-depleted region (NDR) from nucleosome encroachment. a Average NF-YA occupancy (y-axis), as measured using ChIP-Seq, near transcription start sites (TSSs) of genes with (black, n = 3191) or without (gray, n = 21,195) promoter-proximal NF-Y binding in mouse ESCs. RPM, reads per million mapped reads. b NF-YA occupancy near TSSs of genes with promoter-proximal NF-Y binding. c CCAAT motif occurrence near TSSs of genes with promoter-proximal NF-Y binding. d DNase I hypersensitivity and qPCR analysis of promoters with (top; n = 6) or without (bottom; n = 4) NF-Y binding in control (blue) and NF-YA KD (red) ESCs. Error bars, SEM. The data for individual promoters are shown in Supplementary Fig. 1f, g. e Genome browser shots of candidate NF-Y target genes showing nucleosome occupancy in control (blue) or NF-YA KD (red) ESCs. Gene structure is shown at the bottom. Arrows highlight nucleosomal gain in NF-YA KD cells over what was previously a well-defined NDR in control KD cells. f Relative change (log2) in chromatin accessibility, as measured using ATAC-Seq, near TSSs of genes with promoter-proximal NF-Y binding in NF-YA KD vs. control KD ESCs. g Nucleosome occupancy (RPKM), as measured using MNase-Seq, near TSSs of genes with promoter-proximal NF-Y binding in control ESCs. RPKM, reads per one kilobase per one million mapped reads. h Relative change (log2) in nucleosome occupancy near TSSs of genes with promoter-proximal NF-Y binding in NF-YA KD vs. control KD cells. i MNase-qPCR validation of MNase-Seq data shown in Fig. 1e. Error bars, SEM of three biological replicates. *P-value < 0.04 (Student's t test, two-sided).
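The ±25 bp exclusion criterion used in the Start-Seq analysis can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the strand convention, and the choice to treat the ±25 bp boundary as inclusive are assumptions made purely for illustration.

```python
from typing import Optional


def classify_tss_shift(canonical_tss: int, observed_tss: int,
                       strand: str = "+", min_shift: int = 25) -> Optional[int]:
    """Return the strand-aware TSS shift in bp, or None if it is excluded.

    Negative values mean the observed TSS lies upstream of the canonical
    TSS. Events within +/- min_shift bp (inclusive; an assumption) of the
    canonical TSS are not counted as shifts.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = observed_tss - canonical_tss
    if strand == "-":
        # On the minus strand, larger genomic coordinates are upstream.
        offset = -offset
    if abs(offset) <= min_shift:
        return None
    return offset
```

Under these assumptions, an initiation site 115 bp upstream of the canonical TSS yields −115, whereas one 20 bp away is ignored.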
Analysis of the differences in Start-Seq read counts upstream of the TSS between control and NF-YA KD cells revealed a positive correlation with NF-YA-binding intensity (Fig. 2d). Comparison of NF-Y-bound promoters that exhibit TSS shifts upon NF-YA KD against those that do not revealed stronger NF-YA occupancy at genes with TSS shifts (Supplementary Fig. 3e). Moreover, the loss of NF-YA binding at these genes upon NF-YA KD results in a substantial increase in nucleosome deposition over canonical TSSs compared with their non-shifting counterparts.

[Figure panel labels: a; Atp5g1, Hsp90b1, G3bp1; y-axis: expression fold change (log2) over control KD]

Having established that NF-YA depletion leads to altered localization of the transcription pre-initiation complex at a subset of NF-Y-bound promoters, we performed RNA-Seq to assess whether transcription initiation from sites upstream of canonical TSSs gives rise to ectopic transcripts. Consistent with NF-Y having roles beyond that of a canonical transcription activator, loss of NF-YA led to both up- and downregulation of mRNA levels (Supplementary Fig. 3g). Importantly, RNA-Seq experiments in NF-YA-depleted cells revealed RNA emanating from sites upstream of the canonical TSS at many NF-Y-bound promoters. Intriguingly, individual paired-end reads from RNA-Seq experiments also revealed that RNAs originating from ectopic TSSs can form stable multi-exonic transcripts (Fig. 2g); we confirmed this finding using RT-PCR experiments (Fig. 2i).
Remarkably, in cases such as Hsp90b1, instead of simply extending the 5′ end of the canonical mRNA, the ectopic transcript incorporates a new splicing donor site, resulting in transcripts that skip the canonical TSS, 5′UTR and first exon, giving rise to a new isoform (* product in Fig. 2h).
Because hundreds of genes change in expression in response to NF-YA depletion (Supplementary Fig. 3g), we wondered if there might be other TFs that could be responsible for the observed effects. To objectively evaluate this possibility, we assessed the enrichment of 680 TF-binding motifs (source: JASPAR database) within the promoter sequences (defined as 200 bp upstream of TSS) of NF-Y-bound genes that exhibit a shift in TSS upon NF-YA depletion. Our analysis revealed enrichment of 24 TF-binding motifs (Supplementary Fig. 4a), of which 14 were also enriched within promoters of non-NF-Y-bound genes. Of the ten TF motifs enriched only within promoter sequences of NF-Y-bound genes that exhibit a shift in TSS (Supplementary Fig. 4b), eight are not expressed (<1 RPKM) in the control or NF-YA KD ESCs, leaving NF-YA and CREB1 as the only two TFs that could potentially explain the observed effects. The NF-YA motif is present in 88% of the promoters that exhibit a shift in TSS, whereas the CREB1 motif is present in only 33% of the promoters. Furthermore, unlike NF-YA, CREB1 is upregulated (1.6-fold) in NF-YA KD ESCs, making it less likely to explain the TSS shift. Nevertheless, to investigate whether CREB1 motif-containing promoters (that do not bind NF-YA) also exhibit nucleosome encroachment and TSS shift, we examined MNase-Seq and Start-Seq signals in control and NF-YA KD cells and found no obvious changes in the nucleosome positioning or TSS selection (Supplementary Fig. 4c, d) to suggest that CREB1 might be responsible for the observed changes at NF-Y-bound genes that exhibit a shift in TSS. Collectively, these data support our conclusion that the observed effects are directly attributable to the loss of NF-Y binding.
All our NF-YA KD studies were performed 48 h after siRNA transfection, when NF-YA is depleted but the cells appear normal with no obvious differentiation phenotype 24 . Nevertheless, to rule out the possibility that the observed changes are due to ESC differentiation as a result of NF-YA depletion, we assessed ectopic TSS usage in ESCs undergoing retinoic acid (RA)-induced differentiation. Four days of RA-induced differentiation did not elicit the use of ectopic TSSs (Fig. 2i), indicating that the observed phenomenon is not due to global cellular differentiation effects that may occur upon NF-YA KD. Knockdown of NF-YC, or of all three NF-Y subunits, produced similar results (Fig. 2i), making it unlikely that the observed changes are due to potential off-target effects of the siRNA targeting NF-YA. Collectively, these findings suggest that NF-Y binding impacts TSS selection at promoters of protein-coding genes, and that ectopic initiation creates aberrant mRNA species.
DNA sequence implication in NF-YA KD-induced effects. In about 22% of the cases of TSS-shifted genes, such as with the Ezh2 gene (Fig. 3a), the ectopic TSS used in response to NF-YA KD corresponds to a previously described alternative TSS, indicating that DNA sequence-based elements play a role in defining sites of ectopic transcription initiation. To further our understanding of how ectopic nucleosome and TSS positioning are established in NF-Y's absence, we examined the DNA sequence underlying the regions surrounding the canonical and ectopic TSS of NF-Y-bound genes that exhibit TSS shifts upon NF-YA KD. De novo motif analysis (±5 bp) surrounding the canonical and ectopic TSSs, as determined using Start-Seq, revealed the typical YR initiator dinucleotide 11 (Fig. 3b), indicative of a sequence recognition mechanism determining the location of ectopic TSSs.
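As a small illustration of what the YR initiator dinucleotide means at a mapped TSS, the sketch below checks for a pyrimidine (Y) at position −1 and a purine (R) at position +1. It is a hypothetical helper rather than the de novo motif analysis used in the study, and it assumes the sequence is given on the transcribed (sense) strand with a 0-based index for the +1 base.

```python
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def has_yr_initiator(sense_sequence: str, tss_index: int) -> bool:
    """Check for a YR dinucleotide spanning positions -1/+1 of a TSS.

    sense_sequence: DNA read 5'->3' on the transcribed strand (assumption).
    tss_index: 0-based index of the +1 (first transcribed) base.
    """
    if tss_index < 1 or tss_index >= len(sense_sequence):
        return False  # no -1 base available, or index out of range
    seq = sense_sequence.upper()
    return seq[tss_index - 1] in PYRIMIDINES and seq[tss_index] in PURINES
```

Applying such a check to both canonical and ectopic Start-Seq positions would show whether the two classes share the same initiator signature.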
We thus investigated sequence conservation of the regions containing ectopic TSSs. PhyloP conservation scores were calculated from genomic sequence alignment across placental mammals 32 and NF-Y-bound promoters were compared with an equivalent number of randomly chosen gene promoters. This analysis revealed significantly higher conservation of the region immediately upstream of canonical TSSs of NF-Y-bound genes (Fig. 3c). Further, the conservation at NF-Y-bound and TSS-shifted promoters is even higher than that at NF-Y-bound promoters in general (Fig. 3c), along with a higher enrichment of CCAAT motifs (Fig. 3d) and NF-Y occupancy (Supplementary Fig. 3e). Focusing our analysis specifically on ectopic TSSs, we discovered a high degree of sequence conservation starting at the ectopic TSS and continuing downstream toward the canonical TSS (Fig. 3e), with the NF-Y-binding motif more strongly conserved than the regions immediately surrounding it (Supplementary Fig. 5a). Therefore, based on the high level of conservation of NF-Y-bound promoter regions, and similarities in NF-Y-binding pattern between mouse and human (Supplementary Fig. 5b), we propose that NF-Y's role in the organization of the NDR and in TSS selection is likely conserved in other species.

Fig. 2 … Fig. 2i. b Relative fold change (log2) in Start-Seq signal (red, gain; blue, loss) near TSSs of NF-Y-bound genes exhibiting TSS shifts (n = 538) in NF-YA KD vs. control KD cells. c Left: average fold change (log2) in Start-Seq signal near TSSs of NF-Y-bound genes exhibiting TSS shifts (red), NF-Y-bound genes (black), and non-NF-Y-bound genes (gray) in NF-YA KD vs. control KD cells. Also shown is the average NF-YA occupancy (blue; secondary y-axis) in ESCs. Right: box plot showing the distribution of fold changes in Start-Seq signal (in NF-YA KD vs. control KD ESCs) within the upstream proximal-promoter regions (−200 bp to −50 bp; highlighted in yellow). ***P-value = 1.01E-66, ****P-value = 3.53E-128 (Wilcoxon rank-sum test, two-sided). d Box plot showing the distribution of maximum differences in Start-Seq read count between control and NF-YA KD cells within the region upstream of TSS (−900 to −25 bp). A 10-bp sliding window was used for computing read count differences. Genes with NF-Y binding were binned into six groups based on NF-YA ChIP-Seq read count within −900 to +100 bp of TSS. *P-value < 0.05, **P-value < 0.0009, ***P-value = 4.47E-07, # P-value = 1.16E-18, ## P-value = 2.49E-18 (Wilcoxon rank-sum test, two-sided). e Same as (b), but for RNA-Seq data. f Same as (c), but for RNA-Seq data. ***P-value = 5.39E-120, ****P-value = 3.44E-158 (Wilcoxon rank-sum test, two-sided). g, h Genome browser shot showing RNA-Seq signal at the Hsp90b1 gene in control (blue) or NF-YA KD (red) ESCs (g). Arrow highlights region with ectopic RNA in NF-YA KD cells. A representative selection of individual RNA-Seq reads is shown beneath. Red and blue rectangles highlight the ectopic and endogenous splice sites, respectively. Schematic of the RT-PCR results shown at the bottom represents the different PCR amplification products shown in (h). PCR amplification was performed using the "ectopic" primer pair shown underneath the gene structure (g). * denotes new isoform fragment (bypassing the canonical 5′UTR, TSS and the 1st exon), ** denotes mRNA fragment with prolonged 5′ fragment (uses the canonical 1st exon), *** denotes pre-mRNA fragment, and # denotes non-specific/unknown fragment. i RT-qPCR analysis of relative gene expression using the "total" and "ectopic" primer pairs shown in Fig. 2a.
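The Fig. 2d legend describes taking the maximum difference in Start-Seq read count between control and NF-YA KD cells over 10-bp sliding windows upstream of the TSS. A minimal sketch of one way to compute such a statistic is given below; the 1-bp step, the per-base count inputs, and the sign convention (KD minus control) are assumptions, since the legend does not specify them.

```python
from typing import Sequence


def max_window_difference(control: Sequence[float],
                          knockdown: Sequence[float],
                          window: int = 10) -> float:
    """Maximum (knockdown - control) read-count difference over any window.

    control / knockdown: per-base read counts across the same region,
    e.g. -900 to -25 bp relative to the TSS. The window slides 1 bp at a
    time (an assumption).
    """
    if len(control) != len(knockdown):
        raise ValueError("profiles must cover the same positions")
    if window < 1 or len(control) < window:
        raise ValueError("window must fit inside the region")
    diffs = [k - c for c, k in zip(control, knockdown)]
    current = sum(diffs[:window])  # difference summed over the first window
    best = current
    for i in range(window, len(diffs)):
        # Slide by one base: add the entering position, drop the leaving one.
        current += diffs[i] - diffs[i - window]
        best = max(best, current)
    return best
```

The running-sum update keeps the computation linear in the region length, which matters when the same statistic is computed for thousands of promoters.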
Altogether, our data, which show that the ectopic initiation sites are generally located upstream of the NF-Y-binding sites and that the ectopic nucleosomes are observed downstream of the NF-Y-binding sites (Fig. 3f), suggest that NF-Y controls the fidelity of transcription initiation at a subset of gene promoters through two complementary mechanisms: (i) NF-Y promotes transcription from the canonical TSS by maintaining the integrity of the NDR, and (ii) NF-Y binding within the NDR per se, either directly or indirectly, prevents the PIC from "accidental" utilization of aberrant, upstream sites for transcription initiation.
RNAs from ectopic TSSs in NF-Y-depleted cells undergo translation. Considering the importance of the 5′UTR in the regulation of translation 33 , and the fact that close to 70% of genes showing a TSS shift upon NF-YA KD possess an AUG (translation start codon) within the ectopically transcribed region (Supplementary Data 1), we explored the potential impact of these aberrant transcripts on translation output. To do so, we performed Ribo-Seq, a ribosome-profiling experiment 34 , on control and NF-YA-depleted cells. By sequencing only the ribosome-protected fraction of the transcriptome, Ribo-Seq allows us to determine which RNAs are being actively translated at a given time. Typical of Ribo-Seq, triplet phasing was observed beginning at the annotated translation start site (Supplementary Fig. 6a), whereas RNA-Seq presented a flat, uniform distribution (Supplementary Fig. 6a). To determine if the transcripts originating from ectopic TSSs in NF-YA KD cells are also undergoing translation, we investigated the differences in Ribo-Seq read coverage within the region between the canonical and the ectopic TSSs. At the individual gene level, we can clearly detect the ribosome-protected RNA originating from the region upstream of canonical TSSs of NF-Y-bound genes that exhibit ectopic transcription initiation (Fig. 4a; Supplementary Fig. 6b). Furthermore, of the 429 NF-Y-bound genes with ectopic TSSs that had sufficient Ribo-Seq coverage between the canonical and the ectopic TSSs, 92% showed significantly higher levels of ribosome-protected RNA in NF-YA KD cells compared with control KD cells (Fig. 4b, c). To ensure that the ribosome-protected RNA, transcribed from the region between the ectopic TSS and canonical TSS, is undergoing translation and is not an artefact, we reanalyzed Ribo-Seq data for read coverage phasing after individually determining which ectopically transcribed AUG was most likely to be used as a translation start site for each gene. 
We found a significant enrichment of triplet periodicity in the Ribo-Seq read coverage beginning at the putative ectopic translation start site, in NF-YA KD cells compared with control cells (Fig. 4d), indicating that the ectopically transcribed regions indeed undergo translation.
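Triplet periodicity of this kind is commonly summarized as the fraction of ribosome footprints falling in each of the three reading frames relative to a start codon. The sketch below shows that calculation under stated assumptions: footprints have already been reduced to P-site coordinates on a plus-strand transcript, and the function name and inputs are hypothetical rather than taken from the authors' analysis.

```python
from collections import Counter
from typing import Iterable, List


def frame_fractions(p_site_positions: Iterable[int],
                    start_codon_pos: int) -> List[float]:
    """Fraction of footprints in frames 0, 1 and 2 relative to a start codon.

    p_site_positions: transcript coordinates of inferred ribosome P-sites
    (how P-sites are assigned is outside this sketch).
    Positions upstream of the start codon are ignored.
    """
    counts = Counter(
        (pos - start_codon_pos) % 3
        for pos in p_site_positions
        if pos >= start_codon_pos
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]
```

A translated region is expected to show a clear excess in one frame, whereas background RNA signal should be spread roughly evenly, mirroring the contrast between Ribo-Seq and RNA-Seq coverage described above.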
In an attempt to evaluate whether these translated upstream open-reading frames (ORFs) generate fusion or variant forms of endogenous protein, we performed western blot analysis, using commercially available antibodies for candidate proteins, in cells depleted of NF-YA and treated with the proteasome inhibitor MG-132 (to stabilize any unstable fusion proteins; Supplementary Fig. 6c, d). We did not detect any obvious aberrant fusion/variant protein of distinct molecular weight. However, in notable cases, we did detect altered protein expression levels. For a number of NF-Y-bound and TSS-shifted genes, there is a noticeable discordance between the manner in which the RNA and protein levels change in response to NF-YA KD (Supplementary Fig. 6d). This was surprising considering that changes in transcript levels generally result in proportional changes in protein abundance (Spearman correlation of ~0.8 35,36 ). Recent studies have shown that upstream ORFs (uORFs) can have either a favorable or a deleterious effect on downstream mRNA translation [37][38][39][40][41] and suggested that the abundance of transcript is less important than the difference in translatability of the canonical versus ectopic transcript 39 . Thus, in situations wherein an upstream extension of the transcript leads to altered protein production, we speculate that this could be due to introduction of a uORF. To summarize, we establish here that transcripts originating from ectopic TSSs undergo translation, and highlight the variable effects this can have on the translation of the canonical ORF within these transcripts.

Discussion
Driven by the prevalence of the CCAAT motif(s) within core promoters, NF-Y's function as a regulator of gene expression has almost exclusively been studied in relation to its promoter-proximal binding. Yet, the exact mechanism by which it exerts control over gene expression remains poorly understood. Through comprehensive genome-wide studies in ESCs, we have uncovered a previously unidentified role for NF-Y in safeguarding the integrity of the NDR structure, PIC localization, and TSS selection at protein-coding genes. NF-Y can access its target DNA motif, the CCAAT box, in a heterochromatic environment 25 . Furthermore, NF-Y's unique DNA-binding mode, which induces an ∼80° bend in the DNA, may allow and/or promote binding of other TFs, whose recognition sequences become more accessible 24 . Supporting this thesis, DNase experiments have shown that NF-Y is essential for the maintenance of accessible chromatin 24,42,43 . These attributes have led us and others to propose that NF-Y is a "pioneer factor" 22,24,25,[42][43][44] .
Through comparison of NF-YA ChIP-Seq and MNase-Seq data, we have shown mutual exclusivity between NF-Y and nucleosome occupancy genome-wide (Fig. 1b, g; Supplementary Fig. 2a). Given that the structure and DNA-binding mode of NF-YB/NF-YC HFDs are similar to those of the core histones H2A/H2B 22,23,45,46 , our findings suggest steric incompatibility between NF-Y and nucleosomes. This conclusion is supported by the observation that upon NF-YA KD, nucleosomes bind within the NDRs left vacant by NF-Y, positioning them in a manner that strongly reflects DNA sequence preferences (Supplementary Fig. 2e).
The presence of a well-defined NDR within active gene promoters is essential for access by GTFs and PIC assembly, and thus correct transcription initiation. While NF-Y binding does not seem to impact positioning of the +1 or −1 nucleosomes that demarcate the NDR (Fig. 1e, h), we find that NF-Y is essential for maintaining a nucleosome-depleted NDR. Although we cannot rule out the possibility that NF-Y recruits an ATP-dependent chromatin remodeler to orchestrate nucleosome removal, given NF-Y's capacity to disrupt the compaction of the chromatin, it is tempting to speculate that NF-Y, with its sequence-specific binding ability, could be acting as an ATP-independent chromatin remodeler. This idea is supported by previous findings that show NF-Y's capacity to displace nucleosomes in an in vitro context 47,48 . Overall, it seems that proteins with tertiary structures similar to core histones can independently preclude nucleosome occupancy. It will be interesting to see if all proteins containing histone-fold domains have a similar effect on nucleosome binding, as has been shown to be the case with subunits of the CHRAC complex 49 .
NF-Y has been shown to play a direct role in the recruitment of the pre-initiation complex through interactions with TBP and several TAFs 28,50 . We have shown here that NF-Y also plays an indirect role in the recruitment of PIC-associated proteins, since its binding to promoter-proximal regions is necessary for the maintenance of an open-chromatin structure over the TSS, allowing for effective binding of the transcription machinery. In the case of NF-Y's indirect impact on PIC recruitment, we observed an associated upstream shift in TSS location upon NF-YA KD (Fig. 2b). Yet, it is interesting to note that only a subset of NF-Y-bound genes exhibits this TSS shift. Besides the strength and stability of NF-Y binding and the number of binding events, this likely reflects involvement of additional factors. Our analyses show that efficient utilization of an ectopic TSS requires sequence features within the exposed DNA that are amenable to proper PIC binding, such as previously described alternative start sites (as is the case for Ezh2; Fig. 3a) or initiator motifs. Moreover, our discovery that a majority of the ectopic TSSs are observed to occur near CCAAT boxes (Fig. 3c) suggests that NF-Y binding within the NDR by itself could sterically hinder PIC from aberrant utilization of alternative sites for transcription initiation.
We thus conclude that NF-Y binding at promoters serves at least three roles: (1) direct PIC recruitment to the promoter region through its interactions with TBP and the TAFs, (2) prevent ectopic nucleosome binding within the NDR through its nucleosome-like structural properties and DNA-binding mode, and (3) occlude alternative transcription initiation sites to ensure correct TSS usage. The shift in TSS usage upon NF-YA KD appears to stem from the two latter roles, whereby, upon loss of NF-Y binding, a potential transcription initiation site is uncovered while, simultaneously, a nucleosome prohibits optimal PIC binding to the canonical TSS, forcing the PIC to relocate to an accessible site upstream. Our findings are consistent with studies in transgenic mice showing that the CCAAT-containing Y-box sequence is critical for accurate and efficient transcription and that deletion of the Y-box results in aberrant transcripts initiating from regions upstream of canonical TSS 51 .
As might be expected, the usage of an ectopic, upstream TSS has variable consequences on the steady-state levels of resulting mRNAs and protein products. The strength of the ectopic initiation site along with the regulatory potential of the additional upstream mRNA sequences (region between the ectopic and the canonical TSS) undoubtedly impact transcription levels and mRNA stability. In addition, the TSS employed in NF-YA depleted cells could represent a known alternative start site (as in the case of Ezh2; Fig. 3a) or generate a previously uncharacterized mRNA isoform (as with Hsp90b1; Fig. 2i). Furthermore, the upstream extension of mRNA to include a novel ORF can cause abnormal translation (as with Khsrp and C7orf50; Fig. 4a and Supplementary Fig. 4b, respectively). All of these sequelae can affect the quantity and quality of mRNAs and proteins, which could have significant, yet unpredictable, consequences on cell survival or function.
Notably, NF-Y predominantly binds the CG-rich promoters of essential genes (cell-cycle, transcription, DNA repair, etc.), whose accurate expression is vital in most cell types 24,25 . In fact, ~80% of all promoter-proximal NF-Y-binding sites overlap CpG islands 24 .
Unlike CG-poor promoters, which often correspond to tissue-specific genes and initiate transcription from a well-defined site, CG-rich promoters contain a broad array of transcription initiation sites and often associate with housekeeping genes 2,11,52 . The requirement for NF-Y at promoters of essential genes could reflect both the role of NF-Y in PIC recruitment and its enforcement of appropriate TSS usage.
We suggest that this dual role for NF-Y may explain why promoter-proximal NF-Y binding is so well conserved across mouse and human cell types. Intriguingly, studies in Schizosaccharomyces pombe have shown that deletion of Php5, an NF-YC orthologue, leads to an ~250-bp upstream shift in the TSS of the gluconeogenesis gene Fbp1 53 . Moreover, in Saccharomyces cerevisiae, out of the 46 TFs studied, the binding sites for the NF-YA homolog Hap2 were shown to have the biggest difference in predicted nucleosome occupancy between Hap2-bound (lower nucleosome occupancy) and non-Hap2-bound (higher occupancy) sites 54 . This opens the exciting possibility that NF-Y's role in promoter chromatin organization is conserved throughout the eukaryotic kingdom.
A consequence of altered TSS selection upon NF-YA KD is that ribosomes can scan over any ectopic mRNA. Ribosomes typically initiate translation upon encountering the AUG start codon, although other codons have been shown to induce translation initiation [55][56][57] . In our study, we found that nearly three quarters of genes showing a TSS shift upon NF-YA KD possess at least one ATG triplet between the ectopic and the canonical TSS. Importantly, we found evidence for translation initiation from such sites (Fig. 4d). Given that uORFs can modulate downstream translation and thus act as potent regulators of translation and protein expression [38][39][40][41]58 , it is conceivable that translation initiation from noncanonical start codon(s) within uORFs alters the reading frame and/or protein length; alternatively, it may affect the efficiency with which ribosomes translate the rest of the transcript 59 .
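The alternatives raised above (an upstream ATG that opens a separate uORF versus one that reads into the canonical coding frame) can be illustrated with a minimal sketch. This is not the analysis used in this study; the function name, the labels, and the assumption that the 5′-extended transcript sequence and the index of the canonical start codon are already known are all hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def classify_upstream_atgs(seq, main_atg):
    """Label each ATG lying fully upstream of the canonical start codon.

    seq: transcript sequence in the DNA alphabet (hypothetical input).
    main_atg: 0-based index of the A of the canonical ATG.
    """
    calls = []
    for i in range(0, main_atg - 2):
        if seq[i:i + 3] != "ATG":
            continue
        label = "overlapping"  # no stop codon before the canonical start, different frame
        for j in range(i + 3, main_atg, 3):
            if seq[j:j + 3] in STOPS:
                label = "uORF"  # terminates before (or across) the canonical start
                break
        else:
            if (main_atg - i) % 3 == 0:
                label = "in-frame extension"  # would lengthen the protein N-terminus
        calls.append((i, label))
    return calls

example = "ATGAAATAGCCATGGCCATGAAA"  # toy sequence; canonical ATG at index 17
calls = classify_upstream_atgs(example, 17)
```

In the toy sequence, the first ATG is followed by an in-frame stop and is therefore labeled as a uORF, whereas the second is in frame with the canonical start codon and would extend the protein.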
In summary, our studies describe NF-Y's mechanistic role at promoters, where it is necessary for both maintenance of the NDR's structural architecture and correct positioning of the transcriptional machinery, therefore influencing TSS selection. Furthermore, our results strongly suggest that the sites of NF-Y binding and the +1 nucleosome demarcate the 5′ and 3′ boundaries, respectively, of the region available for PIC assembly, thereby directing the transcription machinery to the correct TSS while occluding alternative TSSs and other sites of sub-optimal transcription initiation. It will be interesting to explore whether other histone-fold domain proteins, with structural and DNA-binding properties similar to those of NF-Y, function in a similar manner.
Cell debris were pre-cleared by centrifugation at 14,000 rpm for 20 min, and 25 μg of chromatin was incubated with either NF-YA (Santa Cruz, G-2, sc-17753X), histone H3 (Abcam, ab1791), RNA Pol II (Covance, MMS-126R) or TBP (Abcam, ab51841) antibodies overnight at 4°C. Protein A/G-conjugated magnetic beads (Pierce Biotech, 88846/88847) were added the next day for 2 h. Subsequent washing and reverse cross-linking were performed as previously described (Heard et al., 2001) 61 . ChIP enrichment for a primer-set was evaluated using quantitative PCR, as percentage of input, and normalized to a negative primer set. See Supplementary Data 2 for the list of primers used.
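The percent-of-input calculation and normalization to a negative primer set can be sketched as follows. This is an illustrative sketch only: it assumes 100% primer efficiency (a doubling per cycle), and the function names and the `input_fraction` parameter (the fraction of chromatin saved as input) are hypothetical rather than taken from the protocol above.

```python
def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in the IP, assuming perfect PCR doubling.

    input_fraction: fraction of the chromatin set aside as input (assumed value).
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)

def fold_over_negative(target_percent, negative_percent):
    """Enrichment at a target primer set relative to a negative primer set."""
    return target_percent / negative_percent

# Toy Ct values, not data from this study.
target = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
negative = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
enrichment = fold_over_negative(target, negative)
```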
DNase I hypersensitivity. DNase I hypersensitivity experiments were performed as previously described 24 .
Quantitative RT-PCR. Quantitative RT-PCR was performed as previously described 24 . Briefly, the total RNAs were prepared from cells using Qiazol lysis reagent (Qiagen, 79306), and cDNAs were generated using the iScript kit (Bio-Rad, 1708891) according to the manufacturer's instructions. Quantitative PCRs were performed on the Bio-Rad CFX-96 or CFX-384 Real-Time PCR System using the SsoFast EvaGreen supermix (Bio-Rad, 1725201). Three or more biological replicates were performed for each experiment. The data are normalized to Actin, Haz, and TBP expression, and plotted as mean +/− SEM. See Supplementary Data 2 for the list of primers used.
MNase-Seq data analysis. MNase-Seq read pairs for all samples were aligned to the mouse (mm9) genome using Bowtie 63 , retaining only uniquely mappable pairs (-m1, -v2, -X10000, -best). Fragments shorter than 120 nt or longer than 180 nt were filtered out, as were all duplicate fragments, using custom scripts. Replicates were merged for each condition, and normalized per ten million uniquely mappable, non-duplicate fragments. BedGraph files containing single-nucleotide resolution fragment centers were generated to facilitate metagene analyses and creation of heatmaps, while whole-fragment coverage bedGraphs were generated for visualization purposes.
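The filtering and normalization steps described above can be sketched as follows. The custom scripts used in this study are not shown here, so this is a hypothetical reconstruction: the function name, the 0-based half-open coordinate convention, and the choice to compute the scaling factor from fragments remaining after both filters are assumptions.

```python
from collections import Counter

def fragment_center_counts(fragments, min_len=120, max_len=180, scale=10_000_000):
    """Return normalized fragment-center counts keyed by (chrom, position).

    fragments: iterable of (chrom, start, end), 0-based half-open (assumed).
    """
    unique = set(fragments)  # remove duplicate fragments
    kept = [(c, s, e) for c, s, e in unique if min_len <= e - s <= max_len]
    if not kept:
        return {}
    factor = scale / len(kept)  # counts per ten million retained fragments
    centers = Counter((c, (s + e) // 2) for c, s, e in kept)
    return {pos: n * factor for pos, n in centers.items()}

# Toy input: one duplicate, one fragment too short, one too long.
toy = [("chr1", 100, 250), ("chr1", 100, 250), ("chr1", 110, 240),
       ("chr1", 0, 100), ("chr1", 0, 200)]
result = fragment_center_counts(toy)
```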
ATAC-Seq. 25,000 cells were incubated in CSK buffer (10 mM PIPES pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 0.1% Triton X-100) on ice for 5 min and then centrifuged for 5 min at 4°C and 500 g. After discarding the supernatant, an aliquot of 2.5 µl of Tn5 Transposase was added to a total 25 µl reaction mixture (TD buffer + H2O). The solution was then heated at 37°C for 30 min (with mixing every 10 min). The solution was cleaned up using a MinElute Qiagen kit. After PCR amplification (eight total cycles), DNA fragments were purified with two successive rounds of AMPure XP beads (1:3 ratio of sample to beads).
ATAC-Seq data analysis. Low-quality reads were removed if they had a mean Phred quality score of <20. Any reads with Nextera adapter sequence were trimmed using cutadapt v1.12. Reads were aligned using Bowtie v1.2 with the following parameters: "-v 2 -m 1 -best -strata". Reads aligning to the mitochondrial genome (chrM) were removed, and reads were deduplicated by removing read pairs with both mates aligning to the same location as another read pair. To measure open chromatin, coverage tracks were generated using the first 9 bp of both mates of the aligned reads (corresponding to where the transposase is bound). For smoother coverage tracks that provide better visibility in the genome browser, the original 9-bp regions were extended in both directions by an equal distance until the region was 51 bp long. Coverage tracks were normalized to read coverage per ten million mapped reads (after removing chrM and deduplication).
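The interval arithmetic behind the 9-bp transposase footprint and its symmetric extension to 51 bp can be sketched as below. The function names and the 0-based half-open coordinate convention are assumptions made for illustration, and clipping at position 0 is an added safeguard not described above.

```python
def first_bases(start, end, strand, n=9):
    """Interval covering the first n sequenced bases of an aligned mate.

    For a reverse-strand mate, the first sequenced bases lie at the
    right-hand end of the alignment.
    """
    if strand == "+":
        return start, start + n
    return end - n, end

def extend_interval(start, end, target=51):
    """Extend an interval equally in both directions to `target` bp."""
    pad = target - (end - start)
    if pad < 0 or pad % 2:
        raise ValueError("interval cannot be extended symmetrically")
    half = pad // 2
    return max(0, start - half), end + half  # clip at the chromosome start

site = first_bases(100, 150, "+")   # (100, 109)
smoothed = extend_interval(*site)   # (79, 130), i.e. 51 bp
```

With a 9-bp footprint, 21 bp are added on each side to reach 51 bp.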
RNA-Seq. The total RNA was extracted with Qiazol lysis reagent (Qiagen, 79306) treatment and ethanol precipitation. The samples were then treated with DNase I, Amplification Grade (Thermo Scientific, 18068015) and stranded libraries were prepared using the TruSeq stranded RNA kit (Illumina, 20020598) with RiboZero depletion (Gold kit; Illumina, MRZG12324).
RNA-Seq data analysis. Reads were mapped to the mouse (mm9) genome using TopHat v2.1.0 64 . To obtain a transcript GTF from our samples, Cufflinks 65 was run with the -g option (mm9 GTF from ENSEMBL, version 67, provided as guide). We generated transcriptome assemblies for each sample separately and then used Cuffmerge 65 to combine all the annotations. We used DESeq2 66 with default parameters for all differential expression analyses with gene count data from Salmon quantification 67 .
Start-Seq data analysis. Start-Seq reads were trimmed for adapter sequence using cutadapt 1.2.1 70 ; pairs with either mate trimmed shorter than 20 nt were discarded. A single additional nucleotide was removed from the 3′ end of each read to facilitate mapping of fully overlapping pairs. Remaining pairs were filtered for rRNA and tRNA by aligning to indices containing each using Bowtie 0.12.8 (-v2, -X1000, -best, -un, -max), and retaining unmapped pairs. Following this, a similar alignment was performed to an index containing the sequence of spike-in RNAs only (-m1, -v2, -X1000, -best, -un, -max), and finally, the remaining unmapped reads were aligned to the mouse (mm9) genome utilizing the same parameters, retaining only uniquely mappable pairs.
Strand-specific bedGraph files containing the combined raw counts of short capped RNA 5′ ends for all control replicates were generated to facilitate observed TSS calling. For all other purposes, 5′ end counts were normalized per ten million mappable reads, then based on depth-normalized counts aligning to spike-in RNAs. Spike normalization factors were determined as the slope of the linear regression of each sample's depth-normalized spike-in read counts versus the single sample with the lowest total count. Control and NF-YA knockdown bedGraph files were generated from these spike-normalized counts by taking the mean of all replicates, genome-wide, at single-nucleotide resolution.
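The spike-in normalization factor described above can be sketched as an ordinary least-squares slope. This is a minimal illustration under stated assumptions: the regression is taken to include an intercept, the reference sample is placed on the x-axis, and the function name and input layout (one depth-normalized count per spike-in RNA, in the same order for every sample) are hypothetical.

```python
def spike_slopes(spike_counts):
    """Regress each sample's spike-in counts on the lowest-total sample.

    spike_counts: dict mapping sample name -> list of depth-normalized
    counts, one per spike-in RNA (assumed layout).
    Returns (reference sample name, dict of slopes).
    """
    ref = min(spike_counts, key=lambda s: sum(spike_counts[s]))
    x = spike_counts[ref]
    mean_x = sum(x) / len(x)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slopes = {}
    for name, y in spike_counts.items():
        mean_y = sum(y) / len(y)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slopes[name] = sxy / sxx  # least-squares slope with intercept
    return ref, slopes

# Toy counts, not data from this study.
ref, slopes = spike_slopes({"ctrl": [10, 20, 30], "kd": [20, 40, 60]})
```

In this toy case the reference sample has a slope of 1 against itself, and a sample with twice the spike-in signal has a slope of 2.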
Observed TSS calling. Observed TSSs were identified as previously described 9 , based on the control Start-Seq data, using mm9 RefSeq annotations downloaded from the UCSC genome browser (January, 2015). Briefly, the position with the highest read count within 1000 nt of an annotated TSS, or that with the highest count within the 200 nt window of highest read density was selected, depending on proximity. When insufficient Pol II ChIP-Seq signal existed in the 501 nt window centered on the selected locus, relative to a comparable window about the annotated TSS (a ratio less than 2:3), the observed TSS was shifted to the location with the highest Start-Seq read count within 250 nt of the annotated TSS. When fewer than five reads were mapped to the selected locus, the annotated TSS was maintained. Groups of transcripts with identical observed TSS were filtered, maintaining a single representative with the shortest annotated to observed TSS distance. Groups of observed TSSs within 200 nt of one another were similarly reduced by first removing any RIKEN cDNAs, predicted genes, or observed TSSs moved to the annotation due to lack of Start-Seq reads. Following this, a single observed TSS was selected based on observed to annotated proximity. In this manner, observed TSSs were called for 24,498 transcripts; of these, 16,483 were selected based on Start-Seq data, while for 8015 the annotated position was maintained. NF-Y-bound promoters were then identified as those with an NF-YA ChIP-Seq peak intersecting the observed TSS −900 to +100 nt window.
Ectopic TSS calling. Ectopic TSSs were identified through the comparison of NF-YA knockdown Start-Seq read counts to control using DESeq 71 . Counts were determined for all samples in 10-nt bins tiling the region −995 to +995 nt, relative to each observed TSS. Bins closer to an upstream or downstream TSS than their own were excluded, as were those in the observed TSS −25 to +24 nt region. Normalization was performed based on size factors calculated within DESeq from spike-in RNAs alone, ensuring these values were equivalent across all samples. All bins with a positive log2 fold change and adjusted p-value less than 0.1 were identified. If more than one bin associated with a single observed TSS was selected, that with the lowest adjusted p-value was retained. Within each of these bins, the position with the highest total Start-Seq read count across all NF-YA knockdown samples was selected as the ectopic TSS. In cases where multiple sites exist with identical counts, that closest to the observed TSS was selected.
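The bin tiling, bin selection, and within-bin position selection described above can be sketched as follows. The differential test itself (DESeq) is not reproduced; the sketch assumes its results are already available as records with `log2fc` and `padj` fields, which are hypothetical names. Treating the upper bound of the tiled region as exclusive, so that the excluded −25 to +24 window falls on bin boundaries, is also an assumption.

```python
def tile_bins(lo=-995, hi=995, width=10, excluded=(-25, 24)):
    """10-nt bins (inclusive start, end) relative to the observed TSS."""
    bins = []
    for start in range(lo, hi, width):
        end = start + width - 1
        if start >= excluded[0] and end <= excluded[1]:
            continue  # skip bins covering the observed TSS itself
        bins.append((start, end))
    return bins

def best_bin(results, alpha=0.1):
    """Bin with positive fold change and the lowest adjusted p-value."""
    hits = [r for r in results if r["log2fc"] > 0 and r["padj"] < alpha]
    return min(hits, key=lambda r: r["padj"]) if hits else None

def pick_position(counts, bin_range):
    """Offset with the highest count; ties go to the site nearest the TSS."""
    start, end = bin_range
    return max(range(start, end + 1),
               key=lambda p: (counts.get(p, 0), -abs(p)))

bins = tile_bins()
# Toy knockdown counts keyed by offset from the observed TSS.
ectopic = pick_position({-100: 5, -97: 5, -95: 2}, (-105, -96))
```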
Ribo-Seq. Approximately 8-9 × 10^6 ESCs (per sample) were treated with cycloheximide (0.1 mg/ml; Sigma, 01810-1G) for 1 min prior to trypsinization and cell lysis. Control and NF-YA KD cells were used as input material for the TruSeq Ribo Profile Mammalian Library Prep Kit (Illumina) following the manufacturer's protocol.
Ribo-Seq data analysis. The total RNA-Seq and ribosome-protected-RNA-Seq (Ribo-Seq) read pairs were trimmed with cutadapt 70 ; fragments shorter than 15 nt were discarded. Read pairs were filtered for rRNA and tRNA by aligning to respective indices using Bowtie 0.12.8 (-v2, -X1000, -best, -un, -max), and retaining unmapped pairs. The remaining read pairs were aligned to the mouse (mm9) genome using STAR v2.6.0c 72 . The read counts intersecting CDSs were determined per sample. We then determined the normalization factors from the aligned counts using DESeq2 v1.18.1 66 . Using STAR's bedGraph output (read pairs with unique alignment), both sequencing runs were merged using unionBedGraphs (a component of bedtools 2.25.0). We then applied the previously calculated normalization factors to the merged bedGraphs. To obtain bigWig format files, both normalized biological replicates were merged using unionBedGraphs and then converted to bigWig. To infer the position of ribosomes from Ribo-Seq reads, we used the 5′ end of each fragment. We then added 12 nt in order to locate the highest peak on the A of ATG. The same process was applied to reads from total RNA-Seq for comparison, to check that RNA-Seq read positions do not exhibit a 3-nt periodicity as Ribo-Seq reads do. To study the coverage in shifted regions, we used a two-step process. First, we searched for all the ATGs in those regions. Then, the ATG with the highest coverage within the downstream 60 nt was selected. We finally plotted the cumulative coverage, taking the A of the selected ATG as the anchor. Genes whose cumulative Ribo-Seq signal, within the shifted region, represented >5% of total Ribo-Seq reads were removed from the phasing analysis (n = 3).
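The 12-nt offset, the frame tally used to check periodicity, and the two-step ATG selection described above can be sketched as follows. This is an illustrative sketch, not the code used in this study: it assumes forward-strand coordinates, a per-nucleotide coverage list aligned to the region sequence, and hypothetical function names.

```python
from collections import Counter

def shift_five_prime_ends(five_prime_ends, offset=12):
    """Move each fragment 5' end downstream by the fixed offset."""
    return [p + offset for p in five_prime_ends]

def frame_counts(positions, anchor):
    """Tally positions by reading frame (0, 1, 2) relative to an anchor."""
    return Counter((p - anchor) % 3 for p in positions)

def find_atgs(seq):
    """0-based indices of every ATG triplet in the region."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

def select_atg(seq, coverage, window=60):
    """ATG with the highest summed coverage in the downstream window."""
    atgs = find_atgs(seq)
    if not atgs:
        return None
    return max(atgs, key=lambda i: sum(coverage[i:i + window]))

# Toy region and coverage, not data from this study.
region = "CCATGAAATGCC"
cov = [0, 0, 1, 1, 1, 0, 0, 5, 5, 5, 0, 0]
anchor = select_atg(region, cov, window=3)
frames = frame_counts(shift_five_prime_ends([0, 3, 6, 7]), anchor=12)
```

A strong excess of one frame in the tally is what distinguishes ribosome footprints from total RNA-Seq reads, which are expected to be spread evenly across the three frames.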
Motif analysis. Around observed TSSs bound by NF-YA, those not bound by NF-YA, those bound by NF-YA with an associated ectopic TSS, as well as bound and non-bound ectopic TSSs themselves, de novo motif discovery was performed in the TSS +/− 50 region using MEME 73 (-dna -mod zoops -nmotifs 25 -minw 6 -maxw 20 -revcomp). TRAP 74 was used to search for the enrichment of 680 known TF motifs, obtained from the JASPAR database 75 , within the promoter sequences (defined as 200 nucleotides upstream of the TSS). Statistical significance for enrichment of sequence motifs within promoters of interest was calculated in reference to promoter regions from all mouse genes. The Benjamini-Hochberg method was used for multiple-testing correction.
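For reference, the Benjamini-Hochberg adjustment can be written in a few lines. This is a generic sketch of the standard step-up procedure, not the implementation used by TRAP or in this study, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values, not results from this study.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```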
Sequence conservation and predicted nucleosome occupancy data. Per-nucleotide predicted nucleosome occupancy for the mouse (mm9) genome was obtained from the authors of a previously published study 26 . Per-nucleotide phyloP conservation scores, based on 20 placental mammals, were downloaded from the UCSC Genome Browser. Both data sets were converted to bedGraph format using custom scripts to facilitate generation of metagene analyses and heatmaps.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. ATAC-Seq, MNase-Seq, RNA-Seq, Start-Seq, and Ribo-Seq data generated for this study have been deposited in the GEO repository under the accession number GSE115110. The NF-YA ChIP-Seq data used in this study, generated for our previous study 24