BEN-solo factors partition active chromatin to ensure proper gene activation in Drosophila

The Drosophila genome encodes three BEN-solo proteins, Insensitive (Insv), Elba1 and Elba2, that possess activities in transcriptional repression and chromatin insulation. A fourth protein, Elba3, bridges Elba1 and Elba2 to form an ELBA complex. Here, we report a comprehensive investigation of these proteins in Drosophila embryos. We assess common and distinct binding sites for Insv and ELBA and their genetic interdependencies. While Elba1 and Elba2 binding generally requires the ELBA complex, Elba3 can associate with chromatin independently of Elba1 and Elba2. We further demonstrate that ELBA collaborates with other insulators to regulate developmental patterning. Finally, we find that adjacent gene pairs separated by an ELBA-bound sequence become less differentially expressed in ELBA mutants. Transgenic reporters confirm the insulating activity of ELBA- and Insv-bound sites. These findings define ELBA and Insv as general insulator proteins in Drosophila and demonstrate the functional importance of insulators in partitioning transcription units.


Introduction
Proper gene regulation requires coordinated activities of distinct classes of cis- and trans-regulators. A special type of cis element, the insulator (or boundary element), has classic roles in constraining enhancer-promoter interaction (Geyer and Corces 1992; Bell et al. 1999) and setting chromatin boundaries (Kellum and Schedl 1991). Boundary or enhancer-blocking activities of newly identified insulators were mostly tested one by one in transgenic lines or genetically dissected at individual loci. Recent advances in genomics and chromatin structure capture techniques have allowed more systematic identification of insulators and have also assigned new properties to them in chromatin architecture organization (reviewed in Valenzuela and Kamakaka 2006; Phillips-Cremins and Corces 2013).
Insulator elements often act through the proteins that bind to them, which are referred to as insulator proteins. The zinc-finger protein CTCF seems to be the only insulator protein conserved between vertebrates and invertebrates. In addition to its established roles as an insulator in chromatin organization, long-range regulatory element looping and enhancer segregation (Phillips-Cremins and Corces 2013), several original studies on mammalian CTCF indicated a direct role in transcriptional repression (Lutz et al. 2000; Perez-Juste et al. 2000). In contrast, more than a dozen proteins have been implicated in insulator function in Drosophila (Kyrchanova and Georgiev 2014). According to the binding patterns of five classic insulator proteins, CP190, BEAF32, CTCF, Su(Hw) and Mod(mdg4), Drosophila insulators were divided into two classes (Negre et al. 2010). Class I insulators are mainly bound by CP190, BEAF32 and CTCF in active chromatin regions proximal to promoters, whereas class II insulators are mostly bound by Su(Hw) and are located at more distal intergenic loci.
The BEN (BANP, E5R, and NAC1) domain is a recently recognized domain present in a variety of metazoan and viral proteins (Abhiman et al. 2008). Several BEN-containing proteins, including mammalian BANP/SMAR1 (Kaul-Ghanekar et al. 2004; Rampalli et al. 2005), NAC1 (Korutla et al. 2005; Korutla et al. 2007), BEND3 (Sathyan et al. 2011), and the C isoform of Drosophila mod(mdg4) (Gerasimova et al. 1995; Negre et al. 2010), have chromatin-associated functions and have been linked to transcriptional silencing. We and others showed that the BEN domain possesses an intrinsic sequence-specific DNA binding activity. Mammalian RBB, a BEN and BTB domain protein, binds to and directly represses expression of the HDM2 oncogene by interacting with the nucleosome remodeling and deacetylase (NuRD) complex (Xuan et al. 2013). Drosophila Insv binds to a palindromic motif, TCCAATTGGA, and its variants (TCYAATHRGAA), and represses genes in the nervous system (Dai et al. 2013c). Two other Drosophila BEN proteins, Elba1 and Elba2, along with the adaptor protein Elba3, assemble into a hetero-trimeric complex (Elba) and associate with the asymmetric site "CCAATAAG" in the Fab-7 insulator (Aoki et al. 2012). The closely linked elba1 and elba3 genes are only expressed during the mid-blastula transition, which restricts Elba activity to a short developmental window. Interestingly, the genes encoding Insv and Elba2 are also arranged head-to-head in the genome, while their gene products are present throughout development.
Most BEN domain proteins contain other characterized motifs. However, Insv, Elba1, Elba2 and several mammalian homologs, such as BEND5 and BEND6, harbor only one BEN domain and lack other known functional domains. Thus, we refer to this sub-class as BEN-solo factors (Dai et al. 2013a; Dai et al. 2015). Our previous work suggests that the Insv and Elba BEN-solo factors share common properties, e.g. binding to the palindromic sites as homodimers and repressing reporter genes in cultured cells, but also display distinct activities, e.g. Insv is the only one that interacts with Notch signaling in the Drosophila peripheral nervous system, and it is unable to bind to the asymmetric site (Dai et al. 2015). Interestingly, the Fab-7 insulator requires Elba for its early boundary activity but also requires Insv in later development (Fedotova et al. 2018).

It remains to be determined how the Elba factors regulate gene expression and what the biological functions of Elba and Insv in embryogenesis are. In this study, we comprehensively characterized the three Drosophila BEN-solo factors and the adaptor protein Elba3 in early Drosophila embryos, with a focus on their DNA binding preferences (symmetric versus asymmetric), chromatin binding inter-dependence (homo-dimers versus hetero-trimeric complex) and mechanisms in gene regulation (repressor versus insulator). Our ChIP-seq analyses confirm that all three BEN-solo factors associate with both the symmetric (palindromic) and asymmetric types of sites, although Insv displays higher affinity for the symmetric type. Unexpectedly, Elba3 remains associated with chromatin even in the absence of its DNA binding partners Elba1 and Elba2, suggesting that it also uses other co-factor(s) to target chromatin. Our ChIP-nexus assay shows a genome-wide symmetric DNA binding pattern for Insv and an asymmetric pattern for Elba1 and Elba2. Consistent with their repressive function in cultured cells, their direct targets became de-repressed in mutant embryos. Moreover, the Elba factors and other insulator proteins, including GAF and CP190, show strong genetic interactions in regulating embryonic patterning. Finally, using the PRO-seq approach, we show that adjacent genes flanking Elba binding sites are less differentially expressed in all three elba mutants. Insv-associated adjacent genes do not show such a global effect, but individual neighbor promoters are insulated by Insv binding. These findings indicate a role of Elba and Insv as general insulators in partitioning transcription units in Drosophila. In support of this conclusion, Elba- and Insv-bound elements block enhancer-promoter interaction in transgenic reporters.

The Elba complex shares many genomic binding sites with Insv
We previously described genomic binding for Insv, whose ChIP-seq peaks cover numerous genomic sites that bear its specific binding motif (CCAATTGG and variants thereof) (Dai et al. 2013a). Within the Fab-7 insulator, Insv localizes strongly to two sites of the palindromic sequence, as well as weakly to a divergent site known as the ELBA site (Dai et al. 2015; Fedotova et al. 2018; Fedotova et al. 2019).
The ELBA site in the Fab-7 insulator was the only known genomic location of the Elba complex (Aoki et al. 2012). We sought to broaden this perspective by generating ChIP-seq data for each of the three ELBA factors from blastoderm-stage embryos, a stage that includes the peak expression of the ELBA factors (Dai et al. 2015) (Supplemental To determine Elba and Insv binding regions, we first assessed the quality of the three controls: Input, IgG and mutant ChIP. Mutant ChIP data appeared to be the most stringent control, as it gave the highest occurrence of the known Insv/Elba motif (Supplemental Fig. S1B). We identified 3151, 1468, 6525 and 4927 peaks for Elba1, Elba2, Elba3 and Insv, respectively, by calling wild-type ChIP-seq peaks against the corresponding mutant ChIP data (Fig. 1A-B, Supplemental Table 2). When ranked according to the peak scores of the Elba3 ChIP signal, the three Elba peak sets show extensive correlation, whereas a group of strong Insv peaks stands out and does not correlate with the Elba peaks (Fig. 1A).
Similarly, the Venn diagram shows that about half of the Insv peaks are unique (Fig. 1B), while the Elba2 peaks are covered entirely by the Elba1 peaks, which are in turn covered by the Elba3 peaks. We reasoned that the difference in the number of ChIP peaks among the three Elba factors could be due to differences in antibody affinity or in the number of in vivo binding regions. Indeed, the Elba2 antibody did not work well in immunofluorescent staining, whereas the Elba3 antibody showed the strongest signal (Supplemental Fig. S1A).
Our de novo motif discovery analysis identified the known Insv-type symmetric site CCAATTGG and the ELBA-type asymmetric site CCAATAAG, as well as their variants, from the ChIP-seq peaks of all four factors (Fig. 1C-D). The two types of motifs are similarly enriched in the Elba factor peaks, consistent with our previous report that Elba1 and Elba2 can bind to both motifs in vitro. For Insv, the peaks that contain the symmetric sites show higher ChIP signal than those with the asymmetric ones (Fig. 1D), suggesting that Insv binds to the symmetric site with higher affinity or at more loci.
Compared to the Elba3 peaks, which are enriched at promoter-proximal regions, a larger fraction of the Insv-unique sites is located at distal upstream regions, introns, exons and intergenic regions (Supplemental Fig. S1C). The Insv-unique portion also covers more peaks that contain the consensus sequences (Supplemental Fig. S1D). Thus, despite having similar DNA binding domains and expression patterns in early embryos, Insv displays binding preferences that differ from those of Elba. Three loci exemplifying four-factor binding (CG12811), Elba-unique binding (mRpS24) and Insv-unique binding (kirre/Notch) are shown (Fig. 1E-G).
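The distinction between the symmetric and asymmetric core motifs can be illustrated with a short script. This is a minimal sketch, not the motif discovery pipeline used in this study: only the two core sequences are taken from the text, while the function names and the toy sequence in the usage note are hypothetical, and only exact matches (no degenerate variants) are considered.

```python
# Illustrative sketch: exact-match scanning for the two core motifs.
# Function names are hypothetical; only the motif strings come from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SYMMETRIC = "CCAATTGG"   # Insv-type palindromic core
ASYMMETRIC = "CCAATAAG"  # ELBA-type asymmetric core


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(motif: str) -> bool:
    """A DNA motif is palindromic if it equals its own reverse complement."""
    return motif == reverse_complement(motif)


def find_motif(seq: str, motif: str):
    """Return (start, strand) for every exact match on either strand."""
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:  # only reachable for non-palindromic motifs
            hits.append((i, "-"))
    return hits
```

For example, `is_palindromic(SYMMETRIC)` is true while `is_palindromic(ASYMMETRIC)` is false, and `reverse_complement(ASYMMETRIC)` gives `CTTATTGG`, so a match to the asymmetric motif carries a strand whereas a match to the symmetric one does not.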

Elba1 and Elba3 are able to associate with chromatin independent of the Elba trimeric complex
The three subunits of Elba rely on one another to bind to the ELBA site in Fab-7 in vitro (Aoki et al. 2012). To examine how the four factors depend on one another in vivo, we called ChIP peaks for each factor in every mutant condition against its own mutant ChIP signal (Fig. 2A, Supplemental Table 2). Elba1 and Elba2 lost nearly all of their peaks in the elba3 mutant, indicating that Elba3 is an essential component for the Elba complex to bind the genome. Unexpectedly, Elba3 maintains more than half of its peaks in the elba1 or elba2 mutant (Fig. 2A-B), demonstrating that Elba3 binding to these sites is independent of Elba1 and Elba2. Most of the Elba1 binding sites are lost in the elba2 mutant, with only 712 peaks left. All of the Elba2 sites are lost in the elba1 mutant, as the residual sites appear to be noise. We confirmed that loss of binding is not due to the absence of the Elba proteins in the other elba mutant conditions, as Elba1 is normally expressed in the elba3 mutant and Elba3 is normally expressed in the elba1 mutant (Supplemental Fig. S1A). In contrast, Insv binding was not affected by any of the elba mutations, or vice versa (Fig. 2A, Supplemental Fig. S2A-B).
We compared the two groups of Elba3 peaks, the 2478 Elba1/2-dependent and the 3806 Elba1/2-independent ones (Fig. 2B-D, Supplemental Fig. S2C-E). The Elba1/2-dependent peaks are more enriched in introns, exons and distal regions and have a higher frequency of motif occurrence, whereas the Elba1/2-independent sites are mostly at promoter-TSS proximal regions and only a small fraction of them contain the motifs (Supplemental Fig. S2D).
Given that Elba3 does not harbor a DNA binding domain, we asked whether Insv mediates Elba3 binding to the genome in the absence of Elba1 and Elba2. Although the fraction of peaks overlapping with Insv is higher among the independent sites than among the dependent ones, half of the Elba1/2-independent peaks do not overlap with Insv peaks (Supplemental Fig. S2D). This suggests that Insv may contribute to or enhance Elba3 binding at some but not all loci. Moreover, the Elba1/2-independent sites have higher peak scores (Supplemental Fig. S2E), indicating stronger binding of Elba3 to these sites. Thus, there are intrinsic differences between these two groups of Elba3 sites, pointing to two modes of Elba3 targeting to the genome.
We next examined the 712 Elba1 peaks that remained in the elba2 mutant (Fig. 2E, Supplemental Fig. S2F-H). The Rfx gene locus is shown as an example (Fig. 2F). This fraction of Elba1 peaks has a lower occurrence of the Insv/Elba motifs and is more enriched in promoter-proximal regions (Supplemental Fig. S2F). The scores of these peaks are comparable to those of the Elba1 peaks in wt embryos (Supplemental Fig. S2H), suggesting that the signals are not background noise. Interestingly, among these peaks, 496 do not overlap with the Elba1 peaks in wt but do overlap with the Elba3 peaks in the elba2 mutant (Fig. 2E). This result suggests that Elba1 may have shifted or enhanced its binding to these new loci with the help of Elba3 in the absence of Elba2. Compared to its binding sites in wt embryos, the Elba1 sites in the elba2 mutant are more similar to the Elba1/2-independent Elba3 sites, with a higher enrichment in promoter-TSS proximal regions and fewer peaks that contain the motifs (Supplemental Fig. S2F). The fraction of Elba1 peaks in the elba2 mutant overlaps less with Insv, arguing that Elba1, Insv and Elba3 are unlikely to form a complex at target sites in the absence of Elba2 (Supplemental Fig. S2G).
Together, the ChIP-seq analyses revealed unexpected in vivo binding capacities of the three Elba factors: Elba3 is essential for genomic binding of the complex and can reach its genomic target sites without Elba1 and Elba2.
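The comparison of peak sets across genotypes can be sketched as a simple interval-overlap classification. The snippet below is an illustration under stated assumptions, not the procedure used to produce Fig. 2: it assumes that a wild-type peak counts as Elba1/2-independent when any peak called in the mutant overlaps it by at least one base, and the `Peak` tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_by_retention(wt_peaks: List[Peak], mutant_peaks: List[Peak]):
    """Split wild-type peaks by whether a mutant peak still overlaps them.

    Returns (dependent, independent): peaks lost in the mutant and peaks
    retained in the mutant, respectively. Assumed criterion, see lead-in.
    """
    dependent, independent = [], []
    for peak in wt_peaks:
        if any(overlaps(peak, m) for m in mutant_peaks):
            independent.append(peak)
        else:
            dependent.append(peak)
    return dependent, independent
```

The quadratic scan is adequate for a few thousand peaks; for larger sets, sorting the peaks or using an interval tree would be the usual choice.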

ChIP-Nexus differentiates heterotrimeric binding versus homodimer binding
We reported that all three BEN-solo proteins could bind to the symmetric site as homodimers, while only the Elba complex could bind to the asymmetric site (Aoki et al. 2012; Dai et al. 2015). Our ChIP-seq data suggested that Elba and Insv associate with both types of sites in the genome (Fig. 1C-D) and that Elba and Insv peaks overlap extensively (Fig. 1). Because relatively broad ChIP-seq peaks limit the ability to distinguish closely spaced factors, we performed ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode and single ligation) (He et al. 2015) to better discriminate their binding preferences. We used the same sets of antibodies and the same stage of 2-4 hr wt embryos.
As the ChIP-nexus datasets lack a negative control, we manually inspected coverage intensity and set a highly stringent cut-off (FDR < 1E-10 for Elba1, Elba3 and Insv, and FDR < 1E-5 for Elba2) according to the signal-to-background ratio. After applying this cut-off, we compared motif occurrence frequencies between ChIP-seq and ChIP-nexus. The Insv/Elba motif occurs more frequently in the peaks shared by the two datasets (Supplemental Fig. 3A). Therefore, to ensure signal specificity of the ChIP-nexus peaks, we designated the overlapping peaks as ChIP-nexus peaks and performed subsequent analyses on them. Compared with the ChIP-seq data alone, the ChIP-nexus peaks also have a more centered distribution for both types of consensus sequences (Fig. 3A). This confirms the higher specificity and resolution achieved by ChIP-nexus.
The Fab-7 insulator covers one asymmetric Insv/Elba motif bound by Elba (Aoki et al. 2012) and two symmetric sites bound by Insv (Dai et al. 2015). It was shown that Elba and Insv binding to Fab-7 contributes to its insulator function (Aoki et al. 2012; Fedotova et al. 2018). The ChIP-seq peaks at this locus are broad, making it difficult to distinguish specific binding for Elba or Insv (Fig. 3B). In contrast, the ChIP-nexus peaks are sharp and show only one high peak at the ELBA site for Elba1 and Elba2, confirming that only the ELBA site mediates direct binding for Elba1 and Elba2 here. Intriguingly, Elba1 and Elba2 peaks display strand asymmetry: Elba1 primarily shows signal coverage at the "+" strand while Elba2 shows higher coverage at the "-" strand of the CCAATAAG sequence. In contrast, Elba3 signal is relatively symmetric. Insv shows little binding to the ELBA site and instead a strong peak at the upstream symmetric site with no strand preference (Fig. 3B). Many individual Elba1/2-bound loci show a similar pattern, as exemplified by the Parp1 gene: Elba1 and Elba2 preferentially bind to the "+" and the "-" strands of the two tandem asymmetric motifs respectively (Fig. 3C).
These observations prompted us to examine binding symmetry at a global level.
To this end, we calculated the orientation index (OI) for every ChIP-nexus peak. The OI value is the ratio of the number of reads from the dominant strand to the total number of reads from both strands. Thus, an OI value close to 0.5 points to symmetric binding, while a value close to 1.0 indicates asymmetric binding. Similar to what we observed at individual loci, Elba3 and Insv binding is symmetric at a global level, as their OIs are mostly close to 0.5 (Fig. 3D). In contrast, the OI distributions of Elba1 and Elba2 are biased toward 1.0 (Fig. 3D) on both types of DNA motifs. Thus, this result suggests that the ChIP-nexus assay was able to distinguish heterotrimeric binding from homodimer binding (illustration in Fig. 3E).
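The orientation index defined above reduces to a one-line calculation per peak. The following minimal sketch follows that definition directly; the function name and the choice to raise an error for peaks without reads are assumptions made for illustration.

```python
def orientation_index(plus_reads: int, minus_reads: int) -> float:
    """Orientation index (OI) of a peak.

    OI = reads on the dominant strand / reads on both strands, so the
    value ranges from 0.5 (symmetric) to 1.0 (fully strand-biased).
    """
    if plus_reads < 0 or minus_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = plus_reads + minus_reads
    if total == 0:
        # Assumed behaviour: OI is undefined for a peak without reads.
        raise ValueError("peak has no reads")
    return max(plus_reads, minus_reads) / total
```

A peak with 50 reads on each strand has an OI of 0.5, whereas a peak with 90 reads on one strand and 10 on the other has an OI of 0.9.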
As ChIP-nexus could improve detection of direct binding of Elba and Insv to their cognate DNA sites, we wondered whether directly bound regions have different overlapping fractions between the four factors. We performed overlapping analysis based on the ChIP-nexus data and found that the overlapping fractions look similar to those obtained from the ChIP-seq data (Supplemental Within the Elba3 ChIP-nexus peaks, we identified 724 Elba1/2-dependent and 1314 Elba1/2-independent peaks. Their genomic distributions appear similar to those from ChIP-seq peaks, but the frequency of motif occurrence was substantially increased (Supplemental Fig. 3F). Notably, the Elba1/2-independent ChIP-nexus peaks overlap more with the Insv ChIP-nexus peaks (Supplemental Fig. 3G, 76%, compared with 50% in Supplemental Fig. S2E). This result indicates that Elba3 associates with Insv more often in embryos that lack Elba1 and Elba2, and thus provides evidence that Insv is involved in recruiting Elba3.

All three Elba factors repress target gene expression in Drosophila embryo
We previously reported that Insv represses neural genes in Drosophila embryos and that Elba1, Elba2 and the Elba complex can all repress reporter gene expression in cultured cells (Dai et al. 2013c; Dai et al. 2015). We asked whether Elba3 can repress transcription independently of Elba1 and Elba2. To address this, we fused Elba3 to the Tet repressor DNA-binding domain (TetR-Elba3) and examined its activity in Drosophila S2 cells on a luciferase reporter driven by an actin enhancer and tet operator sites. Remarkably, Elba3 represses reporter expression with an efficiency similar to that of the other three proteins (Fig. 4A). Given that S2 cells lack Elba1, Elba2 and Insv, this result suggests that Elba3 is able to repress transcription when brought to a target promoter by means other than Elba1 and Elba2.
To investigate how Elba regulates gene expression in vivo, we performed RNA-seq analysis to determine gene expression changes in 2-4 hour embryos between wt and the four mutant genotypes. Using gene set enrichment testing (see Methods) with all the targets identified from the Elba/Insv ChIP-seq peaks as a set, we found that the Elba/Insv targets associated with the top 200 peaks and with the peaks containing the Insv/Elba motifs show a significant trend of de-repression in mutants (FDR<1E-5 for Elba1/2/3 and FDR<0.01 for Insv), whereas targets associated with peaks lacking the motifs do not (Fig. 4B). This result demonstrates that these targets are normally repressed by Elba/Insv in the early embryo.
We then compared the genes that changed expression in these different mutants (FDR<0.2 and FC>1.3-fold). Most of the up-regulated genes in the elba2 mutant are consistently up-regulated in the elba1 and elba3 mutants (Supplemental Fig. S4A). The up-regulated genes in the elba1 mutant are also up-regulated in the elba3 mutant. This overlap pattern resembles the overlap pattern of their ChIP peaks (Fig. 1B), suggesting that Elba3 binds to and regulates more target genes than Elba1 and Elba2. The up-regulated genes in the insv mutant partially overlap with those in the elba mutants, suggesting that Insv and Elba regulate a subset of common targets. Notably, the down-regulated genes are fewer in all the mutants and their overlap appears more random (Supplemental Fig. S4B), indicating that down-regulation is an indirect effect, consistent with the conclusion that Elba and Insv act as transcriptional repressors.
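The thresholds quoted above (FDR<0.2 and FC>1.3-fold) can be expressed as a small classification step. This is a hedged sketch rather than the differential expression workflow used here: it assumes that fold change is supplied as log2(mutant/wt), that down-regulation uses the mirrored cut-off, and that each result is a `(log2_fold_change, fdr)` pair; the function names and gene identifiers are hypothetical.

```python
import math


def classify(log2_fc: float, fdr: float,
             fc_cutoff: float = 1.3, fdr_cutoff: float = 0.2) -> str:
    """Label a gene as 'up', 'down' or 'unchanged' in a mutant versus wt."""
    if fdr >= fdr_cutoff:
        return "unchanged"
    threshold = math.log2(fc_cutoff)  # 1.3-fold is about 0.38 in log2 units
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:  # assumed symmetric cut-off for down-regulation
        return "down"
    return "unchanged"


def upregulated(results: dict) -> set:
    """Genes called 'up' from a {gene: (log2_fc, fdr)} mapping."""
    return {gene for gene, (lfc, fdr) in results.items()
            if classify(lfc, fdr) == "up"}
```

With one such set per genotype, the shared targets compared in the text correspond to ordinary set intersections, for example `upregulated(elba1_results) & upregulated(elba3_results)`, where both result dictionaries are placeholders for real data.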

ELBA and Insv associate with class I insulators
Next we sought to identify co-factors that work together with Elba and Insv. We noticed that the highly enriched motifs in the Elba or Insv ChIP peaks include the known binding sites for three insulator proteins, CP190, BEAF-32 and GAF (Fig. 5A). We performed pair-wise comparisons of the Insv and ELBA ChIP-seq peaks with the ChIP-chip peaks of CP190, BEAF-32, CTCF, GAF, Mod(Mdg4) and Su(Hw) (modENCODE datasets). Consistent with our previous analysis (Dai et al. 2015), Insv co-binds with CP190, BEAF-32, CTCF and Mod(Mdg4), and shows the least overlap with GAF and Su(Hw). The Elba factors display similar co-occupancy patterns (Fig. 5B), suggesting that they all mainly associate with class I insulators.
To test whether Elba and Insv collaborate with other insulators in animal development, we set up genetic interaction tests between elba or insv and GAF or CP190. We crossed a null allele of GAF, Trl R85 (Bhat et al. 1996), a hypomorphic allele of GAF, Trl 13C (Farkas et al. 1994), and a null allele of CP190, CP190 P11 (Pai et al. 2004), into the elba1, elba2, elba3 or insv homozygous background, and scored for synthetic adult lethality (Supplemental Fig. S5A) and defects in embryonic patterning (Fig. 5C). Animals homozygous for elba3 or elba2 in combination with heterozygous Trl R85 cannot survive to adulthood. Importantly, the lethality of homozygous elba2 with Trl R85 is fully rescued by a pBAC transgene expressing an endogenous level of elba2. In the combinations with heterozygous CP190 P11, it was the homozygous elba1 and elba3 mutants that were lethal, suggesting distinct involvement of the three Elba subunits with other insulator proteins in developmental processes. In contrast, animals homozygous for insv in combination with either Trl or CP190 alleles are viable and fertile.
A fraction of embryos mutant for elba and Trl R85 or Trl 13C also displayed severe embryonic patterning defects, such as disrupted denticles and head involution, with the elba3 and Trl R85 combination showing the strongest effect (only 4% normal-looking embryos) (Fig. 5C). Embryos of the elba2 and Trl R85 combination did not show patterning defects, presumably due to maternal contribution from elba2 heterozygous mothers. Indeed, when embryos were produced from homozygous elba2 and heterozygous Trl 13C females, a fraction of them showed patterning defects. Among all the elba or insv mutants, the elba3 mutant shows the strongest interaction with CP190 P11, although the overall severity is milder than that with Trl (Fig. 5C, Supplemental Fig. S5A). Importantly, elba, Trl or CP190 mutants alone did not show similar defects, suggesting that the interaction is specific between elba and Trl or CP190. It was shown that insv genetically interacts with GAF in the function of Fab-7 (Fedotova et al. 2018) and that the Insv protein physically interacts with CP190 (Dai et al. 2015; Fedotova et al. 2019). However, we did not observe genetic interactions between insv and GAF or CP190 in viability and early embryonic patterning.
To detect direct association between Elba and BEAF-32, we performed co-immunoprecipitation experiments in embryos expressing HA-tagged BEAF-32 and a T2A construct that expresses Elba1, Elba2, and Elba3_V5 simultaneously. The HA antibody pulled down the V5-tagged Elba3 from embryos expressing both BEAF32_HA and ELBA_V5, but not from control embryos expressing only ELBA_V5 (Fig. 5D).
Together, we conclude that Elba and Insv associate with a subset of known insulator proteins, but the Elba factors seem to be selectively needed in early embryonic development in collaboration with other insulator proteins.

ELBA insulates adjacent transcription units
It was suggested that class I insulators, which are enriched in gene-dense regions and proximal to promoters, may partition closely spaced transcription units (Negre et al. 2010). The observation that Insv and Elba bind to this class of insulators and to gene-dense regions prompted us to investigate the causal role of these factors in regulating densely spaced promoters. To this end, we performed PRO-seq on 2-4 hr wt and mutant embryos and identified nascent transcripts being produced by RNA Pol II. We then performed de novo PRO-seq peak calling to define actively transcribed genes in all genotypes and, using the promoter reads, determined the differential expression between each pair of adjacent promoters flanking an Elba or Insv ChIP peak. Remarkably, in the elba mutants compared with wt (yw), there is a global reduction in the expression difference between adjacent promoters (Bonferroni-adjusted p-values < 0.001, Supplemental Fig. S6A-B). To test whether this global reduction is above background, we performed a Monte-Carlo simulation of the expression difference between randomly chosen adjacent promoters in the genome (see Methods). This confirmed that the fold change between Elba-flanked adjacent promoters is significantly higher than random. We reasoned that if the expression levels of two adjacent promoters differ more, there might be a greater need for insulation between them. Indeed, for the promoter pairs that differ more than 4-fold in their expression, the reduction in expression difference became even more apparent (Bonferroni-adjusted p-values < 0.0001) (Fig. 6A, Supplemental Fig. S6C). In contrast, for the promoter pairs that differ less than 4-fold in expression, no significant change was detected (Fig. 6B). All three types of promoter-pair configuration, convergent, tandem and divergent, showed a similar trend (Fig. 6). The trend of reduction is consistent when gene-body reads were used to call differential expression (Supplemental Fig. S6D-E).
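A Monte-Carlo comparison of this kind can be sketched as follows. The exact procedure is described in the Methods and is not reproduced here; this sketch assumes one common design, in which the mean absolute log2 expression difference of the Elba-flanked pairs is compared with the means of equally sized random draws from all adjacent promoter pairs. The pseudocount, the iteration count, the seed and all function names are illustrative assumptions.

```python
import math
import random
from statistics import mean


def abs_log2_diff(expr_a: float, expr_b: float, pseudocount: float = 1.0) -> float:
    """Absolute log2 ratio between two promoter read counts.

    The pseudocount (assumed value) avoids division by zero.
    """
    return abs(math.log2((expr_a + pseudocount) / (expr_b + pseudocount)))


def monte_carlo_p(observed: list, background: list,
                  n_iter: int = 10000, seed: int = 0) -> float:
    """Empirical p-value that random pairs differ as much as observed ones.

    observed:   |log2 difference| for pairs flanking a bound site
    background: |log2 difference| for all adjacent pairs in the genome
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed_mean = mean(observed)
    k = len(observed)
    hits = 0
    for _ in range(n_iter):
        # Draw as many random adjacent pairs as there are observed pairs.
        sample = rng.sample(background, k)
        if mean(sample) >= observed_mean:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_iter + 1)
```

A small returned value indicates that randomly chosen adjacent pairs rarely differ as much as the flanked pairs do, which is the sense in which the observed fold change is "higher than random".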
The insv mutant did not show such a global effect. However, at many individual loci, we observed a similar reduction in expression difference between Insv-bound neighbour promoters in the insv mutant (Fig. 6C-E), suggesting that the insulation function of Insv-bound sites is present in early embryos but may be more restricted to certain gene pairs. Thus, we conclude that the Elba factors insulate transcription units to ensure proper gene expression in Drosophila embryos.
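The three promoter-pair configurations considered in this section follow from the strands of two neighbouring genes. The helper below is a minimal illustration, assuming the genes are given in genomic order (left gene first) with strands encoded as "+" or "-"; the function name is hypothetical.

```python
def pair_configuration(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes, ordered by genomic coordinate.

    tandem:     both genes on the same strand
    divergent:  left gene on "-", right gene on "+" (head-to-head)
    convergent: left gene on "+", right gene on "-" (tail-to-tail)
    """
    for strand in (left_strand, right_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    if left_strand == right_strand:
        return "tandem"
    if left_strand == "-" and right_strand == "+":
        return "divergent"
    return "convergent"
```

For instance, `pair_configuration("-", "+")` returns `"divergent"`, the head-to-head arrangement in which the two promoters lie closest to each other.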

ELBA-bound elements block enhancer-promoter interaction
Insulator elements are often tested in transgenic reporters for their ability to block enhancer-promoter interaction when placed between an enhancer and a promoter. We sought to test whether regions bound by Elba or Insv have such activity. We made use of a reporter in which the lacZ and white genes are controlled by both the 2xPE and iab-5 enhancers (Fig. 7A; Zhou et al. 1996). We selected eleven Elba- and Insv-bound genomic loci, of which ten are promoter-proximal and one is located in heterochromatin (Parp1), for the enhancer-blocking test (Supplemental Fig. S7). Six of these regions show strong blocking activity on the 2xPE enhancer from the lacZ gene, but weaker activity on the iab-5 enhancer from the white gene (Supplemental Fig. S7A-B, Fig. 7A). One region, the Lasp locus, blocks only the 2xPE enhancer (Supplemental Fig. S7A).
We also tested three loci that do not have Elba and Insv binding peaks. Two of the three regions did not show insulation activity. The third transgene that contains the Dpr8 region gave inconsistent results between two independent lines.
We focused our analysis on the element in the wg locus that gives the strongest insulation. This fragment contains an ELBA-type asymmetric motif where Elba2 shows preferential binding to the CTTATTGG strand, similar to its preference at the ELBA site in Fab-7 (Fig. 7B). To test whether the insulation activity of this fragment truly depends on Elba or Insv, we crossed the transgene into the elba3 and insv mutants. Remarkably, lacZ staining is strong in the 2xPE-controlled ventral stripe in the elba3 mutant but not in the insv mutant (Fig. 7C), suggesting that Elba3, but not Insv, is necessary for the insulation activity of the element.
Interestingly, the wg promoter is positioned back to back (divergent) with the neighbour gene Wnt4. The ratio of expression of Wnt4 versus wg decreased substantially in all the elba and insv mutants compared to wt (Fig. 7D), suggesting that Elba, and probably also Insv, is required for the separation of these two genes in the endogenous context.

Discussion
The BEN-domain containing proteins are conserved throughout metazoans, but our knowledge of the molecular and biological functions of this family is relatively poor. Here we used Drosophila as an in vivo model system and investigated in depth the genomic functions of the BEN-solo proteins in early embryonic development. We show that Elba and Insv act not only as transcriptional repressors but also as chromatin insulators. Significantly, at a genome-wide level, Elba is required for separating transcription of differentially expressed neighbour genes.
It was shown that each of the three Elba proteins, including the adaptor Elba3, is needed for the Elba complex to bind to the Fab-7 insulator in vitro (Aoki et al. 2012). Our ChIP-seq analyses suggest that in vivo Elba3 is able to bind many genomic loci, including Fab-7, in the absence of Elba1 and Elba2, whereas Elba1 and Elba2 mostly rely on each other and on Elba3 for targeting chromatin (Figure 2). The Elba3 protein does not have any known functional motif, nor a predictable DNA binding domain. One potential factor that can bring Elba3 to chromatin is Insv. Indeed, the Elba3 peaks that are independent of Elba1/2 overlap more with Insv peaks. However, Insv is unlikely to be the only co-factor, as many of the Elba1/2-independent peaks do not overlap with Insv binding regions. Other insulator proteins with DNA binding properties, such as BEAF-32, which can associate with Elba3 in embryo extract, may be able to bring Elba3 to the genome.
Our previous work indicated that the BEN domains of all three BEN-solo factors are able to bind the palindromic DNA motif CCAATTGG in synthetic reporters (Dai et al. 2013c; Dai et al. 2015). Work by Paul Schedl and his colleagues, using electrophoretic mobility shift assays, showed that Elba binds to the asymmetric site CCAATAAG, while Insv also binds to an extended fragment that covers this site in the Fab-7 insulator (Aoki et al. 2012; Fedotova et al. 2018). Here we used the high-resolution ChIP-nexus approach and confirmed that Insv and the Elba factors all associate with genomic regions containing either type of DNA motif (Figure 3). Our ChIP-nexus analyses also provided evidence that the Elba complex associates with DNA in an asymmetric configuration. Intriguingly, at some of the loci, such as the asymmetric sites in Fab-7 and Parp1, Elba1 and Elba2 show + versus - strand preference. The genomic loci with asymmetric binding should represent weak association of Elba with DNA, as strong DNA binding would allow equal pull-down of the subunits with the antibody against any of the three components. There are many loci showing symmetric read distribution for Elba1 and Elba2. These sites either mediate strong binding of the complex or symmetric binding of Elba1 and Elba2, e.g. as homodimers. Insv binding is always symmetric, suggesting it mostly binds to the sites as homodimers.
Although both ELBA and Insv associate with other known insulator proteins, such as CP190, BEAF-32 and GAF (Dai et al. 2015; Figure 5), elba showed strong genetic interactions with CP190 and GAF in viability and early embryonic patterning, whereas insv did not. This result could mean that Insv is less needed during the embryonic stage and/or that another unknown factor strongly compensates for its joint function with CP190 and GAF. In support of the first possibility, insv is required for maintaining segmentation of adult flies when the GAF sites are mutated in Fab-7 (Fedotova et al. 2018). It remains to be determined in which other developmental contexts Insv collaborates with other insulator proteins.
Genes in the Drosophila genome are more compactly arranged than those in vertebrate genomes. There may be a need to partition closely spaced transcription units and ensure enhancer specificity.
Thanks to many years of genetic studies in Drosophila, a number of individual genomic loci have been identified that separate enhancers or promoters (Hagstrom et al. 1996;Barges et al. 2000;Belozerov et al. 2003;Schweinsberg et al. 2004;Sultana et al. 2011;Wood et al. 2011). Insulator proteins such as GAF, CTCF, CP190 and BEAF-32 were found to mediate these activities. It was shown that the Drosophila insulator protein BEAF-32 separates closely apposed genes in a head-to-head (divergent) configuration (Yang et al. 2012).
Remarkably, our results from PRO-seq analysis suggest that ELBA and Insv are required to separate linked transcription units in vivo, as evidenced by the tendency of highly differentially expressed neighbouring genes to become more equally expressed in elba or insv mutant embryos.
In this case, all three types of promoter configuration (divergent, tandem and convergent) show a similar requirement for Elba. In support of the endogenous function of Elba and Insv in blocking enhancers, a subset of genomic elements bound by ELBA and Insv are sufficient to block enhancer-promoter interaction in transgene assays (Figure 7).
In more recent years, new properties have been assigned to insulators, especially in chromatin architecture organization and long-range cis-element interactions. In this study, we focused more on the functions of Elba and Insv in active chromatin regions because of their enrichment in close proximity to active promoters. However, we detected enrichment of Elba and Insv in several known elements that could mediate long-range interactions, such as the homie-nhomie (Fujioka et al. 2009) and scs and scs' loci (Kellum and Schedl 1991;Blanton et al. 2003). Future studies will be needed to determine the roles of Elba and Insv in chromatin organization.

Fly strain culturing and generation of transgenes
All fly stocks were kept at 25°C. The insv mutant allele insv 23B was described previously (Duan et al. 2011a). elba mutants were created using CRISPR: transgenic flies carrying a single guide RNA targeting the coding sequence of each gene were crossed to nos-Cas9 transgenic flies. Frame-shift mutations were identified by PCR and Sanger sequencing. The Trl R85 and Trl 13C alleles were kindly provided by Dr. Ana Busturia (Centro de Biología Molecular "Severo Ochoa" CSIC-UAM). The CP190 P11 allele was obtained from the Bloomington Stock Center. These alleles were used for genetic interaction crosses.
For making the insulator transgenes, selected fragments were amplified and cloned into the insulator transgene backbone (kindly provided by Dr. Jumin Zhou, (Zhou et al. 1996)). The sequences of cloning oligos are provided in Supplemental table 7. All transgenic flies were created at BestGene, Inc.

Cuticle preparation
Embryos were collected and aged to 24-36 hr before dechorionation with bleach.
They were rinsed, directly mounted in 85% lactic acid, and cleared at 60°C for 3-6 hr.

In situ hybridization
The LacZ and white probes were generated by transcription from linearized pBluescript template plasmids (kindly provided by Dr. Mattias Mannervik) with T3 or T7 RNA polymerase (Thermo Fisher) and Dig RNA labelling mix (Roche) according to the manufacturer's instructions. Embryos were aged and fixed with 9% formaldehyde. In situ hybridization was performed as previously described (Qi et al. 2008). In brief, fixed embryos were permeabilized with xylene, re-hydrated and post-fixed with 5% formaldehyde in PBT (1x PBS, 0.1% Tween-20) for 25 min. Embryos were treated with proteinase K (4 µg/ml) for 8 min, followed by another round of post-fixation for 25 min, before hybridization with the probes at 55°C overnight in hybridization buffer (50% formamide, 5x SSC, 100 µg/ml sonicated boiled ssDNA, 0.1% Tween-20). Samples were incubated with alkaline-phosphatase-labelled anti-Digoxigenin antibody (1:2000, Roche) overnight at 4°C, and developed with 0.6 mg/ml Nitrotetrazolium Blue chloride (NBT) and 0.3 mg/ml 5-Bromo-4-chloro-3-indolyl phosphate disodium salt (BCIP). Samples were dehydrated by repeated washes in ethanol, rinsed in xylene and mounted in Permount (Fisher).

Cell culture and luciferase assay
To generate the TetR-DBD fusions with the Elba factors and Insv, the open reading frames of Elba1, Elba2, Elba3 and Insv were PCR amplified and cloned into the pAC-TetR vector. All transfections were performed using Drosophila S2-R+ cells grown in Schneider Drosophila medium containing 10% fetal calf serum. Cells were cotransfected with TetR fusion, 2xTetO-Firefly luciferase and pAc-Renilla plasmids in 96-well plates using the Effectene Transfection kit (Qiagen). Luciferase assays were performed and measured as previously described (Dai et al. 2013b) using the Dual Luciferase Assay System (Promega). Expression was calculated as the ratio between the firefly and Renilla luciferase activities.
Mouse or rabbit IgG was used as a control in IP. To minimize unspecific signal from IgG heavy chains, light-chain-specific secondary antibodies, Goat Anti-Mouse IgG (1:40000, Jackson ImmunoResearch) and Goat Anti-Rabbit IgG (1:40000, Jackson ImmunoResearch), were used, and the signals were developed with ECL Plus reagent (GE Healthcare).

ChIP-seq assay and peak calling
Chromatin immunoprecipitation (ChIP) was done mostly as previously described (Dai et al. 2013c) with one modification. In the fixation step, 2.5 mM DSG (Di(N-succinimidyl) glutarate) (Sigma) was added to the fixation buffer containing 1.8% formaldehyde. The rest of the ChIP steps were unchanged. The Elba antisera were tested in ChIP previously (Aoki et al. 2014) and kindly provided by Dr. Paul Schedl (Princeton University). For each ChIP reaction, 5 µl of antibody and 50 µl of embryos were used.
ChIP-seq libraries were made using the NEBNext Ultra™ II DNA Library Prep Kit.
The ChIP-seq reads were mapped to the Drosophila melanogaster (dm3) genome assembly using Bowtie2 with the default parameters, after adaptor trimming with Trimmomatic. Uniquely mapped reads with a mapping quality MAPQ > 20 were used for further analysis. For all ChIP-seq samples, we generated coverage tracks at 1-nt resolution, normalized to the library sizes to give reads per million (RPM), in "bigwig" format. We further generated coverage differential tracks for the four factors by subtracting mutant from wt coverage on a log2 scale (log2 wt/mutant).
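As an illustration of the normalization described above, the following is a minimal sketch rather than the pipeline actually used. It scales per-base read counts to RPM and computes a per-base log2(wt/mutant) value. The function names and the pseudocount of 1 (added to avoid taking the logarithm of zero) are assumptions that are not stated in the text.

```python
import math

def rpm_normalize(coverage, n_mapped_reads):
    """Scale raw per-base read counts to reads per million (RPM)."""
    return [c * 1e6 / n_mapped_reads for c in coverage]

def log2_ratio(wt_rpm, mut_rpm, pseudocount=1.0):
    """Per-base log2(wt/mutant); the pseudocount is an assumption, not from the text."""
    return [
        math.log2((w + pseudocount) / (m + pseudocount))
        for w, m in zip(wt_rpm, mut_rpm)
    ]

# Toy example: four positions, libraries of 2 and 1 million mapped reads.
wt = rpm_normalize([0, 10, 40, 10], n_mapped_reads=2_000_000)
mut = rpm_normalize([0, 5, 5, 5], n_mapped_reads=1_000_000)
diff = log2_ratio(wt, mut)
```

In this toy example only the third position, where the wt signal exceeds the mutant signal after library-size scaling, receives a positive value.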
For each of the four factors, peaks were called by comparing the ChIP-seq reads of the wt or a mutant condition against the corresponding mutant ChIP, IgG or input control. The peaks were called using MACS2 (Zhang et al. 2008) with default parameters, and confident peaks were defined by an FDR < 1%. Peaks overlapping the Drosophila blacklist were removed. Peak overlap analysis was performed with the "mergePeaks" function in the Homer2 package using the default parameters.
The de novo motif search was performed for all the called peaks for each factor by MEME-ChIP (Machanick and Bailey 2011). The summits of the called peaks for each factor were extended by 500 nucleotides in each direction, and MEME-ChIP was run to search for 5-15 nt motifs in the central regions (100 nucleotides) using default parameters.
Pairwise comparison was done for the ChIP-seq peaks of the Elba and Insv factors with the modEncode insulator datasets (Negre et al. 2010), which include ChIP-chip data for CP190, BEAF32, CTCF, GAF, Mod(mdg4) and Su(Hw). As the ChIP-seq peaks are generally narrower than the ChIP-chip regions, we used the summits of the ChIP-chip regions with a 100 nt extension on each side for the overlap analysis. Two peaks were considered overlapping when their summits lay within 50 nt of each other. The overlap fraction of set1 and set2 peaks was calculated as the number of overlapped peaks divided by the smaller of the number of set1 peaks and the number of set2 peaks.
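The overlap fraction defined above can be sketched as follows. This is a hedged illustration and not the authors' code. The function name and the `(chromosome, summit)` input layout are hypothetical. Counting overlapped peaks from the side of set1 is also an assumption, since the text does not say from which set they are counted.

```python
from bisect import bisect_left

def overlap_fraction(set1, set2, max_dist=50):
    """set1, set2: lists of (chrom, summit) tuples.

    Returns (#set1 summits within max_dist of a set2 summit)
    divided by min(len(set1), len(set2)).
    """
    # Index set2 summits by chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in set2:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    overlapped = 0
    for chrom, pos in set1:
        positions = by_chrom.get(chrom, [])
        # First set2 summit at or beyond the left edge of the window.
        i = bisect_left(positions, pos - max_dist)
        if i < len(positions) and positions[i] <= pos + max_dist:
            overlapped += 1
    return overlapped / min(len(set1), len(set2))

set1 = [("chr2L", 100), ("chr2L", 500), ("chr3R", 1000)]
set2 = [("chr2L", 130), ("chr3R", 2000)]
fraction = overlap_fraction(set1, set2)
```

Dividing by the smaller set keeps the fraction comparable between peak sets of very different sizes. With this one-sided counting, however, the value can exceed 1 if many set1 summits cluster around a single set2 summit.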

ChIP-nexus and analysis
ChIP-nexus was performed by following, step by step, the protocol described previously (He et al. 2015). 20 µl of each antibody and 200 µl of embryos were used in each ChIP-nexus reaction. All the ChIP libraries were sequenced on the Illumina Hiseq2500 platform with 1x50 bp SR configuration.
Before aligning the ChIP-nexus reads to the genome, the 5' fixed barcode (1)(2)(3)(4)(5) was first removed and the random 4 nt barcode was retained for each read. After 3' adaptor trimming by Trimmomatic, the sequencing reads were collapsed to include only unique reads. The random 4 nt barcode was then removed and reads of at least 22 nt were retained for mapping. We mapped the reads using bowtie with the parameter setting "-k 1 -m 1 -v 2 --best --strata". As for ChIP-seq, we generated normalized coverage tracks in "bigwig" format, here separately for each strand, and performed peak calling with MACS2 using the default parameters.
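The read-collapsing step described above can be sketched as follows. This is only an illustration under stated assumptions: the fixed barcode and 3' adaptor are taken to be already removed, and the random 4 nt barcode is assumed to occupy the first four bases of each read. The function name is hypothetical.

```python
def collapse_and_strip(reads, barcode_len=4, min_len=22):
    """Drop exact duplicate reads, strip the random barcode, keep long reads.

    Assumes each read string starts with the random barcode (an assumption;
    the exact read layout is not spelled out in the text).
    """
    # dict.fromkeys keeps the first occurrence of each read, in input order.
    unique = list(dict.fromkeys(reads))
    stripped = (read[barcode_len:] for read in unique)
    return [read for read in stripped if len(read) >= min_len]

reads = [
    "ACGT" + "A" * 25,  # molecule 1
    "ACGT" + "A" * 25,  # exact duplicate of molecule 1 (same barcode)
    "TTTT" + "A" * 25,  # same insert, different barcode: kept as distinct
    "GGGG" + "C" * 10,  # too short after barcode removal
]
kept = collapse_and_strip(reads)
```

Collapsing before the barcode is removed is what allows identical inserts carrying different random barcodes to be kept as independent molecules.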
To obtain high-confidence binding sites for each factor, we required the binding sites to be called by both ChIP-seq and ChIP-nexus and set a highly stringent cut-off (FDR < 1E-10 for Elba1, Elba3 and Insv, and FDR < 1E-5 for Elba2).
To examine the asymmetry of the binding sites, we calculated an orientation index (OI) for each binding site from the ChIP-nexus reads of each factor. OI was defined as the larger of the two strand-specific read counts divided by the total read count on both strands, max(forward, reverse)/(forward + reverse), and ranges from 0.5 (equal reads on both strands) to 1 (all reads on one strand).
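The orientation index can be written as a short function. This is a minimal sketch of the definition given above. Returning `None` for a site with no reads is an assumption, since the text does not say how such sites were handled.

```python
def orientation_index(forward, reverse):
    """OI = max(forward, reverse) / (forward + reverse), in [0.5, 1]."""
    total = forward + reverse
    if total == 0:
        return None  # assumption: OI is undefined for sites without reads
    return max(forward, reverse) / total

symmetric = orientation_index(50, 50)  # equal reads on both strands
biased = orientation_index(90, 10)     # strong strand preference
```

A value of 0.5 corresponds to fully symmetric read distribution, and values approaching 1 indicate that nearly all reads fall on one strand.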

RNA-seq and analysis
Total RNA was extracted from 2-4 hr embryos using Trizol reagent (Invitrogen). RNA quality of three biological replicates was tested by Agilent Bioanalyzer. RNA-seq libraries were made using the Illumina Truseq Total RNA library Prep Kit LT. Sequencing was performed on the Illumina Hiseq2500 platform.
After trimming the adaptor sequences using Trimmomatic, for each factor, the RNA-seq reads from the replicated wild type (x3) and mutant samples (x3) were mapped to the Drosophila melanogaster (dm3) genome assembly using HISAT2. RNA-seq signal was normalized by the TMM method implemented in the Limma Bioconductor library (Ritchie et al. 2015). Gene annotation was obtained from the FlyBase dm3 gene annotation. Differentially expressed mRNAs between BEN factor mutants and wild type were identified, and FDR (Benjamini-Hochberg) was estimated, using Limma.
To test whether a set of genes is significantly changed (up- or down-regulated as a set) amongst the differentially expressed (DE) genes from wild type and mutant RNA-seq data, the gene set enrichment testing function "camera" in the R limma package was used (Ritchie et al. 2015). It is a ranking-based gene set test that accounts for inter-gene correlation. We used it to test whether the genes associated with the peaks called by ChIP-seq (the top 200 peaks, or all peaks with or without insv motifs) are significantly changed as a set.

PRO-seq assay and analysis
The PRO-seq procedure was performed according to the previously reported method (Kwak et al. 2013). Embryos were collected from yw, elba, and insv mutants and aged for 3-4 hr. After the run-on reaction, Biotin-labelled RNAs were purified, enriched and cloned into cDNA libraries. In the PCR amplification step, 14 cycles were used to enrich the cDNAs for sequencing. Barcoded libraries were pooled and sequenced on the Illumina Hiseq2500 platform with 1x50 bp SR configuration.
The adaptors were first trimmed from the sequencing reads by cutadapt software and the reads with at least 15 nt were retained. We then removed reads that mapped to rRNAs and the remaining reads were further mapped to the Drosophila melanogaster (dm3) genome assembly using BWA with the default parameters. The PRO-seq normalized coverage tracks with separate strands were generated for each factor. To detect de novo transcripts from PRO-seq, we combined all genotypes and adapted the Homer2 (Heinz et al. 2010) GRO-seq transcript identification method using a parameter setting "findPeaks -style groseq -tssFold 4 -bodyFold 3". The pausing regions (promoter region) were defined from the de novo transcript starts to 200nt downstream, and gene body regions were defined from 400nt downstream to the end of the de novo transcripts.
The de novo transcripts having a promoter expression of greater than 1 transcript per million (TPM) were retained for further analysis.
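The pausing and gene-body windows defined above can be derived from a transcript's coordinates as in the sketch below. This is a hedged illustration. The function name, the half-open `(lo, hi)` coordinate convention and the handling of transcripts too short to have a gene-body window are assumptions.

```python
def pausing_and_body(start, end, strand, pause_len=200, body_offset=400):
    """Return (pausing, body) windows for a transcript with start < end.

    Windows are (lo, hi) genomic intervals. On the minus strand the
    transcript begins at `end`, so "downstream" means decreasing coordinates.
    """
    if strand == "+":
        pausing = (start, start + pause_len)
        body = (start + body_offset, end)
    else:
        pausing = (end - pause_len, end)
        body = (start, end - body_offset)
    if body[0] >= body[1]:
        body = None  # assumption: no gene-body window for short transcripts
    return pausing, body

plus = pausing_and_body(1000, 3000, "+")
minus = pausing_and_body(1000, 3000, "-")
short = pausing_and_body(1000, 1300, "+")
```

Leaving a gap between 200 nt and 400 nt keeps the paused-polymerase signal near the promoter out of the gene-body window.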

Analysis of Elba/Insv factors acting as insulators
For each Elba/Insv binding site in the high-confidence binding set, which was called by both ChIP-seq and ChIP-nexus (see Methods above), we looked for the adjacent upstream and downstream PRO-seq promoter pair and calculated the absolute differential expression between them (abs log2FC adjacent pair). We classified adjacent promoter pairs flanking an Elba/Insv peak into 3 types: convergent, divergent, and tandem.
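The classification of flanking promoter pairs and the absolute log2 fold change between them can be sketched as follows. This is an illustration under assumptions. The function names are hypothetical, "left" and "right" refer to genomic order, and the pseudocount of 1 is added only to avoid division by zero and is not taken from the text.

```python
import math

def classify_pair(left_strand, right_strand):
    """Classify two promoters flanking a peak by their strands.

    left_strand belongs to the promoter at the lower genomic coordinate.
    """
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # transcription points away from the peak
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # transcription points towards the peak
    return "tandem"          # both promoters on the same strand

def abs_log2fc(expr_a, expr_b, pseudocount=1.0):
    """Absolute log2 fold change between two promoter expression values."""
    return abs(math.log2((expr_a + pseudocount) / (expr_b + pseudocount)))

pair_type = classify_pair("-", "+")
difference = abs_log2fc(7.0, 1.0)
```

Because the absolute value is taken, the measure does not depend on which promoter of the pair is more highly expressed. A decrease in a mutant therefore means that the two promoters have become more equally expressed.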
To test whether the change of differential expression between the adjacent pairs is above background, we performed a Monte-Carlo simulation. We randomly placed the same number of regions, with the same lengths as the Elba/Insv ChIP peaks, on the same chromosomes and repeated the random selection and calculation 2000 times. P-values were calculated as the number of iterations in which the random adjacent genes showed a bigger fold change than the Elba/Insv-bound genes, divided by the 2000 iterations. These tests were done separately for convergent, divergent, and tandem pairs in wt and in each of the four mutants.

(A) Comparison of ChIP-seq and ChIP-nexus shows that ChIP-nexus has a higher frequency of motif occurrence and a more centered motif distribution around the peak summits. The motif occurrence is centered at the peak summits (x-axis) and the mean motif coverage in motifs per base is on the y-axis.
(B) The screenshot of the Fab-7 region shows broad ChIP-seq peaks and sharp ChIP-nexus peaks. Note the asymmetric binding of Elba1 and Elba2 versus the symmetric binding of Elba3 and Insv. Elba1 prefers the + strand and Elba2 the - strand of the CCAATAAG motif.
(C) Another example locus, Parp, exhibits a similarly biased strand asymmetry (OI near 1) for Elba1/2.
(D) The orientation indexes (OI) for the peaks with the symmetric or asymmetric motifs, ranging from 0.5 to 1, were calculated from the ChIP-nexus reads of the four factors (see Methods). The distribution of OI shows that Elba1/2 tend to display a higher OI than Elba3 and Insv.
(E) Illustration of how ChIP-nexus can capture symmetric versus asymmetric binding patterns by homodimers versus the hetero-trimeric complex.

Insv is also present in the complex of BEAF-32_HA/Elba_V5 (F).

A screenshot of the divergent pair Wnt4 and wg with an Elba/Insv peak proximal to the wg promoter. The ratio of PRO-seq promoter expression of Wnt4 versus wg decreased in the elba and insv mutants compared to the wt.