H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function

Mammalian genomes harbor abundant transposable elements (TEs) and their remnants, with numerous epigenetic repression mechanisms enacted to silence TE transcription. However, TEs are upregulated during early development, neuronal lineage, and cancers, although the epigenetic factors contributing to the transcription of TEs have yet to be fully elucidated. Here, we demonstrate that the male-specific lethal (MSL)-complex-mediated histone H4 acetylation at lysine 16 (H4K16ac) is enriched at TEs in human embryonic stem cells (hESCs) and cancer cells. This in turn activates transcription of subsets of full-length long interspersed nuclear elements (LINE1s, L1s) and endogenous retrovirus (ERV) long terminal repeats (LTRs). Furthermore, we show that the H4K16ac-marked L1 and LTR subfamilies display enhancer-like functions and are enriched in genomic locations with chromatin features associated with active enhancers. Importantly, such regions often reside at boundaries of topologically associated domains and loop with genes. CRISPR-based epigenetic perturbation and genetic deletion of L1s reveal that H4K16ac-marked L1s and LTRs regulate the expression of genes in cis. Overall, TEs enriched with H4K16ac contribute to the cis-regulatory landscape at specific genomic locations by maintaining an active chromatin landscape at TEs.

Mammalian genomes harbor abundant transposable elements (TEs) and their remnants, with numerous epigenetic repression mechanisms enacted to silence TE transcription. However, TEs are upregulated during early development, neuronal lineage, and cancers, although the epigenetic factors contributing to the transcription of TEs have yet to be fully elucidated. Here, we demonstrate that the male-specific lethal (MSL)-complex-mediated histone H4 acetylation at lysine 16 (H4K16ac) is enriched at TEs in human embryonic stem cells (hESCs) and cancer cells. This in turn activates transcription of subsets of full-length long interspersed nuclear elements (LINE1s, L1s) and endogenous retrovirus (ERV) long terminal repeats (LTRs). Furthermore, we show that the H4K16ac-marked L1 and LTR subfamilies display enhancer-like functions and are enriched in genomic locations with chromatin features associated with active enhancers. Importantly, such regions often reside at boundaries of topologically associated domains and loop with genes. CRISPR-based epigenetic perturbation and genetic deletion of L1s reveal that H4K16ac-marked L1s and LTRs regulate the expression of genes in cis. Overall, TEs enriched with H4K16ac contribute to the cis-regulatory landscape at specific genomic locations by maintaining an active chromatin landscape at TEs.

Results
We aimed to investigate the role of MSL-mediated H4K16ac in human genome regulation. We performed two to three replicates of cleavage under targets and tagmentation (CUT&Tag) 39 in human embryonic stem cells (H9-hESCs) for histone modifications associated with active regulatory elements (H3K27ac, H3K122ac, H4K12ac, H4K16ac, monomethylated H3 K4 (H3K4me1) and H3K4me3), polycomb repressed domains (H3K27me3) and heterochromatin (H3K9me3) (Extended Data Fig. 1a and Supplementary Table 1). We evaluated overall data quality and similarity among our CUT&Tag replicates (Extended Data Fig. 1a). We generated peaks by merging the replicates, and we used reproducible peaks in at least two replicates to validate our findings (Extended Data Figs. 1 and 2 and Supplementary Tables 2 and 3). To prevent the same reads from mapping to multiple regions in the repeat elements, uniquely mapped CUT&Tag sequencing reads were used for the analyses. Except for the analysis in Figure 5e, we used multi-mapping reads for L1 subfamily-level enrichment analysis.

H4K16ac and H3K122ac are enriched at TEs
Chromatin-state discovery and genome annotation analysis (Chrom-HMM) 40 of CUT&Tag peaks revealed the expected enrichment of H3K4me1, H3K4me3, H3K27ac and H4K12ac at chromatin features associated with active transcription, including active promoters and enhancers. Intriguingly, H4K16ac and H3K122ac, but not H3K27ac or H4K12ac, were enriched at heterochromatin, insulator and transcription elongation states (Fig. 1a). Further analysis revealed specific enrichment of H4K16ac and H3K122ac at the 5′ UTR of full-length L1s and ERV/ LTR elements, compared with gene promoters (Fig. 1b-e). H4K16ac was also detected at gene bodies, consistent with previous findings showing its role in transcription elongation 41 (Fig. 1a,b). Interestingly, however, H4K16ac shows a very low level of enrichment at the gene promoters ( Fig. 1b, Table 3), similar to recent chromatin immunoprecipitation and sequencing (ChIP-seq) data in human cell lines 34 . Fifty-one percent of full-length L1s (n = 10,000) have reproducible H4K16ac peaks (Supplementary Table 3), and although H4K16ac and H3K9me3 are enriched at L1s (Fig. 1b), they are anti-correlated at L1 subfamilies (Extended Data Fig. 1c). Less than 10% of the H4K16ac peaks at TEs overlapped with H3K9me3 (Extended Data Fig. 1b). The H3K9me3 level was also lower at H4K16ac peaks that overlapped with L1 5′ UTRs, and the H4K16ac level was lower at L1s with H3K9me3 peaks (Extended Data Fig. 1c). Reanalysis of public ChIP-seq datasets showed enrichment of H4K16ac at the 5′ UTRs of L1s in human brain tissue (Extended Data Fig. 3a) 42 . H4K16ac is also enriched at the 5′ UTRs of L1s in neuroblastoma (SH-SY5Y), erythroleukemia (K562), and transformed dermal fibroblast (TDF) cell lines and mESCs (Extended Data Fig. 3b-e). This analysis suggests that H4K16ac enrichment at TEs is not unique to hESCs, but is conserved in cancer cells, human brain tissue and mice. Although H3K27ac and most species-specific DNase hypersensitive sites (which are on accessible chromatin) are occupied by remnants of TEs 11,12 , suggesting that TEs have been co-opted, becoming tissue-and species-specific cis-regulatory elements (CREs). TEs are transiently upregulated during early development 13 , in the neuronal lineage 14 , and in cancer 1 . The ERV superfamily of LTRs (LTR/ERV) and Alu family of short interspersed nuclear elements (SINE/Alu) often exhibit chromatin features associated with active CREs [15][16][17] and function either as enhancers to regulate genes in cis or act as alternative promoters 15 . The 5′ UTR of L1 repeats are also bound by tissue-specific TFs, are enriched with chromatin features that are associated with CREs 18 , and can function as nuclear noncoding RNAs 13,19 ; still, it is unclear whether they can act as CREs. Although TEs have been suggested to contribute to nearly one-quarter of the regulatory epigenome 10,13,20,21 , the chromatin-based mechanisms contributing to regulatory activity in the vast number of TEs are unclear.

d,e and Supplementary
Chromatin features, such as a combination of H3K4me1 and H3K27ac, bidirectional transcription of enhancer RNAs (eRNAs), and accessible chromatin (determined, for example, using the assay for transposase-accessible chromatin with sequencing (ATAC-seq)) are widely used to predict enhancer activity, including for TE-derived enhancers 17,[22][23][24][25] . Yet the level of H3K27ac does not correlate with and is dispensable for enhancer activity, suggesting that other uncharacterized chromatin features could contribute to regulatory activity [26][27][28] . H4K16ac and H3K122ac are particularly interesting among many histone acetylations, because they alter chromatin structure directly and increase transcription in vitro 29,30 . H4K16ac and H3K122ac are enriched at enhancers, and they identify new repertoires of active enhancers that lack detectable H3K27ac 27,31 . However, it is challenging to decipher the causal role of specific histone acetylations, as many acetylations, including H3K27ac, are catalyzed by multiple lysine acetyltransferases (KATs), and KATs also have a broad substrate specificity. H4K16ac is an exception, as it is catalyzed explicitly by KAT8 when associated with the MSL complex.
Nevertheless, when KAT8 is associated with non-specific lethal (NSL), it catalyzes H4K5ac, H4K8ac and H4K12ac (refs. 32-35). In mouse embryonic stem cells (mESCs), KAT8 and H4K16ac mark active enhancers and promoters of genes that maintain the identity of the mESCs 27,36 . Loss-of-function mutations in KAT8 or MSL3 lead to reduced H4K16ac levels and are known to cause neurodevelopmental disorders 37,38 . However, the mechanism through which KAT8 containing MSL complex-mediated acetylation of H4K16 contributes to genome regulation during normal development is less clear, especially in the human genome.
Here, we show that H4K16ac is enriched at L1s and LTRs and is depleted at gene promoters, and that H4K16ac regulates transcription across the L1 and ERV LTR superfamily of TEs. TEs marked with acetylations loop with the neighboring genes and regulate their expression. CRISPR interference and genetic deletion of H4K16ac-marked (H4K16ac + ) TEs leads to the downregulation of genes in cis, demonstrating that H4K16ac + TEs function as enhancers. Furthermore, depletion of H4K16ac is sufficient for downregulation of L1 and LTRs and genes  H4K12ac were detected at some L1 5′ UTRs, they were enriched at a much higher level at the promoters of genes (Fig. 1b). Interestingly, along with H4K16ac and H3K122ac, L1 5′ UTRs were also enriched with H3K4me1 but were depleted of H3K4me3 (Fig. 1d), suggesting that these elements could function as CREs.
H4K16ac + L1 5′ UTRs are enriched with enhancer features LTR subfamilies function as enhancers to regulate genes in a tissue-specific manner in humans and mice (reviewed in ref. 16). LTR5and LTR7-related subfamilies function as enhancers in hESCs 21,43,44   in cis is not known. Here we found that H4K16ac is particularly enriched at the 5′ UTR of full-length L1 subfamilies and correlates with chromatin features associated with active enhancers, such as H3K27ac, H3K4me1, BRD4 and ATAC-seq signal (Fig. 2a,d).
Interestingly, not all L1 subfamilies are enriched with active enhancer features at the same level. The evolutionarily younger L1s (L1HS, L1PA2 and L1PA3, 3-12.5 million years) are enriched with active enhancer features, including H4K16ac. These L1s are known to be transcriptionally active. Despite evolutionarily older L1s being transcriptionally inactive, the 5′ UTRs of these L1 subfamilies (L1PA7-L1PA16, 31-80 million years) are enriched with H4K16ac, along with other active enhancer features, but H3K9me3 is less enriched (Fig. 2a), suggesting that the 5′ UTRs of older full-length L1s have been co-opted to function as functional regulatory elements.
Analysis of genome-wide enhancer activity data (using selftranscribing active regulatory region sequencing, or STARR-seq), generated by ENCODE 45 from neuroblastoma (SH-SY5Y) and erythroleukemia (K562) cell lines, showed enhancer activity specifically at the 5′ UTR of L1s (Fig. 2b) in a cell-type-specific manner. The presence of active enhancer chromatin features (Fig. 2a) and the ability of L1 5′ UTR to drive transcription of the minimal promoter in an in vitro enhancer reporter assay (Fig. 2b) further confirmed that 5′ UTRs of full-length L1s could function as transcriptional enhancers.

LTRs with H4K16ac process higher enhancer activity
Our data show that, apart from the LTR5 and LTR7 elements that show clear enrichment of active enhancer chromatin features, some of the subfamilies of LTR16 and LTR33 may also serve as enhancers in hESCs, because they are enriched with H4K16ac and other active enhancer chromatin features (Fig. 2c,d). Interestingly, analysis of STARR-seq data from K562 and SH-SY5Y cells revealed that H4K16ac + LTRs in these cell lines show significantly higher enhancer activity than LTRs that overlap with only H3K4me1 or H3K27ac peaks (Fig. 2e,f and Extended Data Fig. 4a). These results further support the notion that H4K16ac + LTRs function as enhancers. The rest of the LTR and Alu families are not likely to act as enhancers in hESCs, as they lack known enhancer chromatin features (Extended Data Fig. 4b).

TEs marked with H4K16ac are bound by looping factors
We aimed to identify TFs bound at H4K16ac + , H3K27ac + and H3K122ac + TEs using TF ChIP-seq data from ENCODE. Expectedly, EP300 is enriched at LTRs marked with H3K27ac (Fig. 3a). YY1 is enriched at L1s marked with all three marks, supporting the known role of YY1 in activating L1 transcription 46 . CTCF and RAD21 showed higher enrichment at H4K16ac + and H3K122ac + L1s and LTRs than at H3K27ac + L1s and LTRs. MYC and KDM1A were depleted at H4K16ac + and H3K27ac + L1s. These observations are consistent with previous reports showing the role of CTCF and RAD21 in activating L1 transcription 47,48 , and of MYC and KDM1A in repressing L1 transcription 49 . SP1, TCF12 and NANOG binding was also specifically enriched at H3K27ac + L1 and LTRs, suggesting that they have a role in transcription at these elements.

L1s and LTRs with acetylation marks loop with genes
YY1, enriched at acetylated L1s, functions as a looping factor that facilitates interaction between enhancers and promoters 50 . Compared with L1s with acetylation marks, LTRs and Alu elements marked with histone acetylations are enriched with USF1, REST and a looping factor ZNF143 ( Fig. 3a and Extended Data Fig. 5). Meta-analysis confirmed the enrichment of CTCF, RAD21 and YY1 at both H3K27ac + and H4K16ac + L1 5′ UTRs and LTRs (Fig. 3b). Analysis of published Hi-C data revealed that, compared with TEs that lack acetylation marks, TEs with such marks are enriched at topologically associated domain (TAD) borders (Fig. 3c). Moreover, H4K16ac levels are relatively higher at TEs overlapping with the TAD borders than at TEs that do not overlap with TAD borders (Fig. 3d,e). Furthermore, to identify whether TEs with histone acetylations loop with genes, we called significant loops from publicly available micro-C data from H1 hESCs 51 . This revealed that the fraction of TEs (L1, LTRs and Alu elements) with acetylation marks that form chromatin loops with genes is significantly higher than the fraction of TEs lacking these marks (Fig. 3f,g). These analyses provide evidence that transcribed TEs enriched with histone acetylation marks could contribute to three-dimensional (3D) chromatin folding and looping interactions with genes.

H4K16ac + LTR/HERVs act as enhancers
We aimed to use CRISPR interference (CRISPRi) to investigate the role of H4K16ac + TEs in regulating genes in cis, by recruiting KRAB repressor domain (dCAS9-KRAB) to TEs. We performed CRISPRi for individual LTRs of human endogenous retrovirus (HERV) or L1 5′ UTRs by co-transfecting two independent guide RNAs that recruit dCAS9-KRAB to specific TEs enriched with H4K16ac in hESCs. We then performed quantitative reverse transcription PCR (RT-qPCR) for nearby expressed genes or genes that show the looping interaction in the RAD21-HiChIP data (Fig. 4a) 52 . CRISPRi targeting an H4K16ac + LTR7/ HERV-H (HERV type H family) element located ~50 kb away from PEX1 and ~30 kb from the GATAD1 promoter led to downregulation of PEX1, but not GATAD1 (Fig. 4b,d). CRISPRi targeting another H4K16ac + LTR7/ HERV-H element located ~50 kb away from the NUS1 promoter led to the downregulation of the NUS1, but not of GOPC (Fig. 4c,d). However, CRISPRi targeting H4K16ac + LTRs/HERV-L-18 (HERV type L-18 family) and HERV-L-18 int (internal portion of HERV-L-18) that are close to TAD borders (Figs. 3d and 4d) did not show downregulation of nearby genes ZC3H15 and ODF2L, suggesting that some but not all H4K16ac + HERV/LTR loci function as enhancers. However, it is possible that such H4K16ac + TEs could contribute to LTR/HERV transcription and 3D genome folding (Fig. 3c-e) 53 .

H4K16ac + L1 5′ UTRs function as enhancers
We next focused on L1s and asked whether L1 5′ UTRs enriched with H4K16ac regulate genes in cis by performing CRISPRi for H4K16ac + L1 5′ UTRs, together with two L1 5′ UTRs that lack detectable histone acetylation marks. CRISPRi for the H4K16ac + 5′ UTR of an L1PA10 located ~110 kb upstream of USP38 led to specific downregulation of USP38  each panel. e, Violin plots showing STARR-seq signal from K562 cells (n = 2 biological replicates, signal normalized to input) across LTRs intersecting H3K4me1 peaks; H4K16ac but not H3K4me1 peaks; H3K4me1 and H4K16ac peaks; and H3K4me1 and both H3K27ac and H4K16ac peaks. f, Like e, but for SH-SY5Y cells (n = 2 biological replicates, signal normalized to control) across LTRs that intersect with no H4K16ac or H3K27ac peaks (n = 40,000 LTRs); H3K27ac but not H4K16ac peaks (n = 22,447 LTRs); H4K16ac but not H3K27ac peaks (n = 35,349 LTRs); and H4K16ac and H3K27ac peaks (n = 15,602 LTRs). In all box plots, center lines indicate the median, bounds indicate the 25th and 75th percentiles, and whisker limits show 1.5 × interquartile range; P values for all the violin and box plots were calculated using the pairwise two-sided multicomparison Dunn test for post hoc testing, following the Kruskal-Wallis test with Bonferroni correction. Article https://doi.org/10.1038/s41594-023-01016-5 but not other nearby genes GAB1 and SMARCA5. Notably, CRISPRi for two H4K16ac -5′ UTRs of L1s, located ~30 kb and ~85 kb from the USP38 promoter, led to no change in the USP38 transcript level, showing the specificity of the H4K16ac + L1PA10 element in regulating USP38 (Fig. 4e,j). Similarly, CRISPRi for the H4K16ac + 5′ UTR of L1PA10, located ~270 kb from the TANC2 promoter, led to downregulation of TANC2,  but not CYB561 (Fig. 4f,j). CRISPRi for H4K16ac + L1PA7, located ~24 kb from the COMMD10 promoter, also led to a significant downregulation of COMMD10, but not the nearby gene SEM6A (Fig. 4g,j). RAD21-HiChIP data and the micro-C analysis revealed significant looping interactions between MOXD1 and STX7 genes with the H4K16ac + L1PA8, located ~100 kb away from the MOXD1 promoter (Fig. 4h,j). CRISPRi for the 5′ UTR of this L1PA8 led to significant downregulation of both MOXD1 and STX7. However, the expression of ENPP1, which does not loop 6.5 kb -1.

Fig. 3 | H4K16ac + L1 and LTRs are enriched at TAD borders and loops with
genes. a, Heatmap shows the difference/sum (details in Methods) ratio for observed and expected occurrences of TF-binding sites in H4K16ac, H3K27ac and H3K122ac peaks at the 5′ UTR of L1, ERV/LTR or SINE/Alu, over the random background. Looping factors that are known to be enriched at enhancerpromoter loops are in bold; a complete list of TFs is in Extended Data Figure 6. b, Average type summary plots showing the mean signal distribution (fold change/control) of YY1 (green), RAD21 (red), and CTCF (blue) at LTRs (top) and full-length L1 (>5 kb, bottom) that overlaps with H3K27ac (left) or H4K16ac (right). c, Violin plot showing the distance to TAD borders (y axis, log 10 bins) for LTR (H4K16ac + #10258, H3K27ac + #8063, H3K122ac + #17132), Alu element (H4K16ac + #7394, H3K27ac + #4659, H3K122ac + #61312) and L1 (H4K16ac + #892, H3K27ac + #550, H3K122ac + #1439) marked with H3K27ac, H4K16ac or H3K122ac, and for TEs that lack these marks (LTR #31678, Alu #31589 and L1 #452). P values for the violin plots were calculated by Mann-Whitney U test. d, Example UCSC Genome Browser tracks showing H4K16ac, H3K122ac and H3K27ac signals at TAD borders (arrow marks) (micro-C data from H9 hESC). CRISPRi was used for some of these HERV/LTRs for validation (Fig. 4d). e, Average type summary plot depicting IgG-normalized H4K16ac signal (CPM), with standard error (shaded area), at the LTRs overlapping the TAD border (blue) and LTRs elsewhere in the genome (red). f, Bar graph showing the percentage of H4K16ac + , H3K27ac + and H3K122ac + TEs and TEs that lack these marks (full-length L1, LTR and Alu) that contact genes through chromatin loops (P values were calculated by Fisher's exact test). (# same as in c). g, Aggregate peak analysis (APA) plots for H4K16ac + and H3K27ac + LTRs, Alu elements, and L1s that contact genes through loops. The number of contacts (in thousands) are shown in the scale bars. In all box plots, center lines indicate the median, bounds indicate the 25th and 75th percentiles, and whisker limits show 1.5 × interquartile range. Article https://doi.org/10.1038/s41594-023-01016-5 with this L1, was not altered (Fig. 4h,j), demonstrating the specific cis-regulatory function of these L1s.
To further confirm that H4K16ac + L1 5′ UTRs regulate genes in cis, we used CRISPR-CAS9 to delete full-length L1 elements in H1 hESCs. Owing to the difficulty of specific deletion of L1 5′ UTRs, we nucleofected the cells with pairs of synthetic guide RNAs along with CAS9 (ribonucleocomplex) that target the flanking region of four full-length L1s (~7 kb deletions). We generated two independent clonal lines with heterozygous deletions for L1PA10 and one clone for L1PA7; both are H4K16ac + and are located upstream of USP38 ( Fig. 4e and Extended Data Fig. 6a,b). In accordance with CRISPRi data, RT-qPCR data showed that the deletion of L1PA10 and L1PA7 led to the downregulation of USP38, but not other nearby genes that were tested, namely GAB1 and SMARCA5 (Fig. 4k). For deletion of L1s located at the MOXD1 and RLN2 loci (Fig. 4g,i), we nucleofected gRNA-CAS9 ribonucleoprotein complexes and used two independent pools of hESCs that showed ~50% deletion efficiency (Extended Data Fig. 6c). Although CRISPRi for L1PA8 resulted in the downregulation of both MOXD1 and STX7, genetic deletion led to specific downregulation of MOXD1, but not STX7 and ENPP1 (Fig. 4h,k). Deletion of another H4K16ac + L1PA7 located downstream of RLN2, ~12 kb away from the RLN2 promoter, led to the downregulation of RLN2 but not a nearby gene PLGRKT (Fig. 4i,k). Overall, CRISPRi and genetic deletion experiments confirmed that H4K16ac + L1s and LTRs are involved in regulation of transcription of genes in cis.
RNA-seq data analysis from MSL3-KO TDFs showed significant downregulation of L1 and LTR transcripts (Extended Data Fig. 7b,c). Notably, H4K16ac + L1s, but not H4K16ac -L1s, are significantly downregulated in MSL1-KO TDFs (Extended Data Fig. 7b). All these results confirm the direct role of MSL mediated H4K16ac in the transcriptional activation of L1. MSL3-KD RNA-seq analysis in hESCs showed that pluripotency and differentiation-associated genes were unaffected (Extended Data Fig. 8a,b). However, H4K16ac + genes were more affected than were H4K16acgenes (Extended Data Fig. 8c). Further analysis of L1s and LTRs showed significant downregulation of both human-specific (L1HS) and primate-specific (L1PA2 to L1PA16) full-length L1 and LTR subfamily transcripts (Fig. 5b,g). L1, LTRs, HERV-K and HERV-L transcripts and protein-coding genes also show small but significant downregulation in MSL3-KD cells (Fig. 5e,f and Extended Data Fig. 8d).

MSL/H4K16ac at TEs maintain active cis-regulatory landscape
H4K16ac causes chromatin decompaction in vitro, and depletion of H4K16ac has been shown to reduce chromatin accessibility 29,55 . Therefore, we asked whether the lack of H4K16ac leads to altered accessibility at TEs. ATAC-seq data showed a specific reduction in accessible DNA at the 5′ UTR of L1s in MSL3-depleted hESCs (Fig. 5g). In particular, evolutionarily younger L1s show a decrease in DNA accessibility, accompanied by reduced transcriptional activity at these elements (Fig. 5g).
Genes closer to H4K16ac + L1 and H4K16ac + LTRs are significantly highly expressed compared with genes farther away from these L1s and LTRs. By contrast, genes closer to H4K16ac -L1 and H4K16ac -LTRs show significantly lower expression levels than farther genes (Fig. 6a,b). Next, we asked whether depletion of MSL/H4K16ac at L1 and LTRs affects the expression of genes located near these TEs marked with H4K16ac + TEs. MSL3 depletion led to a small but significant downregulation of many transcripts (n = 3,312) closer (<10 kb) to H4K16ac + L1s (Fig. 6c). Similarly, many transcripts that are closer (<10 and <25 kb) to H4K16ac + LTRs are significantly downregulated compared to transcripts that are 25 to 50 kb away (Fig. 6d).
Overall, our results confirm the role of MSL/H4K16ac at L1 and LTRs in transcriptional activation of TEs (Fig. 5b,d-f) and in regulating genes that they are associated in linear distance or 3D space (Figs. 4 and 6a-e). Therefore, we conclude that MSL complex-mediated acetylation of H4K16 leads to the opening of chromatin structure and increased transcriptional activity at L1 and LTRs in a cell-type-specific manner. The permissive local chromatin environment at H4K16ac + TEs shapes the cis-regulatory landscape across the mammalian genome (Fig. 6e).

Discussion
TEs are repressed by many epigenetic pathways, such as DNA methylation, H3K9me3, KRAB-ZNF, HUSH complex and piwi-interacting RNA (piRNAs). We have discovered that the MSL-H4K16ac axis functions as  b and c, the genome browser track shows CUT&Tag data at L1PA10 at the TANC2 locus, L1PA7 at the COMMD10 locus, L1PA7 at the MOXD1 locus, L1PA10 and L1PA7 at the USP38 locus, and L1PA7 at the RLN2 locus. L1PA2 and L1MA27 at the USP38 locus, which lack histone acetylation marks, were used as controls. j, Same as d, but for H4K16ac + or H4K16acputative target genes for L1s (L1PA2 and L1MA2). TANC2, COMMD10, MOXD1, STX7, and USP38 were selected as putative target genes, along with CYB561, SEMA10A, ENPP1, GAB1, and SMARCA5 were selected as putative non-targets. k, Same as j, but RT-qPCR was done upon CRISPR-CAS9-mediated deletion of full-length L1. Two independent clones for L1PA10 (H4K16ac + ) and one for L1PA7 (H4K16ac + ), L1PA2 (H4K16ac -) and L1MA2 (H4K16ac -) located at the upstream of USP38 were tested, and for L1PA7 located at MOXD1 and RLN2, the pools of cells were tested. For all RT-qPCR experiments, data are shown as mean ± s.d. from n = 3 independent experiments; P values are from unpaired t-test with Welch correction; the two-stage step-up (Benjamini, Krieger and Yekutieli) method was used, and the false-discovery rate (FDR) was 1.00% for multiple comparisons. n.s., not significant.      Roadmap epigenomics data have shown that TEs are depleted of H3K27ac and accessible chromatin; only 3% of TE bases are annotated with active regulatory chromHMM states, compared with 32% of promoter bases 56 . Despite that, TEs contribute up to 40% of TF-binding sites; hence, TEs have been proposed to contribute to species-and tissue-specific rewiring of gene regulatory networks [57][58][59] . This suggests that an unknown chromatin pathway could contribute to the enhancer activity of TEs in a cell-type-or species-specific manner, which could be independent of H3K27ac. Our CUT&Tag data show that the level of H3K27ac is much higher at genes and promoters than at TEs. However, H4K16ac is enriched explicitly at the L1 5′ UTRs, along with several other chromatin features associated with active enhancers. We now demonstrate that L1 5′ UTRs marked with H4K16ac along or together with H3K122ac and H3K27ac function as enhancers to regulate the expression of genes in cis. Although L1s are expressed at higher levels during early development, including stem cells, they are also upregulated in . Genes close to L1 that lack detectable H4K16ac peaks (H4K16ac -) are in TDF cells (right). X-axis shows the distance from the TEs. b, Like a, but for LTRs. c, hESC RNA-seq signals for control (shControl) and MSL3 knockdown (shMSL3) at genes that lie 10 kb, 10-25 kb, or 25-50 kb away from the H4K16ac-overlapping full-length L1s. d, Like c, RNA-seq signals at genes that lie in 10 kb, 10-25 kb or 25-50 kb away from the H4K16ac-overlapping LTRs. e, The working model shows MSL/KAT8-mediated H4K16ac maintains accessible chromatin, activates transcription at TEs, and contributes to their enhancer activity to regulate genes in cis. In all box plots, center lines indicate the median, bounds indicate the 25th and 75th percentiles, and whisker limits show 1.5 × interquartile range. P values for all the violin and box plots were calculated using the pairwise two-sided multi-comparison Dunn test, used for post hoc testing following the Kruskal-Wallis test with Bonferroni correction. Article https://doi.org/10.1038/s41594-023-01016-5 cancer and the neuronal lineage. Consistently, we found that H4K16ac is enriched at L1 5′ UTRs in human and mouse stem cells, cancer cell lines, and post mortem brain tissues, suggesting that 5′ UTRs of L1s bound by tissue-specific TFs and enriched with histone acetylations could function as tissue-specific enhancers. Enrichment of H4K16ac at TEs, which constitute a major part of the mammalian genome, is consistent with findings showing that nearly 30% of the histone H4 is acetylated at H4K16 (ref. 34).
Many LTR subfamilies are enriched with active enhancer associated chromatin features indicating that they could function as active enhancers. It has been proposed that some of the LTR subfamilies are essential in driving the expression of lineage-specific genes 43,57 . However, only a minority of putative RLTR13D6 subfamily-derived enhancers identified through epigenomic analyzes have been experimentally validated to function as enhancers 17 . This highlights the importance of functional validations using CRISPR-based perturbation of candidate TEs enriched with enhancer chromatin features. Although we found all of the tested CRISPR-edited H4K16ac + L1s downregulated their putative target genes in cis. Genome-wide enhancer reporter assays, in combination with systematic genome-scale perturbation, are needed to identify what fraction of L1s and LTRs with H4K16ac function as enhancers.
TEs with acetylation marks, including H4K16ac, are bound by looping factors, including CTCF, RAD21, YY1, and ZNF143. Moreover, the fraction of these TEs that loop with genes is significantly larger than the fraction of TEs without acetylation marks that do so, further supporting the role of transcriptionally active TEs in rewiring the regulatory landscape in a species-and cell-type-specific manner. 53 . Since our results show that the MSL-H4K16ac axis drives transcription at TEs, including HERVs (Fig. 5b,f and Extended Data Fig. 4b), we hypothesize that MSL-H4K16ac-mediated transcription at TEs likely contributes to the rewiring of 3D chromatin organization at transcriptionally active TEs, as RNA polymerase II transcription drives enhancer-promoter contact 60 . The factors contributing to the recruitment of the MSL complex to the specific genomic region are unknown in mammals. Intriguingly, the role of MSL complex in co-opting TEs to rewire cis-regulatory elements appears to have been conserved during the evolution of dosage compensation in Drosophila miranda, in which a mutant helitron TE has been shown to recruit the MSL complex to the evolutionarily young X chromosome to increase transcription 61 . In Drosophila dosage compensation, expression of most X-linked genes is increased approximately twofold by H4K16ac, specifically in males 62 . This MSL-mediated X upregulation appears to be conserved in mammals, in which H4K16ac has been shown to upregulate genes on the single active X chromosome to balance expression with two copies of the autosomes 63 . Interestingly, X chromosomes have a higher number of L1s than autosomes 64 , suggesting that MSL-H4K16ac at L1s in the X chromosome could contribute to X upregulation.
TFs enriched at H4K16ac + TEs ( Fig. 3a and Extended Data Fig. 5) could contribute to maintaining MSL-H4K16ac and transcription at TEs. Notably, the MSL complex recruits YY1 to the Tsix promoter to activate its expression in mESCs 32 , suggesting a possible interplay between MSL and YY1 in regulating L1 transcription. Interestingly, MAFK, which has previously been reported to be enriched at TEs 59 , is enriched explicitly at H4K16ac + L1 5' UTRs, suggesting a potential interplay between MAFK and MSL complex.
Neuronal cells have high L1 expression and retrotransposition 65 ; retrotransposon dysregulation is also linked with neurological disorders 1 . TEs and their transcriptional regulators play wider roles in shaping transcriptional networks during early human development 66 Loss of function mutations in genes encoding KAT8 containing protein complexes such as KANSL1, MSL3 and KAT8 lead to neurodevelopmental disorders 37,[67][68][69] . Enrichment of H4K16ac at the 5′ UTRs of L1s in human brain tissues suggests that altered gene expression programme due to TE dysregulation in the nervous system could be a possible mechanism for these disorders (Extended Data Fig. 3) 42 . Further studies on the specific role of H4K16ac in neuronal cell types will reveal whether H4K16ac dysregulation could contribute to neuronal-specific dysregulation of TEs and gene expression, contributing to neurodevelopmental and neurodegenerative disorders.
In yeast, H4K16ac regulates lifespan and cellular senescence 70 . Senescent cells show enrichment of H4K16ac in promoter regions of expressed genes 71 . Analysis of publicly available H4K16ac ChIPseq data showed a dramatic loss of H4K16ac across L1s and LTRs in senescent cells in comparison with proliferating cells (Extended Data Fig. 9), suggesting that proliferating cells, compared with replicative senescent cells, have adapted to the permissive chromatin state at TEs. However, by contrast, L1 elements are known to be transcriptionally derepressed during cellular senescence and to activate the interferon I (IFN-I) response 72 . Further investigation will be needed to understand the direct role of the H4K16ac pathway in regulating L1 transcription linked to aging and senescence.
In summary, we show that H4K16ac-marked L1s and LTRs act as enhancers to regulate genes in cis. The act of transcription at L1 5′ UTRs and LTRs mediated by H4K16ac could contribute to chromatin topology and enhancer-mediated regulation of host gene expression in cis, as L1 and LTRs that are marked with histone acetylations are located within the regulatory elements, or they interact with genes. The permissive chromatin structure mediated by H4K16ac and H3K122ac could counteract the epigenetic repressive environment at TEs within the regulatory elements (Fig. 6e) 73 .

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41594-023-01016-5.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cell culture and transduction
The H9 hESC line was a gift from L. Vallier's lab with the MTA from WiCell. hESCs were grown on geltrex-coated plates (Thermo Fisher Scientific, A1413302) in mTeSR Plus medium (Stem Cell Technologies, 100-0276) supplemented with 100 U ml -1 penicillin-streptomycin (Gibco, 15140122) and passaged every 3-4 d with ReLeSR (StemCell Technologies, 100-0484), according to the manufacturer's protocols. The doxycycline-inducible SpCas9 (iCas9-H1) hES cells were generated using parental H1-hESCs from WiCell. Briefly, H1 cells were transfected with plasmids from the Genome-CRISP Inducible Cas9 human AAVS1 Safe Harbor Knock-in Kit (GeneCopoeia) using Fugene HD (Promega) and selected with Puromycin (500 ng ml -1 ). Cells were single-cell sorted using FACS and grown in mTESR to make monoclonal lines. The resulting SpCas9 line was confirmed to be karyotypically normal and was tested for mycoplasma every 3 weeks.
iCAS9 cells were transduced with three lentiviral guide RNAs targeting MSL1 and MSL3 (ref. 54). Parental iCAS9 H1, iCAS9 with MSL guide RNAs, TDF iCas9 transduced with MSL1, and MSL3 guide RNA pools were treated with 1 µg ml -1 doxycycline (Sigma) to generate the inducible MSL-KO lines. After 4 to 7 d of doxycycline induction, the knockout was validated by immunofluorescence followed by high-content microscopy and western blot using antibodies to H4K16ac and H3K27ac.

CUT&Tag
CUT&Tag was performed according to Kaya-Okur et al. 39 protocol with modifications to tissue processing, as described below. Experiments were performed in biological duplicates from each cell type. Approximately 100,000 cells were pelleted by centrifugation for 3 min at 600g at room temperature and resuspended in 500 µl of ice-cold NE1 buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 0.5 mM spermidine, 1% Triton X-100, and 20 % glycerol and cOmplete EDTA free protease inhibitor tablet) and were left to sit for 10 min on ice. Nuclei were pelleted by centrifugation for 4 min at 1,300g at 4 °C, resuspended in 500 µl of wash buffer, and held on ice until beads were ready. The required amount of BioMag Plus Concanavalin-A-conjugated magnetic beads (ConA beads, Polysciences) was transferred into the binding buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 1 mM CaCl 2 and 1 mM MnCl 2 ) and washed once in the same buffer; each time they were placed on a magnetic rack to allow the beads to separate from the buffer and resuspended in binding buffer. Then, 10 µl of beads was added to each tube containing cells and rotated on an end-to-end rotator for 10 min. After a quick spin to remove liquid from the cap, tubes were placed on a magnet stand to be cleared, the liquid was withdrawn, and 800 µl of antibody buffer containing 1 µg of the following primary antibodies was added: normal rabbit IgG (Santa Cruz Cat no sc-2027), H3K27ac (Abcam, ab4729), H4K16ac (Abcam, ab109463), H3K122ac (Abcam, ab33309), H3K4me1 (Abcam, ab8895), H3K36me3 (Abcam, ab9050)) H3K4me3 (Millipore, 07-473), H3K27me3 (Abcam, ab192985) and H3K9me3 (Abcam, ab176916)). The mixture was incubated at 4 °C overnight in a nutator. Secondary antibodies (guinea pig α-rabbit antibody, Antibodies online, ABIN101961) were added 1:100 in Dig-wash buffer (5% digitonin in wash buffer), and 100 µl was squirted in per sample while they were gently vortexed, to allow the solution to dislodge the beads from the sides, followed by incubation for 60 min on a nutator. Unbound antibodies were washed in 1 ml of Dig-wash buffer three times. Then, 100 µl of (1:250 diluted) protein-A-Tn5 loaded with adapters in Dig-300 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM spermidine with Roche cOmplete EDTA free protease inhibitor) was added to the samples, placed on nutator for 1 h and washed three times in 1 ml of Dig-300 buffer to remove unbound pA-Tn5. Next, 300 µL Tagmentation buffer (Dig-300 buffer + 5 mM MgCl 2 ) was added while being gently vortexed, and samples were incubated at 37 °C for 1 h on an incubator. Tagmentation was stopped by adding 10 µl 0.5 M EDTA, 3 µl 10% SDS, and 2.5 µl 20 mg ml -1 Proteinase K to each sample. Samples were mixed by full-speed vortexing for ~2 s and incubated for 1 h at 55 °C to digest proteins. DNA was purified by phenol:chloroform extraction using phase-lock tubes (Quanta Bio) Article https://doi.org/10.1038/s41594-023-01016-5 followed by ethanol precipitation. Libraries were prepared using NEB-Next HiFi 2× PCR Master mix (M0541S) with a 72 °C gap-filling step, followed by 13 cycles of PCR with 10-second combined annealing and extension for the enrichment of short DNA fragments. Libraries were sequenced in Novaseq 6000 (Novogene) with 150 bp paired-end reads.

RT-qPCR
Total RNA was isolated from H9 hESCs using TRIzol reagent (Ther-moFisher Scientific, 15596026). For RT-qPCR, cDNAs were prepared with LunaScript RT SuperMix Kit (NEB, E3010). For CRISPRi experiments, RNA isolation was done using a kit (Monarch, T2040S) followed by reverse transcription using LunaScript RT SuperMix Kit (NEB, E3010), qPCR using qPCRBIO SyGreen Mix Lo-ROX (PCRBio) in LightCycler 480 instrument (Roche). The list of specific primers used is given in Supplementary Table 4. RT-qPCR was done with three independent biological replicates, each of control shRNA and two independent shRNAs targeting MSL3 or relevant empty vector controls and dCAS9 systems for CRISPRi, on a StepOnePlus Real-Time PCR System (Applied Biosystems). Data were normalized to β-actin from three biological replicates.

RNA sequencing
RNA was isolated using Monarch RNA mini prep kit (NEB) with genomic DNA elimination column and on-column DNase treatment. MSL3 KD RNA sequencing libraries were prepared by spiking in equal amounts of The External RNA Controls Consortium (ERCC) Spike-in RNA Variant Control Sets (SIRV set 3, Lexogen), and 500 ng of RNA was used for depletion of rRNA using RiboCOP kit (Lexogen), followed by RNA-seq library preparation using CORALL Total RNA-Seq Library Prep Kit (Lexogen). Libraries were sequenced as 150 bp paired-end reads using Novaseq 6000. In the case of H1 iCAS9 and MSL1 KO RNA-seq, Ribosomal RNAs were depleted using NEBNext rRNA Depletion Kit (Human/Mouse/Rat) (NEB no. E7400) followed by library preparation using NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB no. E7765).

ATAC-seq
ATAC-seq was performed as described in ref. 23, with modifications. The freshly collected 50,000 cells were washed in PBS and resuspended in a resuspension buffer (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2). Cells were resuspended and incubated on ice for 3 min in 50 µl of cold lysis buffer (0.1% NP-40, 0.1% Tween-20, 0.01% digitonin in resuspension buffer). Nuclei were washed in 1 ml of wash buffer (990 µl resuspension buffer, 0.1% Tween-20) by inversion three times. Nuclei were pelleted by centrifugation at 500g for 10 min at 4 °C. The nuclei were resuspended in 47.5 ml of Nextera Tagmentation buffer (Nextera DNA Sample Preparation Kit) and incubated with 2.5 µl of the Tn5 transposase (Nextera kit, Illumina) at 37 °C for 30 min. The resulting DNA fragments were purified using a miniElute column (Qiagen) and amplified by NEBNext High-Fidelity PCR Master Mix in a total volume of 50 µl. The thermocycling protocol for this reaction was 72 °C for 5 min, 98 °C for 30 s and five cycles of 98 °C for 10 s, 63 °C for 30 s, and 72 °C for 1 min. The universal adapter primer and a unique barcoded adapter primer (same as CUT&Tag primers) were used. To avoid over-amplification, after the initial five cycles, the number of remaining cycles required was estimated for each sample using qPCR by adding SYBRGreen and using 5 µl of the previous PCR as a template. The number of additional cycles was determined to be the number that it took for the qPCR to reach one-third of maximal fluorescence. The original PCR was then resumed, and each sample was cycled as necessary. After amplification, the samples were purified using AMPure XP beads. The libraries were sequenced as a minimum of 50 million 150 bp paired-end reads in Novoseq (Novogene PLC).

CRISPRi with dCAS9-KRAB
CRISPRi using dCAS9-KRAB was performed as described in ref. 31, with the following modifications. The CRISPR-Bac plasmid (PB_tre_dCas9_KRAB, Addgene ID 126030) (ref. 74), a kind gift from J. Mauro Calabrese, was mixed with the piggyBac-transposase plasmid in a 1:1 ratio (2 µg in each well of a 6-well plate) into opti-MEM, along with TransIT-LT1 in a 1:3 ratio (Mirus, MIR2300), and reverse transfected into H9 hESCs according to manufacturer's protocol. The next day, the cells were allowed to recover from the transfection for 24 h and then selected with 100 µg ml -1 hygromycin B for 5 d. Surviving colonies were then expanded and reverse-transfected with various gRNA-expressing plasmids (cloned into pSLQ1371 as described in ref. 75, kind gift from S. Qi) with TransIT-LT1. Then, 1.25 × 10 6 cells were reverse transfected with 1 µg of the gRNA-expressing plasmid (per well of a 24-well plate). To improve the efficiency of plasmid delivery, the transfection was repeated the next day (forward transfection). At 48 h after the first transfection, cells were briefly selected with puromycin (0.5 µg ml -1 ) for 24 h and left to recover for 96 h. Cells were collected for RNA isolation and RT-qPCR.

Analysis of CUT&Tag-seq data
Mapping. For the CUT&Tag-seq, 150-bp paired-end reads were trimmed for adapters using the Trimmomatic tool and aligned to the hg38 genome through local Bowtie2 (version 2.4.5) with these parameters for pair-end mapping:-very-sensitive-local-no-unalno-mixed-no-discordant-phred33 -I 10 -X 700 (ref. 76). For analyzes, multi-mapped reads were filtered out, and only uniquely mapped reads were retained with the samtools flag of -q 2 -f 0x200 (ref. 77). For Figure 5e, total reads, including multi-mapped reads, were retained for plotting heatmaps of ATAC-seq, CUT&Tag, and RNA-seq reads. For individual replicates, the bam files were sorted, indexed, and used for generating bedgraphs (for peak calling) and bigwigs. The bam files were sorted and indexed using the samtools (version 1.9) sort and the samtools index. Merging of multiple replicates was performed using samtools merge. The sorted bam files were used to generate bed, bedgraph and bigwig formats for individual modifications.
Peak calling and analyzes. The reads were extracted from the bam to bed by the bedtools bamtobed option 78 . Further reads were processed as mentioned in the SEACR (version 1.3) manual to get the bedgraph 79 . These bedgraph files were subjected to peak calling through SEACR with a stringent P of ≤1 × 10 -6 with the norm and relaxed options.
Further bedtools with various options were used for transforming bed files, such as intersect, closest, sample, or shuffle. GNU awk editor was used for processing the bed files wherever required. Chromatin Article https://doi.org/10.1038/s41594-023-01016-5 state predictions for the histone modification peaks were performed using ChromHMM 40 (v1.10). For further analyses, the reproducible peaks were obtained by performing an intersection between peak files for each CUT&Tag replicates for histone PTMs. Overlap between the peaks for histone modifications for the H9 cells was assessed using the Intervene package 80 . While the overlapping peak counts were plotted as a Venn Diagram for each histone PTM and IgG, the peaks for histone PTM combinations were plotted as an upset plot showing the number of overlapping peaks (y axis) along with the histone modification peak numbers on the x axis. TE enrichment analyses. Tracks for the repeats (rmsk) were obtained from the UCSC Genome Table Browser for hg19 and hg38. Reproducible peaks (consistent between two replicates) were used to generate the observed versus expected frequency for different TE classes (Alu, full-length L1, and LTR), gene body, and TSSs. These were calculated across various histone modification CUT&Tag peaks. The intersect count was obtained for each histone modification using bedtools (version v2.28.0) intersect (bedtools intersect -wa -u options) for the mentioned genomic elements. The expected occurrences in the genome were calculated by intersecting the genomic elements with the randomized genomic coordinates (number, length, and chromosome ID matched) across different histone modifications. The ratios for observed versus expected at these genomic elements for each histone PTM were calculated and used to plot as a heatmap using ggplot2 heatmap function in R.

RepeatMasker.
To analyze the repeat content of the different histone modification peak sets, the fasta was obtained using the bedtools getfasta tool from the hg38.fa reference genome. The sequences were subjected to RepeatMasker (version 4.0.7) to get the repeat content across these genomic sequences for different histone modifications (http://www.repeatmasker.org).
Bigwig generation and plotting. Sorted bam files were subjected to bigwig generation via deepTools (version 3.5.1) (ref. 81) bamCoverage tool with -binSize 20 -normalize Using BPM or CPM-scaleFactor = 1.0-smoothLength 60-extendReads 150-centerReads options. The signal was normalized to IgG through bigwigCompare with option -operation first or subtract. The bigwig files were used for plotting signals or visualization in the genome browser. The genome-browser views were obtained by viewing the signal tracks in the UCSC Genome Browser. For the knockdown (in H9 cells) or knockout (in TDF cells) studies, the samples were normalized on the basis of the number of reads mapping to the Escherichia coli genome.
The plotting of signals at various genomic landmarks and bed coordinates was carried out through deepTools. Matrices were generated using deepTools computeMatrix reference-point or scale-regions option. These matrices were used for plotting heatmaps or average summary plots by the plotHeatmap or plotProfile function in deep-Tools, with or without clustering by the k-means algorithm. The sorted bam files were also used to study the correlation between the individual replicates for the CUT&Tag across histone PTMs and IgG using multi-BamSummary function in deeptools with options bins and plotted as Pearson correlation heatmap using deeptools plotCorrelation function with options-skipZeros.
Further, H4K16ac signals from GSE84618 for brain (prefrontal lobe) tissues from young individuals, old individuals and individuals with Alzheimer's disease were compared for the TE elements. Similarly, the bigwigs were obtained for the proliferative and senescence model in IMR90 cells (GSM1358821) for L1 and LTR subfamilies. The signals were compared as heatmaps or average summary profiles using above-mentioned tools. The signals at the H4K16ac-marked TEs were also plotted as average summary plots using plotProfile function for transcription factors YY1 (ENCODE ID: ENCFF904SDR), RAD21 (ENCODE ID: ENCFF506AAX) and CTCF (ENCODE ID: ENCFF473IZV) using ENCODE datasets (bigwigs) normalized as fold change over control.

TAD border annotation.
To call TADs in human embryonic stem cells (H9), we used Hi-C data for two replicates from ref. 53 under accession numbers (GSM3262956 and GSM3262957). We first generated contact domains for all chromosomes at a 10-kb resolution using the Arrowhead tool from Juicer using Knight-Ruiz Normalization 82 . We extracted the borders of these TADs. To ensure we identify a robust set of TAD borders, we selected with a score above one that are common borders between the two replicates, assuming a maximum gap of 1 bin (10 kb). This resulted in 9,952 robust TAD borders. The TAD calling was performed on the hg19 reference genome, and to allow integration with the rest of the analysis, we lifted over the common TAD borders from hg19 to hg38.
Significant loops calling using Micro-C. Chromatin loops were called with the HiCCUPS tool from the Juicer software suite 82 on micro-C data in H1 hESCs 51 . Loops were called using 5-and 10-kb resolution, 10% FDR, Knight-Ruiz normalization, a window of 7 and 5, peak width of 2 and 4 and, thresholds for merging loops of 0.02, 1.5, 1.75 and 2, and distance to merge peaks of 20 kb (-r 5000,10000 -k KR -f . Motif enrichment analysis. Enrichment of TF-binding sites (TFBS) at the TEs (>5 kb L1, Alu, and LTR) overlapping with histone modifications (H4K16ac, H3K27ac and H3K122ac) or a similar number of randomized genomic bins (chromosome, length matched) was performed. The experimentally determined TFBS for the H1-hESCs was fetched from the UCSC Genome Browser as TFBS clusters. The number of motifs for each TE class was either positive for histone modification or randomized genomic bins for histone modification for all TFBS. The internal distribution profile of motifs across each TEs was determined as percentage distribution and enrichment score defined as (Diff/Sum) of motif counts' percentage between observed (TEs positive for histone modification) versus expected (randomized genomic bins) occurrence of motifs. The ratios obtained for each TFBS were plotted as a heatmap using the R package ComplexHeatmap 83 .
ATAC-seq data analysis. The ATAC-seq reads were processed for mapping by trimming for adapters using the Trimmomatic tool, followed by aligning to the hg38 genome through local Bowtie2(version) with these parameters for pair-end mapping: -very-sensitive-localno-unal-no-mixed-no-discordant-phred33 -I 10 -X 700 (Alteration in -X to 2000 was done to allow the mapping of reads for H9 cells). The mapped reads were processed as described above (CUT&Tag data analysis) to generate the bigwigs. The signal was normalized as log 2 (fold change) for control over MSL3 knockdown using the bigwig-Compare function in deeptools with-skipZeroOverZero-operation log2. Using the same matrix generation and heatmap tools, further ATAC-seq signals were compared at the full-length L1 subfamilies and LTR subclasses.
RNA-seq data analysis. The reads obtained from RNA-seq for H9 cells were mapped to the human genome using STAR 84 following the Bluebee-CORALL pipeline of mapping. For TDFs, the RNA-seq datasets were downloaded from NCBI-GEO for accession ID GSE144019. The reads were mapped to hg38 following the same pipeline as H9 except for the single-end specification in TDFs.
For differential enrichment analysis, the fragment counts for each dataset were obtained using the featurecounts tool from the SubRead package. The GTF file for genes was obtained from ENSEMBL, and for different TE classes (Alu, L1 and LTR), it was fetched from the UCSC Article https://doi.org/10.1038/s41594-023-01016-5 Table Browser. These feature counts were used for the differential enrichment analyses using the DESeq2 package in R. The DESeq was performed with defaults 85 . The differential expression of genes was visualized as a volcano plot. The differential gene expression table can be found in the additional data.
The uniquely mapped reads were filtered using samtools for MAPQ of 255. Further, the unique alignments' bam files were merged, sorted, and indexed using samtools, followed by bigwig generation using the deepTools function bamCoverage. The normalized signal was generated as log 2 (fold change) for control over MSL3 knockdown using bigwigCompare function in deeptools with-operation log2. The signal was compared at the full-length L1, as well as at genes.
The RNA signals across the various subfamilies of TEs, as well as genes in the flanks (<10,000, 10,000-25,000 and 25,000-50,000 kilo base (kb) of the H4K16ac-marked LTR and full-length L1, were calculated as RPKM from the read counts obtained for each gene or TEs across the multiple replicates for H9 (control or MSL3 KD) and TDF (WT or MSL1 KO). The RPKM signal was then plotted as a violin and box plot using ggplot2 in R. For comparison, the same number of TEs that are H4K16ac + and H4K16acin TDFs was obtained by subsampling using the bedtools sample. The signal was plotted as the log10 value of the RPKM on the y axis. The RNA signals were plotted as violin plots with box plots with a median. The statistical analyses for all the violin plot comparisons were performed using the Dunn test with Bonferroni correction.

STARR-seq data analysis.
To assess the potential of the TEs marked by H4K16ac to act as enhancers, we compared the STARR-seq signal in K562 (ENCFF611ZHY) and SH-SY5Y (ENCFF571ARG) cells at the TE elements: full-length L1 (>5 kb) and ERV/LTRs. The signal was plotted as a heatmap from the start (L1) or center (LTR) of the TE elements sorted according to the H4K16ac signal. Further, the signal was compared as violin plots for the four sets of peak combinations with respect to overlap among peaks that overlap with TEs. These were H3K4me1 + only, H3K4me1 + H3K27ac + H4K16ac + , H4K16ac + H3K4me1and H4K16ac + H3K4me1 + peaks that overlap with LTRs.

Statistical tests.
For all the RT-qPCRs, an unpaired t-test with Welch correction (two-stage step-up) was performed between the groups using GraphPad Prism9. The statistical tests were performed for all the violin plots using the Dunn test function in the R tool rstatix. Dunn's test with Bonferroni correction was used for multiple-group comparisons between the groups.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE200770 (https://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE200770). CUT&Tag raw data as well as processed data files (peaks and bigwigs), can be accessed at NCBI under accession ID GSE200768, RNA-seq raw data files can be accessed under accession ID GSE200769, and ATAC-seq datasets can be accessed under accession ID GSE200767. All the datasets generated and public datasets used in this study are detailed in Supplementary Table 1. Source data are provided with this paper.

Code availability
All the analyses in this manuscript have been carried out using publicly available tools. No custom code was generated for this purpose. The methodology contains the details of the analysis steps involved.