Mammalian genomes harbor abundant transposable elements (TEs) and their remnants, with numerous epigenetic repression mechanisms enacted to silence TE transcription. However, TEs are upregulated during early development, neuronal lineage, and cancers, although the epigenetic factors contributing to the transcription of TEs have yet to be fully elucidated. Here, we demonstrate that the male-specific lethal (MSL)-complex-mediated histone H4 acetylation at lysine 16 (H4K16ac) is enriched at TEs in human embryonic stem cells (hESCs) and cancer cells. This in turn activates transcription of subsets of full-length long interspersed nuclear elements (LINE1s, L1s) and endogenous retrovirus (ERV) long terminal repeats (LTRs). Furthermore, we show that the H4K16ac-marked L1 and LTR subfamilies display enhancer-like functions and are enriched in genomic locations with chromatin features associated with active enhancers. Importantly, such regions often reside at boundaries of topologically associated domains and loop with genes. CRISPR-based epigenetic perturbation and genetic deletion of L1s reveal that H4K16ac-marked L1s and LTRs regulate the expression of genes in cis. Overall, TEs enriched with H4K16ac contribute to the cis-regulatory landscape at specific genomic locations by maintaining an active chromatin landscape at TEs.
Dysregulation of TEs and their insertions into gene exons are usually disruptive and have been implicated in cancer and neurological disorders1,2. When inserted into noncoding DNA, including introns, they can affect the host gene expression in cis or trans. Most TEs cannot transpose owing to acquired mutations and epigenetic and post-transcriptional silencing mechanisms (reviewed in refs. 3,4). Transcription of TEs is repressed by DNA methylation, trimethylated histone H3 K9 (H3K9me3), TRIM28 and Krüppel-associated box-containing zinc finger proteins (KRAB-ZFPs), and the human silencing hub (HUSH) complex5,6,7,8,9. Apart from these repressive mechanisms, several pluripotency-associated transcription factors (TFs), namely SP1, SP3, LBP9, DUX4, DUX, GATA2 and YY1, are enriched at ERV LTRs, and SOX11, RUNX3 and YY1 are enriched at the 5′ untranslated regions (UTRs; containing promoters) of L1 (reviewed in ref. 10). Interestingly, most species-specific DNase hypersensitive sites (which are on accessible chromatin) are occupied by remnants of TEs11,12, suggesting that TEs have been co-opted, becoming tissue- and species-specific cis-regulatory elements (CREs). TEs are transiently upregulated during early development13, in the neuronal lineage14, and in cancer1. The ERV superfamily of LTRs (LTR/ERV) and Alu family of short interspersed nuclear elements (SINE/Alu) often exhibit chromatin features associated with active CREs15,16,17 and function either as enhancers to regulate genes in cis or act as alternative promoters15. The 5′ UTR of L1 repeats are also bound by tissue-specific TFs, are enriched with chromatin features that are associated with CREs18, and can function as nuclear noncoding RNAs13,19; still, it is unclear whether they can act as CREs. Although TEs have been suggested to contribute to nearly one-quarter of the regulatory epigenome10,13,20,21, the chromatin-based mechanisms contributing to regulatory activity in the vast number of TEs are unclear.
Chromatin features, such as a combination of H3K4me1 and H3K27ac, bidirectional transcription of enhancer RNAs (eRNAs), and accessible chromatin (determined, for example, using the assay for transposase-accessible chromatin with sequencing (ATAC-seq)) are widely used to predict enhancer activity, including for TE-derived enhancers17,22,23,24,25. Yet the level of H3K27ac does not correlate with and is dispensable for enhancer activity, suggesting that other uncharacterized chromatin features could contribute to regulatory activity26,27,28. H4K16ac and H3K122ac are particularly interesting among many histone acetylations, because they alter chromatin structure directly and increase transcription in vitro29,30. H4K16ac and H3K122ac are enriched at enhancers, and they identify new repertoires of active enhancers that lack detectable H3K27ac27,31. However, it is challenging to decipher the causal role of specific histone acetylations, as many acetylations, including H3K27ac, are catalyzed by multiple lysine acetyltransferases (KATs), and KATs also have a broad substrate specificity. H4K16ac is an exception, as it is catalyzed explicitly by KAT8 when associated with the MSL complex.
Nevertheless, when KAT8 is associated with non-specific lethal (NSL), it catalyzes H4K5ac, H4K8ac and H4K12ac (refs. 32,33,34,35). In mouse embryonic stem cells (mESCs), KAT8 and H4K16ac mark active enhancers and promoters of genes that maintain the identity of the mESCs27,36. Loss-of-function mutations in KAT8 or MSL3 lead to reduced H4K16ac levels and are known to cause neurodevelopmental disorders37,38. However, the mechanism through which KAT8 containing MSL complex-mediated acetylation of H4K16 contributes to genome regulation during normal development is less clear, especially in the human genome.
Here, we show that H4K16ac is enriched at L1s and LTRs and is depleted at gene promoters, and that H4K16ac regulates transcription across the L1 and ERV LTR superfamilies of TEs. TEs marked with acetylations loop with the neighboring genes and regulate their expression. CRISPR interference and genetic deletion of H4K16ac-marked (H4K16ac+) TEs lead to the downregulation of genes in cis, demonstrating that H4K16ac+ TEs function as enhancers. Furthermore, depletion of H4K16ac is sufficient for downregulation of L1s and LTRs and genes linked to these TEs, confirming the significance of H4K16ac-mediated activation of TEs in rewiring the regulatory landscape of a substantial fraction of the mammalian genome.
We aimed to investigate the role of MSL-mediated H4K16ac in human genome regulation. We performed two to three replicates of cleavage under targets and tagmentation (CUT&Tag)39 in human embryonic stem cells (H9-hESCs) for histone modifications associated with active regulatory elements (H3K27ac, H3K122ac, H4K12ac, H4K16ac, monomethylated H3 K4 (H3K4me1) and H3K4me3), polycomb repressed domains (H3K27me3) and heterochromatin (H3K9me3) (Extended Data Fig. 1a and Supplementary Table 1). We evaluated overall data quality and similarity among our CUT&Tag replicates (Extended Data Fig. 1a). We generated peaks by merging the replicates, and we used reproducible peaks in at least two replicates to validate our findings (Extended Data Figs. 1 and 2 and Supplementary Tables 2 and 3). To prevent the same reads from mapping to multiple regions in the repeat elements, uniquely mapped CUT&Tag sequencing reads were used for all analyses except the L1 subfamily-level enrichment analysis in Fig. 5e, for which we used multi-mapping reads.
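The criterion of peaks being reproducible in at least two replicates can be illustrated with a minimal Python sketch. The function names, the tuple representation of peaks, the toy coordinates and the rule that any overlap of 1 bp or more counts as support are assumptions made for illustration; this is not the pipeline used in this study.

```python
def overlaps(a, b):
    """Return True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_peaks(merged_peaks, replicate_peaks, min_replicates=2):
    """Keep merged peaks that overlap a peak call in >= min_replicates replicates."""
    kept = []
    for peak in merged_peaks:
        # Count the replicates that contain at least one overlapping peak.
        support = sum(
            any(overlaps(peak, rep_peak) for rep_peak in rep)
            for rep in replicate_peaks
        )
        if support >= min_replicates:
            kept.append(peak)
    return kept


# Hypothetical toy data: three merged peaks and three replicate peak sets.
merged = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
replicates = [
    [("chr1", 120, 480), ("chr2", 60, 280)],
    [("chr1", 150, 450), ("chr1", 950, 1100)],
    [("chr1", 110, 490)],
]
print(reproducible_peaks(merged, replicates))
```

In this toy example only the first merged peak is supported by two or more replicates; the other two are each supported by a single replicate and are dropped.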
H4K16ac and H3K122ac are enriched at TEs
Chromatin-state discovery and genome annotation analysis (ChromHMM)40 of CUT&Tag peaks revealed the expected enrichment of H3K4me1, H3K4me3, H3K27ac and H4K12ac at chromatin features associated with active transcription, including active promoters and enhancers. Intriguingly, H4K16ac and H3K122ac, but not H3K27ac or H4K12ac, were enriched at heterochromatin, insulator and transcription elongation states (Fig. 1a). Further analysis revealed specific enrichment of H4K16ac and H3K122ac at the 5′ UTR of full-length L1s and ERV/LTR elements, compared with gene promoters (Fig. 1b–e). H4K16ac was also detected at gene bodies, consistent with previous findings showing its role in transcription elongation41 (Fig. 1a,b). Interestingly, however, H4K16ac shows a very low level of enrichment at the gene promoters (Fig. 1b,d,e and Supplementary Table 3), similar to recent chromatin immunoprecipitation and sequencing (ChIP–seq) data in human cell lines34. Fifty-one percent of full-length L1s (n = 10,000) have reproducible H4K16ac peaks (Supplementary Table 3), and although H4K16ac and H3K9me3 are enriched at L1s (Fig. 1b), they are anti-correlated at L1 subfamilies (Extended Data Fig. 1c). Less than 10% of the H4K16ac peaks at TEs overlapped with H3K9me3 (Extended Data Fig. 1b). The H3K9me3 level was also lower at H4K16ac peaks that overlapped with L1 5′ UTRs, and the H4K16ac level was lower at L1s with H3K9me3 peaks (Extended Data Fig. 1c). Reanalysis of public ChIP–seq datasets showed enrichment of H4K16ac at the 5′ UTRs of L1s in human brain tissue (Extended Data Fig. 3a)42. H4K16ac is also enriched at the 5′ UTRs of L1s in neuroblastoma (SH-SY5Y), erythroleukemia (K562), and transformed dermal fibroblast (TDF) cell lines and mESCs (Extended Data Fig. 3b–e). This analysis suggests that H4K16ac enrichment at TEs is not unique to hESCs, but is conserved in cancer cells, human brain tissue and mice. 
Although H3K27ac and H4K12ac were detected at some L1 5′ UTRs, they were enriched at a much higher level at the promoters of genes (Fig. 1b). Interestingly, along with H4K16ac and H3K122ac, L1 5′ UTRs were also enriched with H3K4me1 but were depleted of H3K4me3 (Fig. 1d), suggesting that these elements could function as CREs.
H4K16ac+ L1 5′ UTRs are enriched with enhancer features
LTR subfamilies function as enhancers to regulate genes in a tissue-specific manner in humans and mice (reviewed in ref. 16). LTR5- and LTR7-related subfamilies function as enhancers in hESCs21,43,44. However, whether L1 elements can act as enhancers to regulate genes in cis is not known. Here we found that H4K16ac is particularly enriched at the 5′ UTR of full-length L1 subfamilies and correlates with chromatin features associated with active enhancers, such as H3K27ac, H3K4me1, BRD4 and ATAC-seq signal (Fig. 2a,d).
Interestingly, not all L1 subfamilies are enriched with active enhancer features at the same level. The evolutionarily younger L1s (L1HS, L1PA2 and L1PA3, 3–12.5 million years) are enriched with active enhancer features, including H4K16ac. These L1s are known to be transcriptionally active. Although evolutionarily older L1s are transcriptionally inactive, the 5′ UTRs of these L1 subfamilies (L1PA7–L1PA16, 31–80 million years) are enriched with H4K16ac, along with other active enhancer features, but H3K9me3 is less enriched (Fig. 2a), suggesting that the 5′ UTRs of older full-length L1s have been co-opted as functional regulatory elements.
Analysis of genome-wide enhancer activity data (using self-transcribing active regulatory region sequencing, or STARR-seq), generated by ENCODE45 from neuroblastoma (SH-SY5Y) and erythroleukemia (K562) cell lines, showed enhancer activity specifically at the 5′ UTR of L1s (Fig. 2b) in a cell-type-specific manner. The presence of active enhancer chromatin features (Fig. 2a) and the ability of L1 5′ UTR to drive transcription of the minimal promoter in an in vitro enhancer reporter assay (Fig. 2b) further confirmed that 5′ UTRs of full-length L1s could function as transcriptional enhancers.
LTRs with H4K16ac possess higher enhancer activity
Our data show that, apart from the LTR5 and LTR7 elements that show clear enrichment of active enhancer chromatin features, some of the subfamilies of LTR16 and LTR33 may also serve as enhancers in hESCs, because they are enriched with H4K16ac and other active enhancer chromatin features (Fig. 2c,d). Interestingly, analysis of STARR-seq data from K562 and SH-SY5Y cells revealed that H4K16ac+ LTRs in these cell lines show significantly higher enhancer activity than LTRs that overlap with only H3K4me1 or H3K27ac peaks (Fig. 2e,f and Extended Data Fig. 4a). These results further support the notion that H4K16ac+ LTRs function as enhancers. The rest of the LTR and Alu families are not likely to act as enhancers in hESCs, as they lack known enhancer chromatin features (Extended Data Fig. 4b).
TEs marked with H4K16ac are bound by looping factors
We aimed to identify TFs bound at H4K16ac+, H3K27ac+ and H3K122ac+ TEs using TF ChIP–seq data from ENCODE. Expectedly, EP300 is enriched at LTRs marked with H3K27ac (Fig. 3a). YY1 is enriched at L1s marked with all three marks, supporting the known role of YY1 in activating L1 transcription46. CTCF and RAD21 showed higher enrichment at H4K16ac+ and H3K122ac+ L1s and LTRs than at H3K27ac+ L1s and LTRs. MYC and KDM1A were depleted at H4K16ac+ and H3K27ac+ L1s. These observations are consistent with previous reports showing the role of CTCF and RAD21 in activating L1 transcription47,48, and of MYC and KDM1A in repressing L1 transcription49. SP1, TCF12 and NANOG binding was also specifically enriched at H3K27ac+ L1 and LTRs, suggesting that they have a role in transcription at these elements.
L1s and LTRs with acetylation marks loop with genes
YY1, enriched at acetylated L1s, functions as a looping factor that facilitates interaction between enhancers and promoters50. Compared with L1s with acetylation marks, LTRs and Alu elements marked with histone acetylations are enriched with USF1, REST and a looping factor ZNF143 (Fig. 3a and Extended Data Fig. 5). Meta-analysis confirmed the enrichment of CTCF, RAD21 and YY1 at both H3K27ac+ and H4K16ac+ L1 5′ UTRs and LTRs (Fig. 3b). Analysis of published Hi-C data revealed that, compared with TEs that lack acetylation marks, TEs with such marks are enriched at topologically associated domain (TAD) borders (Fig. 3c). Moreover, H4K16ac levels are relatively higher at TEs overlapping with the TAD borders than at TEs that do not overlap with TAD borders (Fig. 3d,e). Furthermore, to identify whether TEs with histone acetylations loop with genes, we called significant loops from publicly available micro-C data from H1 hESCs51. This revealed that the fraction of TEs (L1, LTRs and Alu elements) with acetylation marks that form chromatin loops with genes is significantly higher than the fraction of TEs lacking these marks (Fig. 3f,g). These analyses provide evidence that transcribed TEs enriched with histone acetylation marks could contribute to three-dimensional (3D) chromatin folding and looping interactions with genes.
H4K16ac+ LTR/HERVs act as enhancers
We aimed to use CRISPR interference (CRISPRi) to investigate the role of H4K16ac+ TEs in regulating genes in cis, by recruiting the KRAB repressor domain (dCAS9-KRAB) to TEs. We performed CRISPRi for individual LTRs of human endogenous retrovirus (HERV) or L1 5′ UTRs by co-transfecting two independent guide RNAs that recruit dCAS9-KRAB to specific TEs enriched with H4K16ac in hESCs. We then performed quantitative reverse transcription PCR (RT–qPCR) for nearby expressed genes or genes that show the looping interaction in the RAD21-HiChIP data (Fig. 4a)52. CRISPRi targeting an H4K16ac+ LTR7/HERV-H (HERV type H family) element located ~50 kb away from PEX1 and ~30 kb from the GATAD1 promoter led to downregulation of PEX1, but not GATAD1 (Fig. 4b,d). CRISPRi targeting another H4K16ac+ LTR7/HERV-H element located ~50 kb away from the NUS1 promoter led to the downregulation of NUS1, but not of GOPC (Fig. 4c,d). However, CRISPRi targeting H4K16ac+ LTRs/HERV-L-18 (HERV type L-18 family) and HERV-L-18 int (internal portion of HERV-L-18) that are close to TAD borders (Figs. 3d and 4d) did not show downregulation of the nearby genes ZC3H15 and ODF2L, suggesting that some but not all H4K16ac+ HERV/LTR loci function as enhancers. Nevertheless, it is possible that such H4K16ac+ TEs could contribute to LTR/HERV transcription and 3D genome folding (Fig. 3c–e)53.
H4K16ac+ L1 5′ UTRs function as enhancers
We next focused on L1s and asked whether L1 5′ UTRs enriched with H4K16ac regulate genes in cis by performing CRISPRi for H4K16ac+ L1 5′ UTRs, together with two L1 5′ UTRs that lack detectable histone acetylation marks. CRISPRi for the H4K16ac+ 5′ UTR of an L1PA10 located ~110 kb upstream of USP38 led to specific downregulation of USP38 but not of the other nearby genes GAB1 and SMARCA5. Notably, CRISPRi for two H4K16ac– 5′ UTRs of L1s, located ~30 kb and ~85 kb from the USP38 promoter, led to no change in the USP38 transcript level, showing the specificity of the H4K16ac+ L1PA10 element in regulating USP38 (Fig. 4e,j). Similarly, CRISPRi for the H4K16ac+ 5′ UTR of L1PA10, located ~270 kb from the TANC2 promoter, led to downregulation of TANC2, but not CYB561 (Fig. 4f,j). CRISPRi for H4K16ac+ L1PA7, located ~24 kb from the COMMD10 promoter, also led to a significant downregulation of COMMD10, but not the nearby gene SEMA6A (Fig. 4g,j). RAD21-HiChIP data and the micro-C analysis revealed significant looping interactions between the MOXD1 and STX7 genes and the H4K16ac+ L1PA8, located ~100 kb away from the MOXD1 promoter (Fig. 4h,j). CRISPRi for the 5′ UTR of this L1PA8 led to significant downregulation of both MOXD1 and STX7. However, the expression of ENPP1, which does not loop with this L1, was not altered (Fig. 4h,j), demonstrating the specific cis-regulatory function of these L1s.
To further confirm that H4K16ac+ L1 5′ UTRs regulate genes in cis, we used CRISPR–CAS9 to delete full-length L1 elements in H1 hESCs. Owing to the difficulty of specific deletion of L1 5′ UTRs, we nucleofected the cells with pairs of synthetic guide RNAs along with CAS9 (as a ribonucleoprotein complex) that target the flanking regions of four full-length L1s (~7 kb deletions). We generated two independent clonal lines with heterozygous deletions for L1PA10 and one clone for L1PA7; both are H4K16ac+ and are located upstream of USP38 (Fig. 4e and Extended Data Fig. 6a,b). In accordance with CRISPRi data, RT–qPCR data showed that the deletion of L1PA10 and L1PA7 led to the downregulation of USP38, but not other nearby genes that were tested, namely GAB1 and SMARCA5 (Fig. 4k). For deletion of L1s located at the MOXD1 and RLN2 loci (Fig. 4g,i), we nucleofected gRNA–CAS9 ribonucleoprotein complexes and used two independent pools of hESCs that showed ~50% deletion efficiency (Extended Data Fig. 6c). Although CRISPRi for L1PA8 resulted in the downregulation of both MOXD1 and STX7, genetic deletion led to specific downregulation of MOXD1, but not STX7 and ENPP1 (Fig. 4h,k). Deletion of another H4K16ac+ L1PA7 located downstream of RLN2, ~12 kb away from the RLN2 promoter, led to the downregulation of RLN2 but not a nearby gene PLGRKT (Fig. 4i,k). Overall, CRISPRi and genetic deletion experiments confirmed that H4K16ac+ L1s and LTRs are involved in regulation of transcription of genes in cis.
MSL and H4K16ac activate transcription of TEs
Next, we aimed to deplete H4K16ac to investigate whether it regulates TE transcription. H4K16ac is catalyzed explicitly by KAT8 when associated with the MSL complex, but not the NSL complex32,33,34,35 (Fig. 5a). Because depletion of the individual MSL complex proteins MSL1, MSL2 and MSL3 is sufficient to reduce H4K16ac level54, we knocked down MSL3 using two independent lentiviral small hairpin RNAs (shRNAs) in H9 hESCs; we first validated the depletion by RT–qPCR, which showed ~50% downregulation of MSL3. RT–qPCR with primers recognizing full-length L1 subfamilies, such as human-specific (L1HS), mammalian-wide (L1M) and primate-specific (L1PA and L1PB) full-length L1s, showed significant downregulation upon MSL3 knockdown (KD). Similarly, RT–qPCR with primers recognizing HERV-K and HERV-H transcripts showed significant downregulation of HERV-H and HERV-K in MSL3-depleted hESCs (Fig. 5b). Western blotting confirmed that MSL3 depletion led to a specific reduction in H4K16ac but not H3K27ac (Fig. 5c and Supplementary Fig. 1). Like the transcript data, L1-ORF1 protein (L1-ORF1p), encoded by full-length L1s (Fig. 1d) and HERV envelope protein (antibody raised against ERVW-1) were also reduced upon MSL3 and H4K16ac depletion (Fig. 5c), consistent with the high level of H4K16ac at L1 5′ UTRs (Fig. 1d) and ERVW-1 locus (Fig. 4c).
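To make a figure such as ~50% downregulation in RT–qPCR concrete, the sketch below computes relative expression with the widely used 2^(–ΔΔCt) calculation. This quantification method, the function name and the Ct values are assumptions for illustration only; the text above does not state how the RT–qPCR data were quantified.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (knockdown vs control) by the 2^(-ddCt) calculation.

    Assumes ~100% amplification efficiency for target and reference primers.
    """
    delta_ct_kd = ct_target_kd - ct_ref_kd        # normalize to reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # same for the control sample
    delta_delta_ct = delta_ct_kd - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target Ct rises by one cycle in the knockdown
# while the reference gene Ct is unchanged.
fold = relative_expression(
    ct_target_kd=25.0, ct_ref_kd=18.0, ct_target_ctrl=24.0, ct_ref_ctrl=18.0
)
print(fold)  # 0.5, i.e. 50% of the control level
```

A one-cycle increase in the normalized Ct therefore corresponds to a halving of transcript abundance under these assumptions.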
We further used doxycycline-inducible Cas9 (iCAS9)-mediated knockout (KO) of MSL1 in H1 hESCs (Fig. 5d) and in TDFs (Extended Data Fig. 7) to confirm our findings from the shRNA-mediated MSL3 depletion. Immunofluorescence for H4K16ac and L1-ORF1 protein followed by high-content imaging revealed a significantly reduced number of L1-ORF1p foci in H4K16ac-depleted MSL1-KO hESCs (Fig. 5d). As in hESCs, MSL3 KO in TDFs reduced H4K16ac both in bulk (Extended Data Fig. 7a) and at L1 5′ UTRs and LTRs (Extended Data Fig. 7d).
RNA-seq data analysis from MSL3-KO TDFs showed significant downregulation of L1 and LTR transcripts (Extended Data Fig. 7b,c). Notably, H4K16ac+ L1s, but not H4K16ac– L1s, are significantly downregulated in MSL1-KO TDFs (Extended Data Fig. 7b). All these results confirm the direct role of MSL-mediated H4K16ac in the transcriptional activation of L1. MSL3-KD RNA-seq analysis in hESCs showed that pluripotency and differentiation-associated genes were unaffected (Extended Data Fig. 8a,b). However, H4K16ac+ genes were more affected than were H4K16ac– genes (Extended Data Fig. 8c). Further analysis of L1s and LTRs showed significant downregulation of both human-specific (L1HS) and primate-specific (L1PA2 to L1PA16) full-length L1 and LTR subfamily transcripts (Fig. 5b,g). L1, LTR, HERV-K and HERV-L transcripts and protein-coding genes also show small but significant downregulation in MSL3-KD cells (Fig. 5e,f and Extended Data Fig. 8d).
MSL/H4K16ac at TEs maintain active cis-regulatory landscape
H4K16ac causes chromatin decompaction in vitro, and depletion of H4K16ac has been shown to reduce chromatin accessibility29,55. Therefore, we asked whether the lack of H4K16ac leads to altered accessibility at TEs. ATAC-seq data showed a specific reduction in accessible DNA at the 5′ UTR of L1s in MSL3-depleted hESCs (Fig. 5g). In particular, evolutionarily younger L1s show a decrease in DNA accessibility, accompanied by reduced transcriptional activity at these elements (Fig. 5g).
Genes closer to H4K16ac+ L1s and H4K16ac+ LTRs are expressed at significantly higher levels than genes farther away from these L1s and LTRs. By contrast, genes closer to H4K16ac– L1s and H4K16ac– LTRs show significantly lower expression levels than genes farther away (Fig. 6a,b). Next, we asked whether depletion of MSL/H4K16ac at L1s and LTRs affects the expression of genes located near H4K16ac+ TEs. MSL3 depletion led to a small but significant downregulation of many transcripts (n = 3,312) closer (<10 kb) to H4K16ac+ L1s (Fig. 6c). Similarly, many transcripts that are closer (<10 and <25 kb) to H4K16ac+ LTRs are significantly downregulated compared to transcripts that are 25 to 50 kb away (Fig. 6d).
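The grouping of transcripts by their distance to the nearest H4K16ac+ TE can be sketched as follows. This is a minimal single-chromosome illustration: the function names, the use of single TE and TSS coordinates, and the choice of non-overlapping bins (<10 kb, 10–25 kb, 25–50 kb) are assumptions, and the actual analysis may have defined the distance classes differently.

```python
import bisect


def nearest_distance(tss, te_positions_sorted):
    """Distance (bp) from a TSS to the nearest TE on the same chromosome.

    te_positions_sorted must be sorted in ascending order.
    Returns None if no TE positions are given.
    """
    if not te_positions_sorted:
        return None
    i = bisect.bisect_left(te_positions_sorted, tss)
    candidates = []
    if i < len(te_positions_sorted):
        candidates.append(te_positions_sorted[i] - tss)  # nearest TE at or after TSS
    if i > 0:
        candidates.append(tss - te_positions_sorted[i - 1])  # nearest TE before TSS
    return min(candidates)


def distance_bin(distance_bp):
    """Assign a distance to one of the (assumed, non-overlapping) bins."""
    if distance_bp < 10_000:
        return "<10 kb"
    if distance_bp < 25_000:
        return "10-25 kb"
    if distance_bp <= 50_000:
        return "25-50 kb"
    return ">50 kb"


# Hypothetical coordinates of H4K16ac+ TEs and transcript start sites.
te_positions = [100_000, 200_000]
for tss in (105_000, 120_000, 160_000, 300_000):
    print(tss, distance_bin(nearest_distance(tss, te_positions)))
```

Expression changes upon MSL3 depletion could then be compared between the bins, for example the proximal bins against the 25–50 kb bin.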
Overall, our results confirm the role of MSL/H4K16ac at L1s and LTRs in transcriptional activation of TEs (Fig. 5b,d–f) and in regulating genes with which they are associated in linear distance or 3D space (Figs. 4 and 6a–e). Therefore, we conclude that MSL complex-mediated acetylation of H4K16 leads to the opening of chromatin structure and increased transcriptional activity at L1s and LTRs in a cell-type-specific manner. The permissive local chromatin environment at H4K16ac+ TEs shapes the cis-regulatory landscape across the mammalian genome (Fig. 6e).
TEs are repressed by many epigenetic pathways, such as DNA methylation, H3K9me3, KRAB-ZNF, HUSH complex and piwi-interacting RNA (piRNAs). We have discovered that the MSL-H4K16ac axis functions as a transcriptional activator of TEs, particularly L1s and LTRs. TEs have contributed substantially to the evolution of mammalian genomes by helping to shape both the coding and noncoding regulatory landscape. Several ERV/LTR subfamilies have been demonstrated to function as tissue-specific enhancers. Here, we have demonstrated that L1 5′ UTRs and LTR/ERVs enriched with acetylated histones loop with genes, and L1s and LTRs marked with H4K16ac function as enhancers to regulate genes in cis.
Roadmap epigenomics data have shown that TEs are depleted of H3K27ac and accessible chromatin; only 3% of TE bases are annotated with active regulatory chromHMM states, compared with 32% of promoter bases56. Despite that, TEs contribute up to 40% of TF-binding sites; hence, TEs have been proposed to contribute to species- and tissue-specific rewiring of gene regulatory networks57,58,59. This suggests that an unknown chromatin pathway could contribute to the enhancer activity of TEs in a cell-type- or species-specific manner, which could be independent of H3K27ac. Our CUT&Tag data show that the level of H3K27ac is much higher at genes and promoters than at TEs. However, H4K16ac is enriched explicitly at the L1 5′ UTRs, along with several other chromatin features associated with active enhancers. We now demonstrate that L1 5′ UTRs marked with H4K16ac alone or together with H3K122ac and H3K27ac function as enhancers to regulate the expression of genes in cis. Although L1s are expressed at higher levels during early development, including in stem cells, they are also upregulated in cancer and the neuronal lineage. Consistently, we found that H4K16ac is enriched at L1 5′ UTRs in human and mouse stem cells, cancer cell lines, and postmortem brain tissues, suggesting that 5′ UTRs of L1s bound by tissue-specific TFs and enriched with histone acetylations could function as tissue-specific enhancers. Enrichment of H4K16ac at TEs, which constitute a major part of the mammalian genome, is consistent with findings showing that nearly 30% of the histone H4 is acetylated at H4K16 (ref. 34).
Many LTR subfamilies are enriched with active enhancer-associated chromatin features, indicating that they could function as active enhancers. It has been proposed that some of the LTR subfamilies are essential in driving the expression of lineage-specific genes43,57. However, only a minority of putative RLTR13D6 subfamily-derived enhancers identified through epigenomic analyses have been experimentally validated to function as enhancers17. This highlights the importance of functional validations using CRISPR-based perturbation of candidate TEs enriched with enhancer chromatin features. Although we found that all of the tested CRISPR-edited H4K16ac+ L1s downregulated their putative target genes in cis, genome-wide enhancer reporter assays, in combination with systematic genome-scale perturbation, are needed to identify what fraction of L1s and LTRs with H4K16ac function as enhancers.
TEs with acetylation marks, including H4K16ac, are bound by looping factors, including CTCF, RAD21, YY1, and ZNF143. Moreover, the fraction of these TEs that loop with genes is significantly larger than the fraction of TEs without acetylation marks that do so, further supporting the role of transcriptionally active TEs in rewiring the regulatory landscape in a species- and cell-type-specific manner53. Since our results show that the MSL-H4K16ac axis drives transcription at TEs, including HERVs (Fig. 5b,f and Extended Data Fig. 4b), we hypothesize that MSL-H4K16ac-mediated transcription at TEs likely contributes to the rewiring of 3D chromatin organization at transcriptionally active TEs, as RNA polymerase II transcription drives enhancer-promoter contact60. The factors contributing to the recruitment of the MSL complex to the specific genomic region are unknown in mammals. Intriguingly, the role of MSL complex in co-opting TEs to rewire cis-regulatory elements appears to have been conserved during the evolution of dosage compensation in Drosophila miranda, in which a mutant helitron TE has been shown to recruit the MSL complex to the evolutionarily young X chromosome to increase transcription61. In Drosophila dosage compensation, expression of most X-linked genes is increased approximately twofold by H4K16ac, specifically in males62. This MSL-mediated X upregulation appears to be conserved in mammals, in which H4K16ac has been shown to upregulate genes on the single active X chromosome to balance expression with two copies of the autosomes63. Interestingly, X chromosomes have a higher number of L1s than autosomes64, suggesting that MSL-H4K16ac at L1s in the X chromosome could contribute to X upregulation.
TFs enriched at H4K16ac+ TEs (Fig. 3a and Extended Data Fig. 5) could contribute to maintaining MSL-H4K16ac and transcription at TEs. Notably, the MSL complex recruits YY1 to the Tsix promoter to activate its expression in mESCs32, suggesting a possible interplay between MSL and YY1 in regulating L1 transcription. Interestingly, MAFK, which has previously been reported to be enriched at TEs59, is enriched explicitly at H4K16ac+ L1 5′ UTRs, suggesting a potential interplay between MAFK and the MSL complex.
Neuronal cells have high L1 expression and retrotransposition65; retrotransposon dysregulation is also linked with neurological disorders1. TEs and their transcriptional regulators play wider roles in shaping transcriptional networks during early human development66. Loss-of-function mutations in genes encoding components of KAT8-containing protein complexes, such as KANSL1, MSL3 and KAT8, lead to neurodevelopmental disorders37,67,68,69. Enrichment of H4K16ac at the 5′ UTRs of L1s in human brain tissues suggests that altered gene expression programs due to TE dysregulation in the nervous system could be a possible mechanism for these disorders (Extended Data Fig. 3)42. Further studies on the specific role of H4K16ac in neuronal cell types will reveal whether H4K16ac dysregulation could contribute to neuronal-specific dysregulation of TEs and gene expression, contributing to neurodevelopmental and neurodegenerative disorders.
In yeast, H4K16ac regulates lifespan and cellular senescence70. Senescent cells show enrichment of H4K16ac in promoter regions of expressed genes71. Analysis of publicly available H4K16ac ChIP–seq data showed a dramatic loss of H4K16ac across L1s and LTRs in senescent cells in comparison with proliferating cells (Extended Data Fig. 9), suggesting that proliferating cells, compared with replicative senescent cells, have adapted to the permissive chromatin state at TEs. By contrast, L1 elements are known to be transcriptionally derepressed during cellular senescence and to activate the interferon I (IFN-I) response72. Further investigation will be needed to understand the direct role of the H4K16ac pathway in regulating L1 transcription linked to aging and senescence.
In summary, we show that H4K16ac-marked L1s and LTRs act as enhancers to regulate genes in cis. The act of transcription at L1 5′ UTRs and LTRs mediated by H4K16ac could contribute to chromatin topology and enhancer-mediated regulation of host gene expression in cis, as L1 and LTRs that are marked with histone acetylations are located within the regulatory elements, or they interact with genes. The permissive chromatin structure mediated by H4K16ac and H3K122ac could counteract the epigenetic repressive environment at TEs within the regulatory elements (Fig. 6e)73.
Cell culture and transduction
The H9 hESC line was a gift from L. Vallier’s lab with the MTA from WiCell. hESCs were grown on Geltrex-coated plates (Thermo Fisher Scientific, A1413302) in mTeSR Plus medium (StemCell Technologies, 100-0276) supplemented with 100 U ml–1 penicillin–streptomycin (Gibco, 15140122) and passaged every 3–4 d with ReLeSR (StemCell Technologies, 100-0484), according to the manufacturer’s protocols. The doxycycline-inducible SpCas9 (iCas9-H1) hESCs were generated using parental H1-hESCs from WiCell. Briefly, H1 cells were transfected with plasmids from the Genome-CRISP Inducible Cas9 human AAVS1 Safe Harbor Knock-in Kit (GeneCopoeia) using Fugene HD (Promega) and selected with puromycin (500 ng ml–1). Cells were single-cell sorted using FACS and grown in mTeSR to make monoclonal lines. The resulting SpCas9 line was confirmed to be karyotypically normal and was tested for mycoplasma every 3 weeks.
Transformed dermal fibroblasts (TDF) expressing guide RNAs (3 guides per pool) targeting MSL1 and MSL3 and parental (WT) TDF lines were generated in P. Scaffidi’s lab (The Crick Institute). Cells were grown in MEM (Gibco, 11095080) supplemented with 10% FBS (Sigma, F7524), 1× Glutamax (Gibco, 35050061), 1× non-essential amino acid solution (Sigma, M7145) and 100 U ml–1 penicillin–streptomycin.
iCAS9 cells were transduced with three lentiviral guide RNAs targeting MSL1 and MSL3 (ref. 54). Parental iCAS9 H1 cells, iCAS9 cells with MSL guide RNAs, and TDF iCas9 cells transduced with MSL1 and MSL3 guide RNA pools were treated with 1 μg ml–1 doxycycline (Sigma) to generate the inducible MSL-KO lines. After 4 to 7 d of doxycycline induction, the knockout was validated by immunofluorescence followed by high-content microscopy and western blot using antibodies to H4K16ac and H3K27ac.
HEK293T cells were grown in DMEM, high glucose (Lonza, BE12-614Q), supplemented with 10% FBS (Sigma, F7524), 1× Glutamax (Gibco, 35050061) and 100 U ml–1 penicillin–streptomycin. HEK293 and HeLa cells were grown in DMEM, high glucose (Lonza, BE12-614Q) supplemented with 10% FBS (Sigma, F7524), 1× Glutamax (Gibco, 35050061) and 100 U ml–1 penicillin–streptomycin. PC3 and LNCaP cells were grown in RPMI medium (Gibco, 21875034) supplemented with 10% FBS (Sigma, F7524) and 100 U ml–1 penicillin–streptomycin. RWPE1 cells were grown in a keratinocyte serum-free medium (Gibco, 10724011) supplemented with 100 U ml–1 penicillin–streptomycin. K562 cells were grown in Iscove’s Modified Dulbecco’s Medium (Lonza, BE12-722F) supplemented with 10% FBS (Sigma, F7524) and 100 U ml–1 penicillin–streptomycin. SH-SY5Y cells were grown in DMEM/F12 (1:1) medium (Gibco, 11320033) supplemented with 10% FBS (Sigma, F7524) and 100 U ml–1 penicillin–streptomycin. All the cell lines were tested for mycoplasma contamination using EZ-PCR Kit (Geneflow, K1-0210).
For the generation of MSL3 stable knockdown H9 hESCs, cells were transduced with lentiviral particles (Sigma, Mission shRNAs, MSL3 sh1 TRCN0000022105, MSL3 sh2 TRCN0000022107) and mammalian nontargeting shRNA (SHC002V) at an MOI of 6. At 48 h after transduction, cells were selected with 0.5 μg ml–1 puromycin (Gibco, A1113803) for 48 h, and surviving cells were then allowed to recover until they formed viable colonies.
Cells were pelleted by centrifugation at 228g for 5 min at 4 °C and resuspended in RIPA buffer (150 mM sodium chloride, 1.0% NP-40 or Triton X-100, 0.5% sodium deoxycholate, 0.1% SDS (sodium dodecyl sulfate) and 50 mM Tris, pH 8.0) containing protease inhibitors and benzonase (Novagen; final concentration, 1.25 U µl–1), and incubated for 30 min on ice with intermittent mixing. Extracts were sonicated for 5 cycles (30 s on, 30 s off) with a Bioruptor (Diagenode) and were cleared by centrifugation at 15,500g for 10 min at 4 °C. Equal amounts of protein extract were denatured in 1× Bolt LDS sample buffer (Thermo Fisher Scientific, B0007) and separated on Bolt Bis-Tris gels (Thermo Fisher Scientific, NW04120BOX, NW00122BOX), blotted on a polyvinylidene fluoride (PVDF) membrane (BioRad, 1704156) and immunoblotted with antibodies to MSL3 (Merck Millipore, ABE467, 1:1,000 dilution), L1 ORF1 (Merck Millipore, MABC1152, 1:1,000 dilution), H4K16ac (Abcam, ab109463, 1:5,000 dilution), H3K27ac (Abcam, ab4729, 1:5,000 dilution), α-tubulin (Sigma, T9026, 1:5,000 dilution), and HERV (Novus Biologicals, NB100-93579, 1:500 dilution), followed by horseradish peroxidase (HRP)-conjugated goat anti-rabbit IgG H&L (Abcam, ab6721) and HRP-conjugated goat anti-mouse H&L (Thermo Fisher Scientific, 31430) secondary antibodies.
Immunofluorescence and imaging
Cells were grown on 24-well cell culture plates, fixed with 4% formaldehyde, incubated for 5 min with permeabilization buffer (PBS containing 0.1% Triton X-100), and blocked with blocking buffer (PBS containing 0.1% Triton X-100 and 2% BSA) for 1 h. Primary antibodies to H4K16ac (Abcam, ab109463, 1:500) and L1 ORF1 (Merck Millipore, MABC1152, 1:500 dilution) were added overnight at 4 °C; cells were then washed three times with PBS (10 min each) and incubated with anti-rabbit secondary antibodies (Abcam, ab150080, 1:500) and DAPI (1:1,000). After washing 3 times with PBS (10 min each), the cells were left in PBS and imaged with Incell2000.
CUT&Tag was performed according to the Kaya-Okur et al.39 protocol with modifications to tissue processing, as described below. Experiments were performed in biological duplicates from each cell type. Approximately 100,000 cells were pelleted by centrifugation for 3 min at 600g at room temperature and resuspended in 500 μl of ice-cold NE1 buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 0.5 mM spermidine, 1% Triton X-100, 20% glycerol and cOmplete EDTA-free protease inhibitor tablet) and were left to sit for 10 min on ice. Nuclei were pelleted by centrifugation for 4 min at 1,300g at 4 °C, resuspended in 500 μl of wash buffer, and held on ice until beads were ready. The required amount of BioMag Plus Concanavalin-A-conjugated magnetic beads (ConA beads, Polysciences) was transferred into the binding buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 1 mM CaCl2 and 1 mM MnCl2) and washed once in the same buffer; each time they were placed on a magnetic rack to allow the beads to separate from the buffer and resuspended in binding buffer. Then, 10 μl of beads was added to each tube containing cells and rotated on an end-to-end rotator for 10 min. After a quick spin to remove liquid from the cap, tubes were placed on a magnet stand to be cleared, the liquid was withdrawn, and 800 μl of antibody buffer containing 1 μg of one of the following primary antibodies was added: normal rabbit IgG (Santa Cruz, sc-2027), H3K27ac (Abcam, ab4729), H4K16ac (Abcam, ab109463), H3K122ac (Abcam, ab33309), H3K4me1 (Abcam, ab8895), H3K36me3 (Abcam, ab9050), H3K4me3 (Millipore, 07-473), H3K27me3 (Abcam, ab192985) and H3K9me3 (Abcam, ab176916). The mixture was incubated at 4 °C overnight on a nutator.
Secondary antibodies (guinea pig α-rabbit antibody, Antibodies online, ABIN101961) were added 1:100 in Dig-wash buffer (5% digitonin in wash buffer), and 100 µl was squirted in per sample while they were gently vortexed, to allow the solution to dislodge the beads from the sides, followed by incubation for 60 min on a nutator. Unbound antibodies were washed away in 1 ml of Dig-wash buffer three times. Then, 100 μl of (1:250 diluted) protein-A-Tn5 loaded with adapters in Dig-300 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM spermidine with Roche cOmplete EDTA-free protease inhibitor) was added to the samples, which were placed on a nutator for 1 h and washed three times in 1 ml of Dig-300 buffer to remove unbound pA-Tn5. Next, 300 µl of tagmentation buffer (Dig-300 buffer + 5 mM MgCl2) was added while samples were gently vortexed, and samples were incubated at 37 °C for 1 h in an incubator. Tagmentation was stopped by adding 10 µl 0.5 M EDTA, 3 µl 10% SDS, and 2.5 µl 20 mg ml–1 Proteinase K to each sample. Samples were mixed by full-speed vortexing for ~2 s and incubated for 1 h at 55 °C to digest proteins. DNA was purified by phenol:chloroform extraction using phase-lock tubes (Quanta Bio) followed by ethanol precipitation. Libraries were prepared using NEBNext HiFi 2× PCR Master mix (M0541S) with a 72 °C gap-filling step, followed by 13 cycles of PCR with 10-second combined annealing and extension for the enrichment of short DNA fragments. Libraries were sequenced on a NovaSeq 6000 (Novogene) with 150 bp paired-end reads.
Total RNA was isolated from H9 hESCs using TRIzol reagent (ThermoFisher Scientific, 15596026). For RT–qPCR, cDNAs were prepared with LunaScript RT SuperMix Kit (NEB, E3010). For CRISPRi experiments, RNA isolation was done using a kit (Monarch, T2040S), followed by reverse transcription using LunaScript RT SuperMix Kit (NEB, E3010) and qPCR using qPCRBIO SyGreen Mix Lo-ROX (PCRBio) in a LightCycler 480 instrument (Roche). The list of specific primers used is given in Supplementary Table 4. RT–qPCR was done with three independent biological replicates, each of control shRNA and two independent shRNAs targeting MSL3 or relevant empty vector controls and dCAS9 systems for CRISPRi, on a StepOnePlus Real-Time PCR System (Applied Biosystems). Data were normalized to β-actin from three biological replicates.
RNA was isolated using the Monarch RNA mini prep kit (NEB) with genomic DNA elimination column and on-column DNase treatment. MSL3 KD RNA sequencing libraries were prepared by spiking in equal amounts of the External RNA Controls Consortium (ERCC) Spike-in RNA Variant Control Sets (SIRV set 3, Lexogen), and 500 ng of RNA was used for depletion of rRNA using RiboCOP kit (Lexogen), followed by RNA-seq library preparation using CORALL Total RNA-Seq Library Prep Kit (Lexogen). Libraries were sequenced as 150 bp paired-end reads using a NovaSeq 6000. In the case of H1 iCAS9 and MSL1 KO RNA-seq, ribosomal RNAs were depleted using NEBNext rRNA Depletion Kit (Human/Mouse/Rat) (NEB no. E7400) followed by library preparation using NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB no. E7765).
ATAC-seq was performed as described in ref. 23, with modifications. Freshly collected cells (50,000) were washed in PBS and resuspended in a resuspension buffer (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2). Cells were resuspended and incubated on ice for 3 min in 50 µl of cold lysis buffer (0.1% NP-40, 0.1% Tween-20, 0.01% digitonin in resuspension buffer). Nuclei were washed in 1 ml of wash buffer (990 µl resuspension buffer, 0.1% Tween-20) by inversion three times. Nuclei were pelleted by centrifugation at 500g for 10 min at 4 °C. The nuclei were resuspended in 47.5 µl of Nextera Tagmentation buffer (Nextera DNA Sample Preparation Kit) and incubated with 2.5 μl of the Tn5 transposase (Nextera kit, Illumina) at 37 °C for 30 min. The resulting DNA fragments were purified using a MinElute column (Qiagen) and amplified by NEBNext High-Fidelity PCR Master Mix in a total volume of 50 μl. The thermocycling protocol for this reaction was 72 °C for 5 min, 98 °C for 30 s and five cycles of 98 °C for 10 s, 63 °C for 30 s, and 72 °C for 1 min. The universal adapter primer and a unique barcoded adapter primer (same as CUT&Tag primers) were used. To avoid over-amplification, after the initial five cycles, the number of remaining cycles required was estimated for each sample using qPCR by adding SYBRGreen and using 5 μl of the previous PCR as a template. The number of additional cycles was determined to be the number that it took for the qPCR to reach one-third of maximal fluorescence. The original PCR was then resumed, and each sample was cycled as necessary. After amplification, the samples were purified using AMPure XP beads. The libraries were sequenced as a minimum of 50 million 150 bp paired-end reads on a NovaSeq (Novogene PLC).
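The rule used above to choose the number of additional PCR cycles can be sketched as follows. This is a minimal illustration only: the function name `additional_cycles` and the example readings are hypothetical, and it assumes the side qPCR yields one baseline-subtracted fluorescence value per cycle.

```python
def additional_cycles(fluorescence):
    """Return the first qPCR cycle (1-based) at which fluorescence reaches
    one-third of the maximum fluorescence observed in the run."""
    if not fluorescence:
        raise ValueError("no fluorescence readings supplied")
    threshold = max(fluorescence) / 3.0
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle
    # Only reachable for all-negative readings, which are not meaningful here.
    raise ValueError("fluorescence never reaches one-third of its maximum")


# Hypothetical, baseline-subtracted readings, one per qPCR cycle.
readings = [0.1, 0.2, 0.5, 1.2, 2.8, 5.5, 8.0, 9.0, 9.0]
print(additional_cycles(readings))
```

The returned cycle number would then be the number of cycles added to the original five-cycle reaction for that sample.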
CRISPRi with dCAS9-KRAB
CRISPRi using dCAS9-KRAB was performed as described in ref. 31, with the following modifications. The CRISPR-Bac plasmid (PB_tre_dCas9_KRAB, Addgene ID 126030) (ref. 74), a kind gift from J. Mauro Calabrese, was mixed with the piggyBac-transposase plasmid in a 1:1 ratio (2 μg in each well of a 6-well plate) into opti-MEM, along with TransIT-LT1 in a 1:3 ratio (Mirus, MIR2300), and reverse transfected into H9 hESCs according to the manufacturer’s protocol. The next day, the cells were allowed to recover from the transfection for 24 h and then selected with 100 μg ml–1 hygromycin B for 5 d. Surviving colonies were then expanded and reverse-transfected with various gRNA-expressing plasmids (cloned into pSLQ1371 as described in ref. 75, kind gift from S. Qi) with TransIT-LT1. Then, 1.25 × 10^6 cells were reverse transfected with 1 μg of the gRNA-expressing plasmid (per well of a 24-well plate). To improve the efficiency of plasmid delivery, the transfection was repeated the next day (forward transfection). At 48 h after the first transfection, cells were briefly selected with puromycin (0.5 μg ml–1) for 24 h and left to recover for 96 h. Cells were collected for RNA isolation and RT–qPCR.
CRISPR–Cas9 deletion of LINE1 elements in hESCs
LINE1 element deletions were performed using two crRNAs designed to target nonrepetitive flanking sites of the LINE1 elements (Supplementary Table 5). Individual crRNAs were mixed with tracrRNA (Alt-R CRISPR–Cas9 tracrRNA, ATTO 550) and with Cas9 protein (Alt-R S.p. HiFi Cas9 Nuclease V3) to form ribonucleoprotein complexes. Then, 200,000 H1 hESCs per well were nucleofected in the presence of Alt-R Cas9 Electroporation Enhancer in 16-well strip format using the primary cell kit (P3). hESCs were electroporated using a 4D nucleofector, the P3 Primary Cells 4D-Nucleofector X kit S (Lonza, LOV4XP3032), with the pulse program. After nucleofection, cells were resuspended in an hESC medium supplemented with ROCK inhibitors and seeded in geltrex-coated 96-well plates for 2 d at 37 °C in a humidified incubator with 5% CO2. hESCs were split into 96-well and 6-well plates for picking of single-cell colonies. The pool of cells 5 d after nucleofection was collected to check the deletion efficiency and for RT–qPCR. For colonies picked from the 6-well plates, the deletion was assessed by rapid DNA lysis and PCR using PCRBIO Rapid Extract PCR kit (PB10.24-40). For deletion of L1s at the MOXD1 and RLN2 loci, pools of cells that were collected 5 d after nucleofection were used for RT–qPCR. PCR products were subjected to Sanger sequencing. Primer sequences used for screening are listed in Supplementary Table 4.
Analysis of CUT&Tag-seq data
For the CUT&Tag-seq, 150-bp paired-end reads were trimmed for adapters using the Trimmomatic tool and aligned to the hg38 genome through local Bowtie2 (version 2.4.5) with these parameters for paired-end mapping: --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 (ref. 76). For analyses, multi-mapped reads were filtered out, and only uniquely mapped reads were retained with the samtools flag of -q 2 -f 0x200 (ref. 77). For Figure 5e, total reads, including multi-mapped reads, were retained for plotting heatmaps of ATAC-seq, CUT&Tag, and RNA-seq reads. For individual replicates, the bam files were sorted and indexed using samtools (version 1.9) sort and samtools index, and were used for generating bedgraphs (for peak calling) and bigwigs. Merging of multiple replicates was performed using samtools merge. The sorted bam files were used to generate bed, bedgraph and bigwig formats for individual modifications.
Peak calling and analyses
The reads were extracted from the bam to bed by the bedtools bamtobed option78. Further reads were processed as mentioned in the SEACR (version 1.3) manual to get the bedgraph79. These bedgraph files were subjected to peak calling through SEACR with a stringent P of ≤1 × 10–6 with the norm and relaxed options.
Further, bedtools with various options, such as intersect, closest, sample, or shuffle, was used for transforming bed files. GNU awk was used for processing the bed files wherever required. Chromatin state predictions for the histone modification peaks were performed using ChromHMM40 (v1.10). For further analyses, the reproducible peaks were obtained by performing an intersection between the peak files of the CUT&Tag replicates for each histone PTM. Overlap between the peaks for histone modifications for the H9 cells was assessed using the Intervene package80. While the overlapping peak counts were plotted as a Venn diagram for each histone PTM and IgG, the peaks for histone PTM combinations were plotted as an upset plot showing the number of overlapping peaks (y axis) along with the histone modification peak numbers on the x axis.
TE enrichment analyses
Tracks for the repeats (rmsk) were obtained from the UCSC Genome Table Browser for hg19 and hg38. Reproducible peaks (consistent between two replicates) were used to generate the observed versus expected frequency for different TE classes (Alu, full-length L1, and LTR), gene body, and TSSs. These were calculated across various histone modification CUT&Tag peaks. The intersect count was obtained for each histone modification using bedtools (version v2.28.0) intersect (bedtools intersect -wa -u options) for the mentioned genomic elements. The expected occurrences in the genome were calculated by intersecting the genomic elements with the randomized genomic coordinates (number, length, and chromosome ID matched) across different histone modifications. The ratios for observed versus expected at these genomic elements for each histone PTM were calculated and used to plot as a heatmap using ggplot2 heatmap function in R.
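The observed versus expected calculation described above can be sketched in plain Python as follows. This is an illustrative stand-in for the bedtools-based workflow, not the code used in the study: the function names, the tuple representation of intervals, and the single randomization round are all assumptions made for the example.

```python
import random


def count_overlapping(peaks, elements):
    """Count peaks that overlap at least one element, in the spirit of
    `bedtools intersect -wa -u`. Intervals are (chrom, start, end), half-open."""
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))
    n = 0
    for chrom, start, end in peaks:
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            n += 1
    return n


def shuffle_peaks(peaks, chrom_sizes, rng):
    """Draw random intervals matched to the peaks for number, length and chromosome."""
    shuffled = []
    for chrom, start, end in peaks:
        length = end - start
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def observed_over_expected(peaks, elements, chrom_sizes, seed=0):
    """Ratio of overlaps for real peaks to overlaps for matched random intervals."""
    rng = random.Random(seed)
    observed = count_overlapping(peaks, elements)
    expected = count_overlapping(shuffle_peaks(peaks, chrom_sizes, rng), elements)
    return observed / expected if expected else float("inf")
```

In practice the randomization would typically be repeated and averaged to stabilize the expected count; the sketch uses a single seeded draw to keep the logic visible.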
To analyze the repeat content of the different histone modification peak sets, the fasta was obtained using the bedtools getfasta tool from the hg38.fa reference genome. The sequences were subjected to RepeatMasker (version 4.0.7) to get the repeat content across these genomic sequences for different histone modifications (http://www.repeatmasker.org).
Bigwig generation and plotting
Sorted bam files were subjected to bigwig generation via the deepTools (version 3.5.1) (ref. 81) bamCoverage tool with the options --binSize 20 --normalizeUsing BPM or CPM --scaleFactor 1.0 --smoothLength 60 --extendReads 150 --centerReads. The signal was normalized to IgG through bigwigCompare with the option --operation first or subtract. The bigwig files were used for plotting signals or visualization in the genome browser. The genome-browser views were obtained by viewing the signal tracks in the UCSC Genome Browser. For the knockdown (in H9 cells) or knockout (in TDF cells) studies, the samples were normalized on the basis of the number of reads mapping to the Escherichia coli genome.
The plotting of signals at various genomic landmarks and bed coordinates was carried out through deepTools. Matrices were generated using the deepTools computeMatrix reference-point or scale-regions option. These matrices were used for plotting heatmaps or average summary plots by the plotHeatmap or plotProfile function in deepTools, with or without clustering by the k-means algorithm. The sorted bam files were also used to study the correlation between the individual replicates for the CUT&Tag across histone PTMs and IgG using the multiBamSummary function in deepTools with the bins option, and plotted as a Pearson correlation heatmap using the deepTools plotCorrelation function with the option --skipZeros.
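One common way to turn E. coli read counts into per-sample scale factors is sketched below. The exact formula used in this study is not stated, so this is an assumption: each factor is inversely proportional to the sample's E. coli read count, anchored so that the sample with the fewest E. coli reads receives 1.0. The function name and the counts are hypothetical.

```python
def spike_in_scale_factors(ecoli_read_counts):
    """Map each sample to a scale factor inversely proportional to its
    E. coli read count (assumed convention, not taken from the paper)."""
    if not ecoli_read_counts:
        raise ValueError("no samples supplied")
    if any(count <= 0 for count in ecoli_read_counts.values()):
        raise ValueError("E. coli read counts must be positive")
    reference = min(ecoli_read_counts.values())
    return {sample: reference / count for sample, count in ecoli_read_counts.items()}


# Hypothetical read counts mapping to the E. coli genome.
factors = spike_in_scale_factors({"control": 20000, "MSL3_KD": 10000})
print(factors)
```

A factor computed this way could be supplied to bamCoverage through its --scaleFactor option in place of the default value of 1.0.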
Further, H4K16ac signals from GSE84618 for brain (prefrontal lobe) tissues from young individuals, old individuals and individuals with Alzheimer’s disease were compared for the TE elements. Similarly, the bigwigs were obtained for the proliferative and senescence model in IMR90 cells (GSM1358821) for L1 and LTR subfamilies. The signals were compared as heatmaps or average summary profiles using above-mentioned tools. The signals at the H4K16ac-marked TEs were also plotted as average summary plots using plotProfile function for transcription factors YY1 (ENCODE ID: ENCFF904SDR), RAD21 (ENCODE ID: ENCFF506AAX) and CTCF (ENCODE ID: ENCFF473IZV) using ENCODE datasets (bigwigs) normalized as fold change over control.
TAD border annotation
To call TADs in human embryonic stem cells (H9), we used Hi-C data for two replicates from ref. 53 under accession numbers GSM3262956 and GSM3262957. We first generated contact domains for all chromosomes at a 10-kb resolution using the Arrowhead tool from Juicer with Knight-Ruiz normalization82. We extracted the borders of these TADs. To ensure we identified a robust set of TAD borders, we selected borders with a score above one that are common to the two replicates, allowing a maximum gap of 1 bin (10 kb). This resulted in 9,952 robust TAD borders. The TAD calling was performed on the hg19 reference genome, and to allow integration with the rest of the analysis, we lifted over the common TAD borders from hg19 to hg38.
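The selection of robust borders can be illustrated with the short sketch below. It assumes that borders are represented as hypothetical (chromosome, position, score) tuples and that "common" means a border in one replicate lies within one 10-kb bin of a border in the other; neither the data layout nor the function name comes from the original analysis.

```python
def common_borders(rep_a, rep_b, min_score=1.0, bin_size=10_000, max_gap_bins=1):
    """Return (chrom, pos) borders from replicate A that pass the score filter
    and have a passing replicate B border within `max_gap_bins` bins."""
    max_gap = bin_size * max_gap_bins
    b_by_chrom = {}
    for chrom, pos, score in rep_b:
        if score > min_score:
            b_by_chrom.setdefault(chrom, []).append(pos)
    shared = []
    for chrom, pos, score in rep_a:
        if score <= min_score:
            continue
        if any(abs(pos - other) <= max_gap for other in b_by_chrom.get(chrom, [])):
            shared.append((chrom, pos))
    return shared


# Hypothetical borders: (chromosome, position in bp, Arrowhead-style score).
rep_a = [("chr1", 100_000, 1.5), ("chr1", 500_000, 1.2), ("chr2", 300_000, 0.8)]
rep_b = [("chr1", 110_000, 1.1), ("chr1", 530_000, 1.4), ("chr2", 300_000, 1.3)]
print(common_borders(rep_a, rep_b))
```

Here only the chr1 border at 100 kb is retained: the 500-kb border has no partner within one bin, and the chr2 border fails the score filter in the first replicate.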
Significant loops calling using Micro-C
Chromatin loops were called with the HiCCUPS tool from the Juicer software suite82 on micro-C data in H1 hESCs51. Loops were called using 5- and 10-kb resolution, 10% FDR, Knight-Ruiz normalization, a window of 7 and 5, peak width of 2 and 4, thresholds for merging loops of 0.02, 1.5, 1.75 and 2, and distance to merge peaks of 20 kb (-r 5000,10000 -k KR -f .1,.1 -p 4,2 -i 7,5 -t 0.02,1.5,1.75,2 -d 20000,20000).
Motif enrichment analysis
Enrichment of TF-binding sites (TFBS) was assessed at the TEs (>5 kb L1, Alu, and LTR) overlapping with histone modifications (H4K16ac, H3K27ac and H3K122ac) and at a similar number of randomized genomic bins (chromosome and length matched). The experimentally determined TFBS for the H1-hESCs were fetched from the UCSC Genome Browser as TFBS clusters. For all TFBS, the number of motifs was counted for each TE class, either in TEs positive for the histone modification or in the matched randomized genomic bins. The internal distribution profile of motifs across each TE class was determined as a percentage distribution, and the enrichment score was defined as (Diff/Sum) of the motif-count percentages between observed (TEs positive for histone modification) and expected (randomized genomic bins) occurrences of motifs. The ratios obtained for each TFBS were plotted as a heatmap using the R package ComplexHeatmap83.
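One reading of the (Diff/Sum) enrichment score is sketched below: motif counts are first converted to percentages within the observed and expected sets, and the score for each TF is the difference of the two percentages divided by their sum. This interpretation, the function names, and the example counts are assumptions made for illustration.

```python
def percentages(counts):
    """Convert raw motif counts per TF into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("motif counts sum to zero")
    return {tf: 100.0 * n / total for tf, n in counts.items()}


def enrichment_scores(observed_counts, expected_counts):
    """Score each TF as (obs% - exp%) / (obs% + exp%), ranging from -1 to 1."""
    obs = percentages(observed_counts)
    exp = percentages(expected_counts)
    scores = {}
    for tf in sorted(set(obs) | set(exp)):
        o = obs.get(tf, 0.0)
        e = exp.get(tf, 0.0)
        scores[tf] = (o - e) / (o + e) if (o + e) > 0 else 0.0
    return scores


# Hypothetical motif counts in H4K16ac-positive TEs versus randomized bins.
print(enrichment_scores({"YY1": 30, "CTCF": 10}, {"YY1": 10, "CTCF": 30}))
```

Under this definition a score near 1 means a motif is found almost exclusively in the histone-modification-positive TEs, a score near -1 means it is found almost exclusively in the randomized bins, and 0 means no difference.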
ATAC-seq data analysis
The ATAC-seq reads were processed for mapping by trimming for adapters using the Trimmomatic tool, followed by aligning to the hg38 genome through local Bowtie2 (version) with these parameters for paired-end mapping: --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 (-X was changed to 2000 to allow the mapping of reads for H9 cells). The mapped reads were processed as described above (CUT&Tag data analysis) to generate the bigwigs. The signal was normalized as log2(fold change) for control over MSL3 knockdown using the bigwigCompare function in deepTools with --skipZeroOverZero --operation log2. Using the same matrix generation and heatmap tools, ATAC-seq signals were further compared at the full-length L1 subfamilies and LTR subclasses.
RNA-seq data analysis
The reads obtained from RNA-seq for H9 cells were mapped to the human genome using STAR84 following the Bluebee-CORALL pipeline of mapping. For TDFs, the RNA-seq datasets were downloaded from NCBI-GEO for accession ID GSE144019. The reads were mapped to hg38 following the same pipeline as H9 except for the single-end specification in TDFs.
For differential enrichment analysis, the fragment counts for each dataset were obtained using the featureCounts tool from the Subread package. The GTF file for genes was obtained from ENSEMBL, and for different TE classes (Alu, L1 and LTR), it was fetched from the UCSC Table Browser. These feature counts were used for the differential enrichment analyses using the DESeq2 package in R. DESeq2 was run with default settings85. The differential expression of genes was visualized as a volcano plot. The differential gene expression table can be found in the additional data.
The uniquely mapped reads were filtered using samtools for MAPQ of 255. Further, the unique alignments’ bam files were merged, sorted, and indexed using samtools, followed by bigwig generation using the deepTools function bamCoverage. The normalized signal was generated as log2(fold change) for control over MSL3 knockdown using the bigwigCompare function in deepTools with --operation log2. The signal was compared at the full-length L1, as well as at genes.
The RNA signals across the various subfamilies of TEs, as well as genes in the flanks (<10 kb, 10–25 kb and 25–50 kb) of the H4K16ac-marked LTR and full-length L1, were calculated as RPKM from the read counts obtained for each gene or TE across the multiple replicates for H9 (control or MSL3 KD) and TDF (WT or MSL1 KO). The RPKM signal was then plotted as violin plots with box plots showing the median using ggplot2 in R, with the log10 value of the RPKM on the y axis. For comparison, the same number of TEs that are H4K16ac+ and H4K16ac– in TDFs was obtained by subsampling using bedtools sample. The statistical analyses for all the violin plot comparisons were performed using the Dunn test with Bonferroni correction.
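For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) can be written as the short sketch below. The function name and the example numbers are hypothetical, and how zero values were handled before the log10 transformation is not stated in the text, so the sketch does not address it.

```python
import math


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2-kb element in a 10-million-read library.
value = rpkm(500, 2_000, 10_000_000)
print(value, math.log10(value))
```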
STARR-seq data analysis
To assess the potential of the TEs marked by H4K16ac to act as enhancers, we compared the STARR-seq signal in K562 (ENCFF611ZHY) and SH-SY5Y (ENCFF571ARG) cells at the TE elements: full-length L1 (>5 kb) and ERV/LTRs. The signal was plotted as a heatmap from the start (L1) or center (LTR) of the TE elements sorted according to the H4K16ac signal. Further, the signal was compared as violin plots for the four sets of peak combinations with respect to overlap among peaks that overlap with TEs. These were H3K4me1+ only, H3K4me1+H3K27ac+H4K16ac+, H4K16ac+H3K4me1– and H4K16ac+H3K4me1+ peaks that overlap with LTRs.
For all the RT–qPCRs, an unpaired t-test with Welch correction (two-stage step-up) was performed between the groups using GraphPad Prism 9. For all the violin plots, multiple-group comparisons were performed using Dunn’s test with Bonferroni correction, using the Dunn test function in the R package rstatix.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE200770 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE200770). CUT&Tag raw data, as well as processed data files (peaks and bigwigs), can be accessed at NCBI under accession ID GSE200768; RNA-seq raw data files can be accessed under accession ID GSE200769; and ATAC-seq datasets can be accessed under accession ID GSE200767. All the datasets generated and public datasets used in this study are detailed in Supplementary Table 1. Source data are provided with this paper.
All the analyses in this manuscript have been carried out using publicly available tools. No custom code was generated for this purpose. The methodology contains the details of the analysis steps involved.
Hancks, D. C. & Kazazian, H. H. Roles for retrotransposon insertions in human disease. Mob. DNA 7, 9 (2016).
Burns, K. H. Transposable elements in cancer. Nat. Rev. Cancer 17, 415–424 (2017).
Molaro, A. & Malik, H. S. Hide and seek: how chromatin-based pathways silence retroelements in the mammalian germline. Curr. Opin. Genet. Dev. 37, 51–58 (2016).
Almeida, M. V., Vernaz, G., Putman, A. L. K. & Miska, E. A. Taming transposable elements in vertebrates: from epigenetic silencing to domestication. Trends Genet. 38, 529–553 (2022).
Karimi, M. M. et al. DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs. Cell Stem Cell 8, 676–687 (2011).
Robbez-Masson, L. et al. The HUSH complex cooperates with TRIM28 to repress young retrotransposons and new genes. Genome Res. 28, 836–845 (2018).
Rowe, H. M. et al. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature 463, 237–240 (2010).
Bulut-Karslioglu, A. et al. Suv39h-dependent H3K9me3 marks intact retrotransposons and silences LINE elements in mouse embryonic stem cells. Mol. Cell 55, 277–290 (2014).
Walsh, C. P., Chaillet, J. R. & Bestor, T. H. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat. Genet. 20, 116–117 (1998).
Hermant, C. & Torres-Padilla, M. E. TFs for TEs: the transcription factor repertoire of mammalian transposable elements. Genes Dev. 35, 22–39 (2021).
Jacques, P. É., Jeyakani, J. & Bourque, G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 9, e1003504 (2013).
Vierstra, J. et al. Mouse regulatory DNA landscapes reveal global principles of cis-regulatory evolution. Science 346, 1007–1012 (2014).
Jachowicz, J. W. et al. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat. Genet. 49, 1502–1510 (2017).
Upton, K. R. et al. Ubiquitous L1 mosaicism in hippocampal neurons. Cell 161, 228–239 (2015).
Fueyo, R., Judd, J. & Feschotte, C. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 24, 19–24 (2022).
Sundaram, V. & Wysocka, J. Transposable elements as a potent source of diverse cis-regulatory sequences in mammalian genomes. Philos. Trans. R. Soc. B Biol. Sci. 375, 20190347 (2020).
Todd, C. D., Taylor, D. & Branco, M. R. Functional evaluation of transposable elements as enhancers in mouse embryonic and trophoblast stem cells. eLife 8, e44344 (2019).
He, J. et al. Transposable elements are regulated by context-specific patterns of chromatin marks in mouse embryonic stem cells. Nat. Commun. 10, 34 (2019).
Percharde, M. et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174, 391–405 (2018).
Schmidt, D. et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335–348 (2012).
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl Acad. Sci. USA 107, 21931–21936 (2010).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Deniz, Ö. et al. Endogenous retroviruses are a source of enhancers with oncogenic potential in acute myeloid leukaemia. Nat. Commun. 11, 3506 (2020).
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Taylor, G., Eskeland, R., Hekimoglu-Balkan, B., Pradeepa, M. & Bickmore, W. A. H4K16 acetylation marks active genes and enhancers of embryonic stem cells, but does not alter chromatin compaction. Genome Res. 23, 2053–2065 (2013).
Wang, Z. et al. Prediction of histone post-translational modification patterns based on nascent transcription data. Nat. Genet. 54, 295–305 (2022).
Shogren-Knaak, M. et al. Histone H4-K16 acetylation controls chromatin structure and protein interactions. Science 311, 844–847 (2006).
Tropberger, P. et al. Regulation of transcription through acetylation of H3K122 on the lateral surface of the histone octamer. Cell 152, 859–872 (2013).
Pradeepa, M. M. et al. Histone H3 globular domain acetylation identifies a new class of enhancers. Nat. Genet. 48, 681–686 (2016).
Chelmicki, T. et al. MOF-associated complexes ensure stem cell identity and Xist repression. eLife 3, e02024 (2014).
Ravens, S. et al. Mof-associated complexes have overlapping and unique roles in regulating pluripotency in embryonic stem cells and during differentiation. eLife 2014, 1–23 (2014).
Radzisheuskaya, A. et al. Complex-dependent histone acetyltransferase activity of KAT8 determines its role in transcription and cellular homeostasis. Mol. Cell 81, 1749–1765 (2021).
Chatterjee, A. et al. Acetyl transferase regulates transcription and respiration in mitochondria. Cell 167, 722–738 (2016).
Li, X. et al. The histone acetyltransferase MOF is a key regulator of the embryonic stem cell core transcriptional network. Cell Stem Cell 11, 163–178 (2012).
Basilicata, M. F. et al. De novo mutations in MSL3 cause an X-linked syndrome marked by impaired histone H4 lysine 16 acetylation. Nat. Genet. 50, 1 (2018).
Li, L. et al. Lysine acetyltransferase 8 is involved in cerebral development and syndromic intellectual disability. J. Clin. Invest. 130, 1431–1445 (2020).
Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Larschan, E. et al. X chromosome dosage compensation via enhanced transcriptional elongation in Drosophila. Nature 471, 115–118 (2011).
Nativio, R. et al. Dysregulation of the epigenetic landscape of normal aging in Alzheimer’s disease. Nat. Neurosci. 21, 1018 (2018).
Fuentes, D. R., Swigut, T. & Wysocka, J. Systematic perturbation of retroviral LTRs reveals widespread long-range effects on human gene regulation. eLife 7, 1–29 (2018).
Pontis, J. et al. Hominoid-specific transposable elements and KZFPs facilitate human embryonic genome activation and control transcription in naive human ESCs. Cell Stem Cell 24, 724–735 (2019).
Lee, D. et al. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol. 21, 298 (2020).
Athanikar, J. N., Badge, R. M. & Moran, J. V. A YY1-binding site is required for accurate human LINE-1 transcription initiation. Nucleic Acids Res. 32, 3846–3855 (2004).
Macfarlan, T. S. et al. Endogenous retroviruses and neighboring genes are coordinately repressed by LSD1/KDM1A. Genes Dev. 25, 594–607 (2011).
Xu, H. et al. Cohesin Rad21 mediates loss of heterozygosity and is upregulated via Wnt promoting transcriptional dysregulation in gastrointestinal tumors. Cell Rep. 9, 1781–1797 (2014).
Sun, X. et al. Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression. Proc. Natl Acad. Sci. USA 115, E5526–E5535 (2018).
Weintraub, A. S. et al. YY1 is a structural regulator of enhancer-promoter loops. Cell 171, 1573–1588 (2017).
Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565 (2020).
Lyu, X., Rowley, M. J. & Corces, V. G. Architectural proteins and pluripotency factors cooperate to orchestrate the transcriptional response of hESCs to temperature stress. Mol. Cell 71, 940–955 (2018).
Zhang, Y. et al. Transcriptionally active HERV-H retrotransposons demarcate topologically associating domains in human pluripotent stem cells. Nat. Genet. 51, 1380–1388 (2019).
Monserrat, J. et al. Disruption of the MSL complex inhibits tumour maintenance by exacerbating chromosomal instability. Nat. Cell Biol. 23, 401–412 (2021).
Samata, M. et al. Intergenerationally maintained histone H4 lysine 16 acetylation is instructive for future gene activation. Cell 182, 127–144 (2020).
Pehrsson, E. C., Choudhary, M. N. K., Sundaram, V. & Wang, T. The epigenomic landscape of transposable elements across normal human development and anatomy. Nat. Commun. 10, 1–16 (2019).
Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
Sundaram, V. et al. Functional cis-regulatory modules encoded by mouse-specific endogenous retrovirus. Nat. Commun. 8, 14550 (2017).
Sundaram, V. et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
Zhang, S., Übelmesser, N., Barbieri, M. & Papantonis, A. Enhancer-promoter contact formation requires RNAPII and antagonizes loop extrusion. Nat. Genet. 55, 832–840 (2023).
Ellison, C. E. & Bachtrog, D. Dosage compensation via transposable element mediated rewiring of a regulatory network. Science 342, 846–850 (2013).
Conrad, T. & Akhtar, A. Dosage compensation in Drosophila melanogaster: epigenetic fine-tuning of chromosome-wide transcription. Nat. Rev. Genet. 13, 123–134 (2011).
Deng, X. et al. Mammalian X upregulation is associated with enhanced transcription initiation, RNA half-life, and MOF-mediated H4K16 acetylation. Dev. Cell 25, 55–68 (2013).
Boyle, A. L., Ballard, S. G. & Ward, D. C. Differential distribution of long and short interspersed element sequences in the mouse genome: chromosome karyotyping by fluorescence in situ hybridization. Proc. Natl Acad. Sci. USA 87, 7757–7761 (1990).
Macia, A. et al. Engineered LINE-1 retrotransposition in nondividing human neurons. Genome Res. 27, 335–348 (2017).
Pontis, J. et al. Primate-specific transposable elements shape transcriptional networks during human development. Nat. Commun. 13, 7178 (2022).
Sharp, A. J. et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38, 1038–1042 (2006).
Shaw-Smith, C. et al. Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability. Nat. Genet. 38, 1032–1037 (2006).
Koolen, D. A. et al. Mutations in the chromatin modifier gene KANSL1 cause the 17q21.31 microdeletion syndrome. Nat. Genet. 44, 639–641 (2012).
Dang, W. et al. Histone H4 lysine 16 acetylation regulates cellular lifespan. Nature 459, 802–807 (2009).
Rai, T. S. et al. HIRA orchestrates a dynamic chromatin landscape in senescence and is required for suppression of neoplasia. Genes Dev. 28, 2712–2725 (2014).
De Cecco, M. et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566, 73–78 (2019).
Liu, N. et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553, 228–232 (2018).
Schertzer, M. D. et al. A piggyBac-based toolkit for inducible genome editing in mammalian cells. RNA 25, 1047–1058 (2019).
Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Meers, M. P., Tenenbaum, D. & Henikoff, S. Peak calling by sparse enrichment analysis for CUT & RUN chromatin profiling. Epigenetics Chromatin 12, 42 (2019).
Khan, A. & Mathelier, A. Intervene: a tool for intersection and visualization of multiple gene or genomic region sets. BMC Bioinform. 18, 287 (2017).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21 (2014).
We thank QMUL epigenetics hub members A. de Mendoza, C. Bell, V. Rakyan, and L. Stojic (QMUL) for discussing and reading the manuscript. We thank L. Vallier (Cambridge, UK, with MTA from WiCell) for sharing the H9 cell line. We thank S. Henikoff (Fred Hutchinson Cancer Research Center), E. Schulz (Max Planck Institute for Molecular Genetics), and C. Martin (Queen Mary University of London) for sharing reagents. We thank P. Dubey, I. Alic, and A. Murrey (QMUL) for help with hESC culture. We thank G. Warnes and L. Gammon (Blizard Institute, QMUL) core facilities for help with the flow sorter and high-content imaging analysis. This research used Apocrita HPC, supported by QMUL Research-IT. Funding: Medical Research Council UKRI/MRC grant (MR/T000783/1) (M.M.P., D.P., M.P., F.B.), a Barts charity Rising Stars award and a Barts charity small grant (MGU0475) (M.M.P.), a Marie Skłodowska-Curie grant 896079 (J.S.), BBSRC (BB/T000031/1) (M.R.B.), and Cancer Research UK, UKRI/MRC, and the Wellcome Trust (FC001152) (P.S.).
For open access, the author has applied a CC BY public copyright license to any author accepted manuscript version arising from this submission.
The authors declare no competing interests.
Peer review information: Nature Structural & Molecular Biology thanks Ruchi Shukla and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Carolina Perdigoto and Dimitrios Typas, in collaboration with the Nature Structural & Molecular Biology team. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Related to Fig. 1. CUT&Tag data correlation and overlap of histone modification peaks at LINE1.
a. Pearson correlation heatmap for the CUT&Tag replicates across histone modifications in H9 cells. b. UpSet plot showing the intersection of CUT&Tag peaks at TE (LTR, Alu and full-length L1) families. The X-axis shows the total number of peaks, and the Y-axis shows the number of peaks intersected. c. Heatmaps showing signals (CPM) for H3K9me3, H4K16ac, H3K27ac and H3K122ac at the full-length L1s marked by either H3K9me3 (top of each heatmap) or H4K16ac (bottom of each heatmap).
Extended Data Fig. 2 Related to Fig. 1. Overlap of CUT&Tag data peaks from replicates.
Venn diagrams showing reproducibility for the CUT&Tag peaks among the replicates called for the histone PTMs.
Extended Data Fig. 3 Related to Figs. 1 and 2. H4K16ac is enriched at TEs in human brain, cancer and mouse stem cells.
a. Heatmap showing H4K16ac ChIPseq signal across full-length L1s in human brain (prefrontal lobe) tissues from young individuals, old individuals and patients with Alzheimer’s disease, from Nativio et al. 2018. b. Heatmap showing two replicates of H4K16ac and H3K27ac CUT&Tag signal across RefSeq genes (left) and full-length L1s (>5 kb; right) from the mouse genome. c. Stacked bar plot showing the ratio (Y-axis) of observed over expected (background) for the TSS, gene body and TEs (LTR, Alu and L1) overlapping with H4K16ac or H3K27ac (X-axis) in SH-SY5Y, K562 and TDF cells. d and e. Observed over expected enrichment ratio for H4K16ac and H3K27ac CUT&Tag peaks from mouse embryonic stem cells (E14 mESCs) at transposable elements from the mouse genome (from Repbase).
Extended Data Fig. 4 Related to Fig. 2. STARRseq and histone marks.
a. Frequency distribution of LTR elements (Y-axis, log10 percentage) showing the STARR-seq signal enrichment (X-axis) for elements that are H3K27ac–/H4K16ac+, H3K27ac+/H4K16ac+, H3K27ac+/H4K16ac– or H3K27ac–/H4K16ac–. b. Heatmaps showing signals (CPM) for H4K16ac, H4K12ac, H3K27ac, H3K122ac, H3K4me1, H3K9me3 and ATACseq at the LTR subfamilies and Alu subfamilies.
Extended Data Fig. 5 Related to Fig. 3. Continuation of Fig. 3.
As in Fig. 3a, transcription factor binding sites enriched at H3K27ac-, H4K16ac- and H3K122ac-marked LTR and Alu elements in hESCs.
Extended Data Fig. 6 Related to Fig. 4. CRISPR–Cas9-mediated deletion of L1 elements.
a. Illustration showing the full-length LINE1 (L1, ~7 kb), guide RNA sites for Cas9 cutting (scissors), and the flanking primers (green arrow) and internal reverse primer (orange arrow) used for genotyping. b. Agarose gel electrophoresis showing PCR products for L1PA10 and L1PA7 clones, amplified using L1 flanking primers. The ~500 bp amplicon indicates deletion of the L1 (above). PCR with internal reverse primers shows the presence of the wild-type allele (below). c. PCR amplicons with L1 flanking primers for a pool of cells, showing nearly 50% deletion efficiency for L1PA7 at the RLN2 locus and L1PA8 at the MOXD1 locus.
Extended Data Fig. 7 Related to Fig. 5. Depletion of MSL proteins leads to downregulation of TEs.
a. Immunofluorescence images showing H4K16ac levels (magenta) in WT and MSL1 KO TDFs (left). Western blots showing H4K16ac levels in MSL1 KO and WT TDFs (right). b. Violin plots showing the log10 RPKM signal of RNAseq reads for parental (WT) and doxycycline-inducible MSL1 knockout (MSL1 KO) cells for L1s (left panel) and LTRs (right panel) that are either H4K16ac+ or H4K16ac–. c. Violin plots for RNAseq signal across different ERV subfamilies (ERV24, ERVL, HERVK, HERVH and HERVL; top), and LTR subfamilies (LTR5, LTR7, LTR9 and LTR16; below). Statistical tests for all violin plots were performed using Dunn’s test with Bonferroni correction. d. Heatmap comparing the CUT&Tag signals for H4K16ac and H3K9me3 for WT and MSL1 KO samples across L1 subfamilies, LTRs and ERV subfamilies, Alu subfamilies and NCBI RefSeq genes.
Extended Data Fig. 8 Related to Fig. 5. MSL3 depletion data.
a. IGV browser tracks showing RNAseq reads (RPKM) at the MSL1, MSL2, MSL3 and KAT8 loci in control knockdown (nontargeting shRNA) and MSL3 knockdown (n = 4, biological replicates) H9 hESCs. b. Volcano plot showing up- and downregulated genes upon lentiviral shRNA-mediated knockdown of MSL3. Pluripotency-associated genes (for example, POU5F1, NANOG, SOX2) and genes expressed in neuronal differentiation (for example, PAX6, GFAP, NES, NEUROD1) are indicated by arrows. c. Violin plots for the RNAseq signal (log10 RPKM) for the Control-shRNA and MSL3-shRNA knockdown in H9 hESCs for genes that contain an H4K16ac peak (H4K16ac+) and genes that lack H4K16ac peaks (H4K16ac–). d. As in c, but for LTR subfamilies (ERV24, ERVL, LTR5, LTR7, LTR9 and LTR16; bottom panel).
H4K16ac ChIPseq/input signal for L1 (left) and LTR (right) subfamilies in proliferative and senescent IMR90 cells.
Cite this article
Pal, D., Patel, M., Boulet, F. et al. H4K16ac activates the transcription of transposable elements and contributes to their cis-regulatory function. Nat Struct Mol Biol 30, 935–947 (2023). https://doi.org/10.1038/s41594-023-01016-5