A gene silencing screen uncovers diverse tools for targeted gene repression in Arabidopsis

DNA methylation has been utilized for target gene silencing in plants. However, it is not well understood whether other silencing pathways can be also used to manipulate gene expression. Here we performed a gain-of-function screen for proteins that could silence a target gene when fused to an artificial zinc finger. We uncovered many proteins that suppressed gene expression through DNA methylation, histone H3K27me3 deposition, H3K4me3 demethylation, histone deacetylation, inhibition of RNA polymerase II transcription elongation or Ser-5 dephosphorylation. These proteins also silenced many other genes with different efficacies, and a machine learning model could accurately predict the efficacy of each silencer on the basis of various chromatin features of the target loci. Furthermore, some proteins were also able to target gene silencing when used in a dCas9-SunTag system. These results provide a more comprehensive understanding of epigenetic regulatory pathways in plants and provide an armament of tools for targeted gene manipulation.

DNA methylation has been utilized for target gene silencing in plants. However, it is not well understood whether other silencing pathways can be also used to manipulate gene expression. Here we performed a gain-of-function screen for proteins that could silence a target gene when fused to an artificial zinc finger. We uncovered many proteins that suppressed gene expression through DNA methylation, histone H3K27me3 deposition, H3K4me3 demethylation, histone deacetylation, inhibition of RNA polymerase II transcription elongation or Ser-5 dephosphorylation. These proteins also silenced many other genes with different efficacies, and a machine learning model could accurately predict the efficacy of each silencer on the basis of various chromatin features of the target loci. Furthermore, some proteins were also able to target gene silencing when used in a dCas9-SunTag system. These results provide a more comprehensive understanding of epigenetic regulatory pathways in plants and provide an armament of tools for targeted gene manipulation.
Transcriptional gene regulation is a fundamental biological process that controls the on or off states of gene expression, and involves DNA methylation, histone modification and chromatin remodelling 1 . In plants, DNA methylation is generally linked to transcriptional gene silencing 2 . For example, the Arabidopsis FWA gene is normally DNA methylated and silenced in all tissues, except in the developing endosperm where it is demethylated and expressed 3 .
Artificial zinc fingers are DNA binding domains that can be designed to bind a specific sequence and guide fusion proteins to specific loci 10 . For example, artificial zinc finger 108 (hereafter ZF) was designed to bind the Arabidopsis FWA promoter in the region that is normally methylated in Col-0 wild-type plants 11 . When ZF was fused with the RNA-directed DNA methylation (RdDM) component proteins restored an early flowering phenotype of the fwa epiallele to a similar level as in wild-type Col-0 plants, (Fig. 1c and Extended Data Fig. 1a), even though the FWA silencing was less efficient than ZF-SUVH2 ( Fig. 1b and Extended Data Fig. 1b).
Although ZF was designed to bind the FWA promoter, it also binds to thousands of off-target sites throughout the genome 12 . We therefore performed RNA sequencing (RNA-seq) to determine whether the different ZF fusion proteins could also regulate other genes near these off-target binding sites. We analysed genes near 6,091 ZF chromatin immunoprecipitation followed by sequencing (ChIP-seq) peaks that showed at least four-fold enrichment of ZF ChIP-seq signal relative to a fwa non transgenic control (Supplementary Table 2). The differentially expressed genes (DEGs) near ZF off-target peaks were analysed using region-associated DEG (RAD) analysis 20 . This analysis showed that all the ZF fusions, except the control fusion EYFP-ZF, showed a higher number of downregulated DEGs than upregulated DEGs when the ZF peak was within 1 kb of the start site of the gene ( Fig. 1e and Extended Data Fig. 1c), suggesting that all the identified fusions can repress many other genes in addition to FWA.
Previous studies showed that mutation of some of the effector genes displayed early flowering phenotypes, including ELF7 (ref. 21 ), LHP1 (ref. 22 ), JMJ14 (ref. 7 ) and CPL2 (ref. 23 ). Therefore, we wanted to ensure that expression of the ZF fusions was not causing a suppression of the expression of the endogenous genes. We compared the expression level of the endogenous gene by examining reads corresponding to the 3′ untranslated region (UTR) regions of the genes, which were excluded from ZF fusion transgenes. We found that the 3′ UTR expression levels were unaltered in the transgenic plants, suggesting that the early flowering phenotype of these ZF fusion lines was due to the silencing of FWA, rather than silencing of the endogenous effector encoding genes ( Supplementary Fig. 1).

The heritability of target gene silencing in ZF fusions
To test inheritance of the DNA methylation in ZF-SUVH2, bisulfite amplicon sequencing analysis (BS-PCR-seq) was performed to evaluate DNA methylation at FWA promoter regions in T2 lines that still contained the transgene and in lines that had segregated away the transgene (null segregants). DNA methylation was observed in both transgenic and null segregant lines, showing that DNA methylation established by ZF-SUVH2 could be heritable in the absence of the transgene (Fig. 2a), as has been shown for other fusion proteins that target DNA methylation to FWA 11,12,24 . As expected, the early flowering phenotype was also inherited in many null segregant plants in the T2 population (Fig. 2b  and Supplementary Table 3).
We similarly analysed the heritability of the flowering time phenotypes for the ZF fusions that were not associated with FWA DNA methylation. We found that, in T2 plants that inherited the fusion protein transgenes, the early flowering phenotype was usually maintained (Fig. 2b). However, in all null segregant plants, the flowering time reverted to the typical late flowering phenotype of fwa plants (Fig. 2b), showing that the persistent presence of the fusion protein transgenes was needed for FWA silencing. Within the population of transgene-containing T2 plants, we observed wide variation in flowering time (Fig. 2b). This was probably due to differences in the expression SUVH9 and transformed into fwa epiallele-containing plants, FWA DNA methylation and suppression were restored 11 . It was later shown that ZF fusions with many other RdDM-related proteins also caused FWA silencing and methylation [12][13][14] . However, it is largely unknown whether ZF fusions with non-DNA-methylation-related chromatin proteins can also trigger target gene silencing 15 .
In this Article, we fused a panel of 270 putative Arabidopsis chromatin proteins to ZF, and screened for fusions capable of silencing FWA. We identified 14 proteins capable of silencing through diverse mechanisms including establishment of DNA methylation, H3K27me3 deposition, H3K4me3 demethylation, H3K9, H3K14, H3K27 and H4K16 deacetylation, inhibition of RNA polymerase II (Pol II) transcriptional elongation, or Pol II dephosphorylation. We found that some target genes were only silenced by certain effector proteins, and a machine learning model could accurately predict which genes would be effectively silenced based on proximal chromatin features and expression levels of the target genes. Some proteins were also able to target gene silencing using the CRISPR/dCas9 based SunTag system 14,16 . These findings lay a foundation for more detailed mechanistic understanding of gene silencing pathways and provide an array of new tools for targeted gene silencing.

A gain-of-function screen for regulators of gene silencing
We utilized the native Arabidopsis gene FWA as a reporter to screen for regulators of gene silencing. FWA encodes a transcription factor that causes a late flowering phenotype when overexpressed, resulting in a greater number of leaves produced before flowering. In Col-0 wild-type plants, FWA is completely silenced by DNA methylation, while in fwa epialleles that have permanently lost this DNA methylation, FWA misexpression causes late flowering 17 . To find putative gene silencing regulators, we searched the Arabidopsis ORFeome collections 18,19 for chromatin-related proteins and also added other proteins of interest that were not present in the ORFeome collections (Supplementary Table 1). A total of 270 putative Arabidopsis chromatin proteins were fused with a ZF designed to bind the FWA promoter 11 . These fusions were individually transformed into fwa plants to screen for regulators that triggered FWA silencing and restored an early flowering phenotype.
This screen identified 14 effector proteins that successfully restored the early flowering phenotype of the fwa epiallele (Fig. 1a). Among these effectors, DMS3, SUVH2, SUVH9 and MORC1 are known players in the RdDM pathway, and previous studies have shown that DMS3-ZF, SUVH9-ZF and MORC1-ZF could restore methylation and silencing of FWA 11,12 . SUVH2 is a close homologue of SUVH9 that functions in RdDM 11 , and it was thus not unexpected that ZF-SUVH2 would also methylate and silence FWA expression ( Fig. 1b-d).
We also identified many gene regulators that silenced FWA in a DNA methylation-independent manner ( Fig. 1a-d), including two polycomb-group proteins MULTICOPY SUPRESSOR OF IRA1 (MSI1) and LIKE HETEROCHROMATIN PROTEIN 1 (LHP1), two JMJs JMJ14 and JMJ18, four HDACs HD2A, HD2B, HD2C and HDA6, a Pol II-associated factor 1 (PAF1) homologue EARLY FLOWERING 7 (ELF7) and Carboxyl-terminal Domain Phosphatase-like 2 (CPL2) (Fig. 1a). The ZF fusions of these  level of the fusion proteins, as we observed by western blotting that plants with high levels of transgene expression tended to have an early flowering phenotype, while plants with low protein expression levels tended to have a late flowering phenotype (Fig. 2c).

Targeted gene silencing by H3K27me3 deposition
MSI1 is a component of the PRC2 complex that also interacts with the Arabidopsis PRC2 accessory protein LHP1 (refs. [25][26][27], both of which are important for H3K27me3-mediated gene silencing. Therefore, to Type CG CHG CHH test whether H3K27me3 was deposited at FWA and ZF off-target sites, H3K27me3 and H3 ChIP-seq were performed in MSI1-ZF, LHP1-ZF and the fwa control. Indeed, H3K27me3 ChIP-seq signals were higher at FWA in LHP1-ZF and MSI1-ZF than fwa control plants (Fig. 3a). We also observed H3K27me3 enrichment in LHP1-ZF and MSI1-ZF when plotting over 6,091 ZF off-target sites ( Fig. 3b and Extended Data Fig. 2a   Two H3K4 demethylase protein fusions, JMJ14-ZF and JMJ18-ZF, successfully triggered an early flowering phenotype and silenced FWA in a DNA methylation independent manner (Fig. 1a-e). H3K4me3 and H3 ChIP-seq was performed in JMJ14-ZF, where we found that JMJ14-ZF caused a reduction of H3K4me3 over the FWA locus (Fig. 3c), as well as over ZF off-target regions ( Fig. 3d and Extended Data Fig. 3a), but this was not observed in the EYFP-ZF line (Extended Data Fig. 2b, c). Furthermore, we observed more H3K4me3 removal at ZF off-targets that contained high levels of pre-existing H3K4me3, and less removal at sites with low levels of pre-existing H3K4me3 (Extended Data Fig. 3b). In addition, unlike MSI1-ZF and LHP1-ZF (Fig. 3a,b), JMJ14-ZF did not show accumulation of H3K27me3 at FWA, nor at ZF off-target regions (Extended Data Fig. 3c,d). Thus, silencing by JMJ14 is probably acting directly via removal of H3K4me3 rather than by accumulation of H3K27me3, a mark which can act antagonistically with H3K4me3 (refs. [28][29][30]. Interestingly, several other H3K4 demethylase proteins were included in our collection, including JMJ16/17, LDL1/2/3 and FLD, but none of these was able to trigger silencing of FWA.

Targeted gene silencing by histone deacetylation
Four HDAC proteins, HD2A, HD2B, HD2C and HDA6, were identified from the silencing screen. It was previously shown that Arabidopsis HD2A is required for H3K9 deacetylation and ribosomal RNA gene silencing 31 , and that HD2C mediates H4K16 deacetylation and is involved in ribosome biogenesis 32 , suggesting that HD2 family members can deacetylate multiple sites. We performed immunoprecipitationmass spectrometry (IP-MS) utilizing a pHD2A:HD2A-FLAG transgene, which indicated that all three HD2 type HDACs interact with each other (Supplementary Table 4), consistent with an earlier report of IP-MS of tagged HD2C 32 . In addition, ChIP-seq analysis of pHD2A:HD2A-FLAG plants showed a partial overlap with histone H3K9ac, H3K27ac and H4K16ac in the genome ( Supplementary Fig. 2a,b). Thus, we profiled H3K9ac, H4K16ac, H3K27ac and H3 patterns by ChIP-seq in HD2A-ZF, HD2B-ZF, HD2C-ZF (HD2-ZFs) plants. We found that H3K9ac, H4K16ac and H3K27ac were moderately reduced in HD2-ZF plants both at FWA (Fig. 4a) and over ZF off-target sites (Fig. 4b-d and Extended Data Fig. 4a), particularly over the loci with a higher level of pre-existing acetylation modifications (Extended Data Fig. 4b-d). However, the acetylation levels were not reduced in EYFP-ZF at both FWA and ZF off-target loci ( Supplementary Fig. 3a,b). HDA6 has been reported to deacetylate several substrates including K9, K14, K18, K23 and K27 of the H3 histone tail and K5, K8 and K12 of the H4 histone tail 33 , with H3K9ac and H3K14ac confirmed in multiple studies 34,35 . Moreover, it is known that MSI1 interacts with HDA6 and HDA19, and contributes to histone deacetylation 36,37 . We therefore performed H3K9ac, H3K14ac and H3 ChIP-seq in HDA6-ZF and MSI1-ZF. Indeed, both H3K9ac and H3K14ac ChIP-seq signals were reduced at FWA as well as at ZF off-target sites in HDA6-ZF and MSI1-ZF plants (Fig. 4e,f and Extended Data Fig. 5a), particularly over the loci with high levels of pre-existing H3K9ac and H3K14ac (Extended Data Fig. 5b,c). This result suggests that HDA6 and MSI1-ZF repress target gene expression at least partially via histone H3K9 and H3K14 deacetylation. Together our results demonstrate that a variety of different HDAC proteins of different classes can be harnessed for targeted gene silencing. However, since histone acetylation universally goes with gene expression, it is not possible to rule out the possibility that the HDAC-ZF fusions are silencing expression by another mechanism, with the loss of acetylation being an indirect effect of gene silencing 38 .

Silencing by interference with Pol II transcription
ELF7 encodes a PAF1 homologue, which is a subunit of the PAF1 complex (PAF1C). PAF1C is a conserved protein complex in eukaryotes that collaborates with Pol II during transcription initiation and elongation 39,40 . In Arabidopsis, mutation of PAF1C subunit VIP3 caused a redistribution of histone H3K4me3 and H3K36me2 in certain genes 41 , and we therefore initially performed H3K4me3, H3K36me2 and H3K36me3 ChIP-seq to determine whether changes in these epigenetic marks might explain ELF7-ZF triggered FWA suppression. We observed some reduction of H3K4me3 at FWA in ELF7-ZF compared with fwa control plants (Fig. 3c). However, unlike JMJ14-ZF, H3K4me3 signal was largely unaffected near ZF off-target sites in ELF7-ZF (Fig. 3d). Considering that ELF7-ZF did trigger gene silencing at ZF off-target sites ( Fig. 1e and Extended Data Fig. 1c), it seemed unlikely that H3K4me3 reduction was the relevant mechanism. In addition, signals of both H3K36me2 and H3K36me3 were slightly decreased at the FWA locus (Extended Data Fig. 6a), while at the same time somewhat increased over ZF off-target sites (Extended Data  Fig. 6b,c), making it unlikely that changes in H3K36me2 or H3K36me3 levels were the direct cause of ELF7-ZF-mediated gene silencing.
We generated pELF7:ELF7-FLAG complementing transgenic lines in the elf7-3 mutant background to perform IP-MS to identify ELF7-interacting proteins 21 (Supplementary Fig. 4). Consistent with previous work 40 our ELF7 IP-MS data identified peptides corresponding to all of the subunits of the PAF1C, as well as Pol II subunits and transcription factors, consistent with a role of ELF7 in Pol II transcription (Supplementary Table 4). Since ELF7 is a Pol II-interacting protein, we hypothesized that ELF7-ZF might interact with Pol II at the FWA promoter region, retaining it there and inhibiting transcription. To test this, we performed Pol II serine 5 (Ser5) ChIP-seq in ELF7-ZF transgenic lines, as well as fwa, EYFP-ZF and HD2A-ZF as controls. As expected, Pol II occupancy at FWA transcribed regions was strongly reduced in ELF7-ZF, as well as in HD2A-ZF (Fig. 5a), but not in EYFP-ZF (Extended Data Fig. 2b,c), consistent with the silencing of FWA expression in these lines (Fig. 1b). However, we observed a very prominent Pol II peak at the FWA promoter overlapping the ZF site in the ELF7-ZF line, but not in HD2A-ZF, EYFP-ZF or fwa plants ( Fig. 5a and Extended Data Fig. 2b,c). Moreover, strong Pol II enrichment was also observed at the ZF off-target binding sites but not the shuffled controls sites in ELF7-ZF ( Fig. 5b and Extended Data Fig. 7a). Thus, Pol II appears to be tethered to the ZF binding sites due to the interaction with ELF7-ZF, which in turn appears to inhibit Pol II transcription, leading to gene silencing.
To better understand the endogenous function of ELF7, we performed ChIP-seq in pELF7:ELF7-FLAG transgenic lines. Consistent with its role in transcriptional elongation, ELF7 was exclusively distributed over gene body regions, with most ELF7 signals overlapping with both Pol II peaks and H3K36me2 or H3K36me3 peaks (Extended Data Fig. 7b,c). We also performed Pol II Ser5 ChIP-seq in elf7-3 and found a notable accumulation of Pol II at ELF7 enriched sites but not the shuffled control sites 42 ( Fig. 5c and Extended Data Fig. 7d), suggesting that transcriptional elongation is impeded, resulting in a higher Pol II occupancy reflected in the ChIP-seq data. Together our data show that the Arabidopsis PAF1C is required for proper Pol II transcriptional elongation, as has been shown in yeast and animal systems 43,44 , and that tethering the ELF7 component of the complex to promoters represents a novel synthetic mechanism to induce gene silencing that is probably independent of changes of particular epigenetic marks.

Target gene silencing by Pol II CTD Ser5 dephosphorylation
CPL2 is a well-characterized phosphatase that specifically acts on Ser5 of the Pol II C-terminal domain 45 , and represses transcription through inhibiting Pol II activity 46,47 . We therefore performed Pol II Ser5 ChIP-seq in CPL2-ZF transgenic lines, and we indeed observed reduced signal at FWA as well as ZF off-target sites that had pre-existing Pol II Ser5 (Fig. 5d,e and Supplementary Fig. 5), suggesting that CPL2-ZF indeed silenced target genes through Pol II CTD Ser5 dephosphorylation, although our data do not allow us to pinpoint the exact mechanism of repression. The promoter tethering of CPL2 thus represents a new mechanism for targeted gene silencing. Article https://doi.org/10.1038/s41477-023-01362-8

Target genes vary widely in different ZF fusions
We observed that the set of downregulated genes at ZF off-target sites for each ZF fusion were partially non-overlapping (Extended Data Fig. 8). As one example, gene AT3G13470 was downregulated by HD2A-ZF, HD2B-ZF, HD2C-ZF and LHP1-ZF, but not ELF7-ZF nor CPL2-ZF (Fig. 6a). These results suggest that the best gene silencing approach will greatly depend on the particular target gene of interest, highlighting the utility of gene silencing tools that work by different mechanisms. We hypothesized that pre-existing epigenetic features of the target genes might determine their sensitivity to silencing by the different ZF fusion proteins, and indeed different target genes showed different levels of various epigenetic marks including histone acetylation (H3K9ac, H3K27ac and H4K16ac), histone methylation (H3K4me3 and H3K27me3), chromatin accessibility (assay for  Table 5). To test this hypothesis, we used the chromatin features of 2,709 genes containing nearby ZF binding sites, as well as the expression level of these genes, together with the information of whether these genes were silenced with each of the ZF fusion as inputs to various machine learning algorithms (Supplementary Table 5). The decision tree classifier showed an excellent performance, and a ten-fold cross-validation of this model showed an extremely high accuracy (78-96%) in predicting the efficacy of each silencer (Fig. 6b and Supplementary Table 6). In addition, the importance of each feature to the model construction was also revealed by this method (Extended Data Fig. 9). Notably, the target gene expression level was common feature that contributed to almost every silencer, Pol II ChIP-seq played a dominant role in CPL2-ZF, and the rest of the silencers were usually determined by a combination of varied features (Extended Data Fig. 9), supporting the hypothesis that different effector proteins are efficient in silencing genes with different chromatin features.

Some effector proteins were effective in a dCas9-SunTag system
We cloned SUVH2, JMJ14, LHP1, HD2C, ELF7 and CPL2, representing different epigenetic pathways targeting gene silencing, into the SunTag system. The SunTag is a dCas9-based system in which dCas9 is fused to a chain of peptide epitopes, with effector proteins fused in a separate cassette with a single-chain antibody recognizing the peptide epitopes, such that a single dCas9 can recruit multiple effector proteins 14,16 . All the effectors in the SunTag system initially failed to trigger gene silencing or an early flowering phenotype in the fwa background, which we hypothesized was due to gene silencing of the transgene since western blot analysis showed little expression of dCas9 (Extended Data Fig. 10a). To reduce silencing, we retransformed all constructs into the fwa rdr6 genetic background 48 , and found that this dramatically increased the expression of dCas9 (Extended Data Fig. 10a). In this background we observed target FWA gene silencing and an early flowering phenotype in the JMJ14 and LHP1 lines, and to a lesser extent in ELF7 and HD2C, but not in SUVH2, CPL2 or the no effector SunTag control (Fig. 6c,d and Extended Data Fig. 10b,c). These results indicate that that some effector proteins identified from this work can be also used for target gene silencing via the SunTag system.

Discussion
By constructing ZF fusions to target a collection of putative chromatin regulators to FWA, we uncovered a variety of proteins capable of inducing gene silencing at FWA as well as at many other loci. This screen was done in the fwa epiallele background in which there are no short interfering RNAs, H3K9me2, RdDM components or DNA methylation at the target locus. Further, most of the silencers described here functioned  without inducing any of these heterochromatin-related epigenetic marks. Our genomic and genetic evidence demonstrate that target gene silencing can be directed by diverse histone modifications including H3K27me3 deposition, H3K4me3 demethylation and histone deacetylation at H3K9, H3K14 and H4K16 tails. Interestingly, we also found two factors that appear to act by more directly interacting with Pol II, the ELF7 component of the elongation factor PAF1C and the CPL2 enzyme, which dephosphorylates the RNA Pol II C-terminus. Due to protein-protein interactions between these effectors with the other epigenetic regulators, it is possible that some effectors might silence target genes through combined mechanisms. For example, it is known that HD2A, B and C interacts with HDA6 49 , which might deacetylate different histone tail residues. As another example, MSI1 interacts with HDA19, HISTONE DEACETYLATION COMPLEX1 (HDC1), HDA6, SIN3-like proteins and other proteins 36,37 , and our results indicated that MSI1-ZF might trigger gene silencing via a combination of H3K27me3 deposition and histone deacetylation (Fig. 4e,f and Extended Data Fig. 5a). This could also explain why MSI1-ZF silenced more efficiently than LHP1-ZF, HDA6-ZF and HD2-ZFs at FWA (Fig. 1b). The differences in interacting proteins between MSI1 and LHP1 might also explain why they silenced different sets of target genes (Supplementary Table 7).
Our results show that diverse pathways can be harnessed for the development of synthetic biology tools to downregulate genes. DNA methylation represents a strong and potentially heritable type of silencing, but only some genes will be amenable to this type of modification due to low densities of CG dinucleotides that are needed for silencing and heritability 12 or high levels of endogenous expression that can compete with DNA methylation maintenance 14 . In addition, it may often be desirable to cause only a partial silencing of a target gene, and the non-DNA-methylation-based tools described here were able to cause various levels of repression that were lower than that of DNA methylation targeting tools (Fig.1b).
Our data showed that certain genes were more amenable to particular gene silencing approaches, which is probably due to differences in proximal chromatin environments for different target genes. Therefore, having a wider array of silencing tools expands the range of genes that can be successfully targeted. This also suggests that a different set of proteins from the library might have been identified if we had utilized a different reporter gene other than FWA, or if we had targeted a position other than the promoter region of the reporter gene. The genes susceptible to silencing in this study were involved in a wide range of biological processes including development, hormone signalling pathways and disease resistance (Supplementary Table 7), suggesting that the tools described here could be useful for the modulation of many different traits. In addition, we were able to utilize machine learning algorithms that could accurately predict which genes would be sensitive to silencing by a particular gene silencing tool. The prediction was extended to all Arabidopsis genes (Supplementary Table 6), which may be useful for future gene expression engineering efforts. In conclusion, this work provides mechanistic detail for an array of key plant gene silencing pathways, and describes a collection of new tools that should be useful in both basic research and crop improvement.

Plant materials and growth conditions
All the plants used in this paper were in the Arabidopsis thaliana Col-0 ecotype, grown under long-day conditions (16 h light and 8 h dark). The T-DNA insertion lines used in this study included elf7-3 (SALK_019433) 42 . All the transgenic plants were generated by Agrobacterium (AGL0 strain)-mediated floral dipping.

Plasmid construction
pMDC123-UBQ10: ZF-3xFLAG-effector (cDNA) is a Gateway compatible binary destination vector that includes a plant UBQ10 promoter, followed by an N-terminal ZF and 3xFLAG epitope tag, a Gateway cassette and an OCS terminator. The list of selected effectors from Arabidopsis Gene ORFeome Collection (Supplementary Table 1) in the pENTR/D-TOPO vectors were all cloned into pMDC123 destination vector via LR reaction using Gateway LR Clonase II (Invitrogen). pEG302-effector (gDNA)-3xFLAG-ZF is also a Gateway-compatible binary destination vector, which consists of a gateway cassette, followed by a C-terminal 3xFLAG epitope tag, a ZF, a Biotin Ligase Recognition Peptide and an OCS terminator. The sequence of native promoter (~1.5 kb upstream from the 5′ UTR or until the next gene annotation) and genomic DNA (without stop codon) of the effectors were cloned into pENTR D-TOPO vectors (Invitrogen), which were used to deliver the genomic DNA sequences of these effectors into the destination vector using Gateway LR Clonase II (Invitrogen). pEG302-effector (gDNA)-3xFLAG/9xMyc contains a gateway cassette, followed by a C-terminal 3xFLAG or 9xMyc epitope tag, a Biotin Ligase Recognition Peptide and an OCS terminator. The cloning method is the same as pEG302-effector (gDNA)-3xFLAG-ZF. For the SunTag-effector, the catalytic domain of JMJ14 and the coding sequence of other effector proteins were cloned into the SunTag vector with the infusion method (Takara). Please see Supplementary Table 8 for the primers and guide RNA information for the SunTag system.

Flowering time measurement
The flowering times were measured by the leaf counts, and each dot in the dot plots represents the leaf number of individual plant. The number of plants with 20 or fewer leaves were marked.

BS-PCR-seq
The leaf tissue from 4-to 5-week-old Col-0 wild type, fwa and the representative T2 ZF lines showing early flowering phenotype were collected to perform BS-PCR at FWA promoter regions. Cetyltrimethyl ammonium bromide-based method was used to extract DNA, and the EpiTect Bisulfite kit (QIAGEN) was used for DNA conversion. The converted DNA was used as a template to amplify three different regions over promoter and 5′ transcribed regions of FWA, including region 1 (chr4: 13038143-13038272), region 2 (chr4: 13038356-13038499) and region 3 (chr4: 13038568-13038695). Pfu Turbo Cx (Agilent), dNTP (Takara Bio) and the primers designed for the above-mentioned FWA regions (Supplementary Table 8) were used to perform PCR reactions. Three different PCR products from three regions of each sample were pooled and purified with AMPure beads (Beckman Coulter). The purified PCR products were used to construct libraries by the Kapa DNA Hyper Kit (Roche) together with TruSeq DNA UD indexes for Illumina (Illumina), and the libraries were sequenced on Illumina iSeq 100.

IP-MS
The method of IP-MS used in this study has been described in a recent paper 13 . Ten grams of unopened floral buds from Col-0 wild type and FLAG-tag transgenic plants HD2A and ELF7 were collected and ground into fine powder with liquid nitrogen. These samples were resuspended with 25 ml IP buffer and homogenized until lump-free by Dounce homogenizer. The lysate was filtered through Miracloth and incubated with 250 μl anti-FLAG M2 magnetic beads (Sigma) at 4 °C for 2 h. The magnetic beads were washed with IP buffer and eluted with TBS containing 250 μg ml −1 3xFLAG peptides. The eluted proteins were precipitated with trichloroacetic acid (Sigma) and subject to MS analyses as described previously 13 .

ChIP-seq
We followed a previous protocol for ChIP-seq with minor modifications 11 . Five microlitres of anti-H3K9ac (ab4441, Abcam), anti-H3K14ac (ab52946, Abcam), anti-H3K27ac (ab52946, Abcam), anti-H4K16ac (ab109463, Abcam), anti-Pol II Ser5 (ab5131, Abcam), anti-H3K36me2 (ab9049, Abcam), anti-H3K36me3 (ab9050, Abcam), anti-H3K4me3 (04-745, Millipore) or anti-H3K27me3 (07-449, Millipore) or 10 μl of anti-FLAG (Sigma) have been added into each ChIP, accordingly. Briefly, a total of 2-4 g of the leaves of T2 ZF lines with early a flowering phenotype or unopened flower buds of the FLAG lines were collected. The plant materials were ground with liquid nitrogen and fixed with 1% formaldehyde containing nuclei isolation buffering for 10 min before adding fresh-made glycine to terminate the crosslinking reaction. The Article https://doi.org/10.1038/s41477-023-01362-8 nuclei were isolated and disrupted by SDS-containing lysis buffer, and the chromatin was sheared via Bioruptor Plus (Diagenode) and immunoprecipitated with antibody at 4 °C overnight. Next, the magnetic Protein A and Protein G Dynabeads (Invitrogen) were added and incubated at 4 °C for 2 h. After washing and elution, the reverse crosslinking was done at 65 °C overnight. Then the protein-DNA complex was treated with Protease K (Invitrogen) at 45 °C for 4 h, and the DNA was purified and precipitated with 3 M sodium acetate (Invitrogen), GlycoBlue (Invitrogen) and ethanol at −20 °C overnight. The precipitated DNA was directly used for library construction using the Ovation Ultra Low System V2 kit (NuGEN), and the libraries were sequenced on Illumina NovaSeq 6000 or HiSeq 4000 instruments.

RNA-seq
Leaf tissue of 4-to 5-week-old plants with similar age from fwa and early flowering ZF transgenic lines was collected for RNA extraction using Direct-zol RNA MiniPrep kit (Zymo Research). One microgram of total RNA was used to prepare the libraries for RNA-seq following TruSeq Stranded mRNA kit (Illumina), and the libraries were sequenced on Illumina NovaSeq 6000 or HiSeq 4000 instruments.

BS-PCR-seq analysis
BS-PCR-seq data analysis in this study used the pipeline described in ref. 12 . The raw pair-end sequencing reads of each sample were combined, and aligned to both strands of reference genome TAIR10 using BSMAP (v.2.90) 50 , and the alignment allowed up to two mismatches and one best hit. The reads with fewer than 20 reads coverage of cytosines and the reads with more than three consecutives methylated CHH sites were removed. The methylation level of each cytosine was calculated using the ratio of mC/(C + T), and only the methylation data within the designed FWA regions were kept for making a plot using customized R scripts.

ChIP-seq analysis
The ChIP-seq raw reads were trimmed using trim_galore (v0.6.5) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and then aligned to TAIR10 genome using Bowtie2 (v2.1.0) (ref. 51 ), which allowed one unique mapping site and zero mismatch. The Samtools version 1.9 (ref. 52 ) was used to remove the duplicated reads, and together with deeptools version 3.1.3 (ref. 53 ) to generate tracks using reads per kilobase of exon per million reads mapped (RPKM) for the normalization. The peaks were called using MACS2 version 2.1.1 (ref. 54 ), and the peaks that were frequently existed in previous FLAG ChIP-seq of Col-0 were removed.
For FLAG-ZF ChIP-seq, the FLAG ChIP-seq was performed in the unopened flower buds of FLAG-ZF T2 transgenic plants and fwa plants. The peaks were called by FLAG-ZF against fwa, and the peaks with four-fold or higher signal enrichment were kept as the ZF off-target sites (Supplementary Table 2, n = 6,091), while the other FLAG ChIP-seq used signal enrichment of two-fold or higher for the following analysis.
For the comparison of histone and Pol II (histone/Pol II) enrichment over ZF off-target sites between ZF lines and fwa, the histone/ Pol II ChIP-seq of each sample including both ZF lines and fwa was normalized with their respective H3 ChIP-seq first by using bigwig-Compare, and then the normalized histone/Pol II ChIP-seq of ZF lines were further normalized to fwa by using bigwigCompare, which were then used to make the metaplot over ZF off-target peaks and random shuffled peaks. This method was also applied in Pol II ChIP-seq enrichments between Col-0 wild type and elf7-3 mutant over ELF7 peaks and shuffled peaks (Fig. 5c).
The RNA-seq tracks were generated using Samtools version 1.9 (ref. 52 ) and normalized with RPKM using bamCoverage from deeptools version 3.1.3 (ref. 53 ). The DEGs were called using customized scripts of run_DE_analysis.pl from Trinity version 2.8.5 (ref. 56 ). Log 2 fold change ≥1 and false discovery rate <0.05 were used as a cut-off.
The method of RAD analysis in Fig. 1e is described at ref. 20 . We used the FLAG-ZF ChIP-seq peaks (n = 6,091) as input of the favourite regions, and up-and downregulated DEGs of ZF lines versus fwa were used, respectively, as the inputs of DEGs.

Machine learning
ZF targeting genes. We defined 6,091 ZF offtarget sites above; however, not all of them were located proximal to genes. Therefore, we considered 2,709 genes as ZF targeting genes, whose transcription start sites (TSSs) were located between −500 bp and 200 bp from a ZF peak. The chromatin features of these 2,709 ZF target sites as well as the gene expression level of ZF target genes were subjected to the following analysis.
Input data preparation. A total of 15 chromatin or genomic features of the 2,709 ZF off-target sites/genes, including gene expression level, ChIP-seq signals of H3K27ac, H3K27me3, H3K4me3, H4K16ac, H3K9ac and Pol II, ATAC-seq signals, the level of CG, CHG and CHH methylation, the number of CG, CHG and CHH sites, and GC content, that have been characterized in the fwa epiallele (Supplementary Table 5) were utilized to generate input data for machine learning. We also used a binary gene classification based on whether the gene was downregulated (fold change ≤2 and false discovery rate <0.05) in ZF lines compared with the fwa epiallele plants. Input data were pre-processed by removing perfect collinearity.

Prediction model selection.
We evaluated 14 machine learning algorithms with ten-fold cross-validation (Fig. 6b) in PyCaret (v2.3.4, https://pycaret.org/) to select prediction models. The input dataset was divided into ten groups, with nine groups used as training dataset and one group as test data, reiterated ten times. The decision tree classifier method was applied in modelling as this method showed good performance and also provided the contribution of each feature to the model construction.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The accession number for all the high-throughput sequencing data of this paper is GEO: GSE197063. Source data are provided with this paper.