Cryo-EM structure of the human Mixed Lineage Leukemia-1 complex bound to the nucleosome

Mixed Lineage Leukemia (MLL) family histone methyltransferases are the key enzymes that deposit histone H3 Lys4 (K4) mono-/di-/tri-methylation and regulate gene expression in mammals. Despite extensive structural and biochemical studies, the molecular mechanism by which the MLL complexes recognize histone H3K4 within the nucleosome core particle (NCP) remains unclear. Here, we report the single-particle cryo-electron microscopy (cryo-EM) structure of the human MLL1 core complex bound to the NCP. The MLL1 core complex anchors on the NCP through RbBP5 and ASH2L, which interact extensively with nucleosomal DNA as well as the surface close to the histone H4 N-terminal tail. Concurrent interactions of RbBP5 and ASH2L with the NCP uniquely align the catalytic MLL1 SET domain at the nucleosome dyad, allowing symmetrical access to both H3K4 substrates within the NCP. Our study sheds light on how the MLL1 complex engages chromatin and how chromatin binding promotes MLL1 tri-methylation activity.


INTRODUCTION
The nucleosome core particle (NCP), consisting of an octameric core of histone proteins (two each of H2A, H2B, H3, and H4) and 146 base pairs of genomic DNA, represents the first level of eukaryotic DNA packaging (Kornberg and Thomas, 1974). It is further organized into higher-order chromatin structures. Cell-specific transcription programs are, in large part, governed by chromatin accessibility, which is actively regulated by histone-modifying enzymes and ATP-dependent chromatin remodeling complexes. In recent years, X-ray crystallography and single-particle cryo-EM studies have shed light on how these chromatin-associating complexes interact with the NCP to carry out their respective physiological functions. Most, if not all, chromatin complexes engage the 'acidic-patch' region of the NCP through variations of an 'arginine-finger' motif (Armache et al., 2011; Barbera et al., 2006; Kato et al., 2013; Makde et al., 2010), highlighting common features among chromatin-interacting protein complexes. It remains unclear whether this recognition mode of the NCP is universal for chromatin-interacting complexes.
Among histone post-translational modifications, the states of histone H3 lysine4 methylation (H3K4me) (i.e., mono-, di-, tri-methylation) are exquisitely modulated at important DNA regulatory regions including active gene promoters, gene bodies and distal regulatory enhancers (Rao and Dou, 2015). In particular, H3K4me3 is highly correlative to the transcriptionally active-and open-chromatin regions (Chen et al., 2015;Ruthenburg et al., 2007) and is shown to actively recruit the basic transcription machinery, ATP-dependent chromatin remodeling complexes, and histone acetyltransferases (Lauberth et al., 2013;Ruthenburg et al., 2007;Taverna et al., 2007;Vermeulen et al., 2007;Wysocka et al., 2006). In contrast, H3K4me1 is a prevalent mark often found at poised or active distal enhancers (Herz et al., 2012). Specific regulation of the H3K4me states may play a critical role in important physiological processes in cells.
Here, we report the single-particle cryo-EM structure of the human MLL1 core complex bound to the NCP. It not only reveals the overall architecture of the human MLL1 complex with full-length core components, but also illustrates how the MLL1 core complex engages the chromatin. Importantly, we show that the MLL1 core complex docks on the NCP through concurrent interactions of ASH2L/RbBP5 with nucleosomal DNA and histone H4. This unique configuration aligns the catalytic MLL1 SET domain at the nucleosome dyad, which allows the symmetric access to both H3K4 substrates. Our
The overall architecture of the MLL1 RWSAD-NCP complex showed that MLL1 RWSAD anchors at the edge of the NCP through two core components, RbBP5 and ASH2L, simultaneously (Figure 1D). In the NCP, DNA superhelical location 7 (SHL7) and SHL1.5, together with the H4 N-terminal tail, were involved in the interaction with MLL1 RWSAD (Figure 1D). Notably, domains in the MLL1 RWSAD-NCP complex were dynamically associated with each other and showed multiple conformations (Figure S3E). However, the overall architecture was conserved in all sub-classes of the MLL1 RWSAD-NCP structures (Figure S3E). Distinct from many previously reported NCP-recognizing proteins or protein complexes (Zhou et al., 2019), the MLL1 core complex did not interact with the acidic patch region of the NCP.

RbBP5 Binds the NCP through Both DNA and Histone H4 Tail Recognition
In the MLL1 RWSAD -NCP complex, the RbBP5-NCP interfaces were less dynamic. The sub-population particles of RbBP5-NCP from the MLL1 RWSAD -NCP dataset were resolved at 4.2 Å resolution (Figures 2A, S2, and S3B). The regions of mouse RbBP5 in model fitting shared 100% sequence identity with human RbBP5 ( Figure S4A). The structure showed that RbBP5 bound to the NCP by simultaneously engaging DNA (SHL1.5) and histone H4 N-terminal tail. The interactions involved six consecutive loops emanating from the WD40 repeats of RbBP5 ( Figure 2A). Characteristic features of RbBP5 (e.g. unique helix, anchoring loop, and insertion loop) were well matched into the cryo-EM map of RbBP5-NCP subcomplex ( Figure S4B). Notably, RbBP5 interacted with DNA SHL1.5 through four positively-charged arginine residues (Quad-R) located at β18-β19 (R220), β20-β21 (R251), β22-β23 (R272), and β24-β25 (R294) loops, respectively ( Figure 2B). The Quad-R participated in electrostatic interactions with the DNA phosphate backbone ( Figure 2B). Disruption of RbBP5-NCP interaction significantly reduced the activity of the MLL1 core complex. Mutations of the Quad-R residues to alanine (A) led to reduction of H3K4me3 and to a lesser degree, H3K4me2 ( Figure 2C). The effect was more drastic when Quad-R residues were mutated to glutamic acid (E) ( Figure 2D). Systematic alteration of three, two or one arginine residue(s) in Quad-R showed that at least two arginine residues were required for optimal H3K4me3 activity ( Figure 2C).
The second RbBP5-NCP interface includes two loops of RbBP5, an insertion loop (β16-β17 loop, referred to herein as I-loop) and an anchoring loop (β19-β20 loop, referred to herein as A-loop) (Figures 2A, 2E, S4B, and S5A). Both I- and A-loops are evolutionarily conserved in higher eukaryotes (Figure S5B). The I-loop was positioned between the N-terminal tail of histone H4 and nucleosomal DNA (Figure 2E). The A-loop ran parallel to the H4 tail, which was positioned between the I-/A-loops of RbBP5 and the helix α1 (Leu65-Asp77) of histone H3 (Figure 2E). This H4 tail-mediated nucleosome recognition of RbBP5 resembles that of the active-state DOT1L (Anderson et al., 2019; Worden et al., 2019) (Figure 2F). Similar to Quad-R mutation, deletion of the I-loop, and to a lesser degree the A-loop, reduced the activity of the MLL1 core complex for H3K4me3 and H3K4me2 (Figure 2D). Importantly, the RbBP5-NCP interaction is specifically required for MLL1 activity on the NCP (Figure 2D). Mutations in Quad-R, I-loop, and A-loop had no effect on mono-, di-, or tri-methylation of free H3 (Figure S5C). Among RbBP5-NCP interactions, Quad-R was the main contributor to NCP binding. Mutation of Quad-R significantly reduced RbBP5 binding to the NCP, while deletion of the I- and A-loops only modestly affected NCP binding (Figure 2G).

Structural Organization of WDR5, MLL1 SET, and ASH2L SPRY Sub-complex
To resolve the structural organization of the WDR5, MLL1 SET, and ASH2L SPRY sub-complex, we reconstructed the MLL1 RWS-NCP subcomplex (Figures 1C and S2, and the STAR methods) and successfully docked the crystal structures of human WDR5 (PDB ID: 2H14) (Couture et al., 2006) and MLL1 SET-ASH2L SPRY-RbBP5 330-375 (PDB ID: 5F6L) into the cryo-EM maps (Figures 3A and 3B). The secondary structural components of MLL1 SET (Southall et al., 2009), including the α-helices and β-hairpin of the SET-I, SET-N, and SET-C domains (dotted circles), fitted well into the cryo-EM map (Figure 3B). Similar to MLL1 SET, distinctive features of WDR5 and ASH2L SPRY were also well defined in the cryo-EM structure (Figures 3A and 3B). Importantly, our structure indicated that the WDR5-MLL1 SET-ASH2L SPRY sub-complex did not make direct contacts with nucleosomal DNA, which was experimentally confirmed by gel mobility assays (Figure S5D and data not shown). The catalytic site of the MLL1 SET domain was pointing outward, which might confer a distance restraint on substrate accessibility (see below). Furthermore, the overall domain architecture of the MLL1 core complex within MLL1 RWSAD-NCP was largely conserved in the NCP-free yeast SET1 complexes (Hsu et al., 2018; Qu et al., 2018), suggesting that NCP binding may not require or induce major conformational changes in the MLL1 core complex (Figure 3C).

Dynamic ASH2L-NCP Interaction is Critical for H3K4me3
The second docking point of the MLL1 core complex to the NCP was provided by the intrinsically disordered regions (IDRs) of ASH2L ( Figures 1D and 3C). The ASH2L-NCP interface was highly dynamic in solution ( Figure S3E), making it challenging to visualize the molecular details. Similar dynamic behaviour was observed for IDR of the yeast homologue Cps60 ( Figure 3C), which was not resolved in the cryo-EM structure of the yeast SET1 complex (Qu et al., 2018). Since the crystal structure of full-length human ASH2L has not been reported, we employed the protein structure prediction approach using the iterative template-based fragment assembly refinement (I-TASSER) method (Roy et al., 2010;Zhang, 2008). The crystal structure of yeast Bre2 was used as template (PDB ID: 6CHG) (Hsu et al., 2018) to build the ASH2L plant homeodomain-wing helix  Figures 1D and 4B). Surprisingly, the PHD-WH domain of ASH2L was located outside of the cryo-EM map ( Figure S7A) despite the reported function in DNA binding (Chen et al., 2011;Sarvan et al., 2011).
Our MLL1 RWSAD -NCP model pinpointed a short stretch of positively-charged residues (i.e., K205/R206/K207) in the ASH2L Linker-IDR to make contacts with nucleosomal DNA ( Figure 4B). These positively-charged residues were highly conserved in ASH2L homologs in higher eukaryotes ( Figure 5A). To biochemically validate the model structure of ASH2L IDRs, we first examined the importance of key residues involved in the NCP interaction. As shown in Figure 5B, ASH2L directly interacted with the NCP, resulting in a mobility shift in the native gel. However, deletion of both PHD-WH (residues 1-178) and Linker-IDR (residues 178-277), but not PHD-WH alone, abolished ASH2L interaction with the NCP ( Figure 5B). Further truncation of ASH2L Linker-IDR identified that residues 202-207 were important for NCP interaction, consistent with our ASH2L model ( Figure 4B). Binding of ASH2L to the NCP was critical for MLL1 methyltransferase activity. Deletion of ASH2L Linker-IDR completely abolished the MLL1 activity on the NCP ( Figure 5C, left). Similarly, deletion of ASH2L residues 202-207 or mutations of residues K205/R206/K207 to alanine significantly reduced MLL1 activity on the NCP for H3K4me3 ( Figure 5C, right), but not on free H3 ( Figure S7B). This result, together with that of RbBP5, suggests that MLL1-NCP interactions specifically promote tri-methylation of H3K4. Notably, deletion of ASH2L Linker-IDR led to more drastic reduction of overall H3K4me, suggesting additional mechanisms by which Linker-IDRs may contribute to MLL1 regulation (see discussion).

Alignment of MLL1 SET at the Nucleosome Dyad
Given the binding of RbBP5 and ASH2L at the edge of the NCP (SHL1.5 and SHL7), the catalytic MLL1 SET domain was positioned at the nucleosome dyad ( Figure 6A). In this arrangement, the active site of the MLL1 SET domain pointed outward ( Figure 6A). The NCP structure was well-resolved in the cryo-EM map of MLL1 RWSAD -NCP. Both histone H3 tails emanated from between two gyres of nucleosomal DNA, with Lys37 as the first observable residue on histone H3 N-terminal tails ( Figure 6A). The distance between Lys37 on each histone H3 tail and the active site of MLL1 SET was ~ 60 Å. Thus, K4 residues on both H3 tails were almost equally accessible to the MLL1 SET active site ( Figure 6A). The distance constraint restricted access of the MLL1 SET domain to only K4 and K9 on H3 N-terminus ( Figure 6A). More importantly, since MLL1 SET is a nonprocessive enzyme (Patel et al., 2009), close proximity of the MLL1 SET domain to K4 on both H3 tails likely played a significant role in promoting its activity on higher H3K4me states on the NCP ( Figure 6B, see discussion).

DISCUSSION
Here we report the single-particle cryo-EM structure of the human MLL1 core complex bound to the NCP. It shows that the MLL1 core complex anchors on the NCP through WD40 domain of RbBP5 and Linker-IDR of ASH2L. This dual interaction positions the catalytic MLL1 SET domain near the nucleosome dyad, allowing symmetric access to both H3K4 substrates within the NCP. Disruption of MLL1-NCP interaction specifically reduces MLL1 activity for nucleosomal H3K4me3. Our study sheds light on how the MLL1 core complex engages chromatin and how chromatin binding promotes MLL1 trimethylation activity.

Unique MLL1-NCP Recognition among Chromatin Recognition Complexes
One of the well-known features of the NCP is the acidic patch, which is a negatively charged and solvent-exposed surface. It is organized by a series of negatively charged residues in histones H2A and H2B (Zhou et al., 2019). The acidic patch interacts with the basic patch on histone H4 of adjacent nucleosomes, which underlies inter-nucleosome interactions in higher-order chromatin structures (Kalashnikova et al., 2013). Structures of NCP-protein complexes demonstrate that the acidic patch is recognized by NCP-interacting proteins in many cases through diverse arginine finger motifs (e.g., LANA (Barbera et al., 2006), RCC1 (Makde et al., 2010), 53BP1 (Wilson et al., 2016)). Our study demonstrates that the MLL1 core complex binds to a unique surface of the NCP that does not involve the acidic patch. The main contributors to the NCP interaction are the electrostatic interactions between positively charged residues in RbBP5 and ASH2L and the DNA backbone in the NCP. Extensive DNA interactions were also observed for the histone H3K27 methyltransferase PRC2 in complex with dinucleosomes (Poepsel et al., 2018). However, unlike PRC2, all MLL1-NCP interactions occur within a single nucleosome. It is possible that other domains of MLL1 (e.g., MLL1 PHD-Bromo) that are not included in our study are important for engaging adjacent nucleosomes and spreading the H3K4me marks.
In our structure, the I-loop of RbBP5, inserting between the H4 tail and nucleosomal DNA (SHL1.5), provides the specific docking point of the MLL1 core complex to the NCP. Once RbBP5 is docked, the distance between RbBP5 and ASH2L (~70 Å) limits ASH2L binding to SHL7 of the NCP (Figure S7C). Interestingly, despite the importance of the RbBP5 I-loop in specifying the orientation of MLL1 on the NCP, it did not contribute significantly to binding of RbBP5 to the NCP (Figure 2G). Nonetheless, dual recognition through both specific and non-specific interactions of RbBP5 and ASH2L likely enables the MLL1 core complex to bind the NCP in a unique configuration for optimal access to both H3K4 substrates (Figure 6B).
The composition of catalytically active human MLL/SET and yeast SET1 complexes is largely conserved (Rao and Dou, 2015). Our study shows that the human MLL1 core complex and the yeast SET1 complexes (Hsu et al., 2018; Qu et al., 2018) have the same overall architecture, with the catalytic MLL1 SET domain sandwiched by RbBP5-WDR5 (Swd1-Swd3 in Kluyveromyces lactis/Cps50-Cps30 in Saccharomyces cerevisiae) and ASH2L-DPY30 (Bre2-Sdc1/Cps60-Cps25 in yeast) on each side (Figure 3C). Furthermore, the crystal structure of the yeast SET1 complex (Hsu et al., 2018) overlays well with the MLL1 core complex in our structure. This may suggest that the yeast SET1 complex adopts a similar configuration on the NCP.
Given that RbBP5 and ASH2L are shared among all MLL family protein complexes in mammals, it is likely that our study reveals a general mechanism for how the mammalian MLL complexes engage chromatin and gain access to the H3K4 substrates. Sequence alignments show significant conservation of the RbBP5 I-/A-loops as well as the basic residues in ASH2L in higher eukaryotes. This conservation supports the functional importance of these regions in chromatin binding and H3K4me regulation. It also suggests that the mechanism by which MLL family enzymes engage chromatin in higher eukaryotes is likely to be conserved. Importantly, recent genome sequencing studies have identified mutations in these conserved regions in human malignancies (Blankin et al., 2012; Kim et al., 2014), which warrants future studies. We would like to point out that the interfaces of RbBP5 and ASH2L with the NCP are not well conserved in the homologous yeast Swd1/Cps50 and Bre2/Cps60 proteins (Figures 5A and S5B). The I-loop is much shorter and the A-loop is missing in the yeast Swd1/Cps50 protein (Figure S5A), suggesting potential divergence of detailed yeast SET1-NCP interactions at the molecular level, although the overall NCP recognition pattern might be similar.

Contribution of ASH2L-IDR in NCP Recognition and H3K4me Regulation
Previous studies have shown that several components of the MLL1 core complex are capable of interacting with DNA or RNA, including WDR5, RbBP5 and ASH2L (Chen et al., 2011; Mittal et al., 2018; Sarvan et al., 2011; Yang et al., 2014). This raises the question of how these interactions contribute to recruitment of the MLL1 core complex to chromatin. Our structure indicates that WDR5, MLL1 SET, and the PHD-WH domain of ASH2L do not directly interact with the NCP, suggesting that these proteins probably interact with nucleosome-free DNAs and indirectly contribute to the stability of the MLL1 core complex on chromatin. Our study reveals a previously uncharacterized function of the ASH2L Linker-IDRs in chromatin function. This region was not studied in the previous MLL1 SET-ASH2L SPRY-RbBP5 330-375 structure. The ASH2L IDRs contain evolutionarily conserved sequences (Figures 5A and S5B) and exhibit dynamic properties on the NCP. Notably, we took advantage of a protein structure prediction approach to identify the essential interface between the ASH2L IDRs and the NCP. This approach subsequently allowed us to uncover a basic patch region in ASH2L (205-KRK-207) that significantly contributes to MLL1 binding to the NCP and MLL1 trimethylation activity. However, it is likely that other regions of ASH2L also contribute to NCP binding, since deletion of ASH2L residues 202-254 leads to a more prominent reduction of H3K4me (Figure 5B-D). Molecular dissection of the ASH2L IDRs awaits future studies.

Regulation of MLL1 Activity for H3K4me3 on the NCP
Single-turnover kinetic experiments revealed that the MLL1 core complex uses a non-processive mechanism for catalysis (Patel et al., 2009), requiring capture and release of H3K4 after each round of the methylation reaction. Thus, it is not kinetically favorable to achieve the tri-methylation state when enzyme and substrate have random encounters in solution. Our study demonstrates that the MLL1 core complex stably associates with the NCP via RbBP5 and ASH2L, which uniquely positions the MLL1 SET domain at the nucleosome dyad with near-symmetric access to both H3K4 substrates (Figure 6A).
Stable settlement on the NCP allows close physical proximity and optimal orientation of the MLL1 SET catalytic site to both H3K4 substrates on the NCP, which significantly favors the kinetics of successive methylation reactions. In support of this, disruption of the MLL1-NCP interactions significantly reduces H3K4me3 activity on the NCP without affecting MLL1 activity on free H3. Given the enhanced MLL1 activity on the NCP, we envision that stabilizing the MLL1 complex on chromatin by transcription factors and cofactors will further enhance overall H3K4me3, which in turn recruits additional transcription cofactors (Lauberth et al., 2013; Ruthenburg et al., 2007; Taverna et al., 2007; Vermeulen et al., 2007; Wysocka et al., 2006). This leads to a feedback loop for optimal gene expression and spreading of H3K4me3 at actively transcribed genes in cells. The position of the MLL1 SET domain at the nucleosome dyad also raises the question of potential interplay with linker histones, which bind near the nucleosome dyad in the heterochromatic regions in eukaryotes (Hayes et al., 1994; Lu et al., 2013; Zhou et al., 2015). It would be interesting to test in the future whether linker histones inhibit MLL1 activity and thus promote a closed chromatin conformation.

In vitro histone methyltransferase assay
The in vitro histone methyltransferase assay was carried out by incubating the

Electrophoretic Mobility Shift Assay (EMSA)
EMSA was carried out using 0.1 µM nucleosomes and increasing concentrations of MLL1 subunits. The protein mixture was run on a 6% 0.2X TBE gel that had been pre-run for 1.5 hours at 150 V and 4 °C. The gel was stained by incubating in 100 mL of TAE with 1:20000 diluted ethidium bromide for 10 minutes at room temperature. Gels were then incubated in distilled water for 10 minutes and visualized by UV transillumination (Bio-Rad ChemiDoc Imaging System). The results were quantified with ImageJ software (Schneider et al., 2012).

His6 Pull-down Assay
His6-fusion proteins were incubated with the NCP in BC150 (20 mM Tris-HCl, pH 7.5, 350 mM NaCl, 20 mM imidazole, 0.05% v/v NP-40, 10 mM DTT, 1 mg/ml BSA, PMSF and inhibitor cocktail) for 2 hours at 4 °C. After several washes with BC150, the beads were boiled in SDS loading buffer and analyzed by Western blot.

Cryo-EM sample preparation
The cryo-EM sample was prepared by the GraFix method (Kastner et al., 2008).
After ultracentrifugation, 20 µl fractions were manually collected from the top of the gradient. The crosslinking reaction was terminated by adding 2 µl of 1 M Tris-HCl, pH 7.5, to each fraction. Glycerol was removed by dialyzing the sample in GraFix buffer using a centrifugal concentrator (Sartorius Vivaspin 500) before making cryo-EM grids.

Cryo-EM data collection and processing
A protein sample at 1 mg/ml concentration was plunge-frozen on 200 mesh Quantifoil R1
Micrograph movie stacks were first subjected to MotionCor2 for whole-frame and local drift correction (Zheng et al., 2017). For each micrograph, CTFFIND4.1 was used to fit the contrast transfer function (Rohou and Grigorieff, 2015). Micrographs with an estimated resolution worse than 5 Å were excluded from further processing, which resulted in 3896 micrographs. Particle picking was performed using Warp (Tegunov and Cramer, 2018), which picked a total of 712,198 particles. Using the particle coordinates obtained from Warp, the particles were extracted with a box size of 350 Å using the RELION 3 program package (Zivanov et al., 2018). Extracted particles were then imported into cryoSPARC (Punjani et al., 2017) for 2D classification into 200 classes.
After removal of bad classes, a total of 694,180 particles were subjected to ab initio 3D classification (Figure S2). The major class (323,408 particles) contained the MLL1 core complex and the NCP and was then subjected to heterogeneous refinement. This led to the identification of ten subclasses. One subclass showed only partial cryo-EM density for the MLL1 core complex and was thus excluded from further processing. These particles were used for 3D refinement in RELION and post-processed to a resolution of 6.2 Å with a B factor of -189 Å². This cryo-EM map was locally filtered using RELION according to the local resolution to avoid over-interpretation.
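The particle counts quoted above can be turned into retention fractions to make the attrition at each step explicit. The following is a minimal bookkeeping sketch. The three counts come from the text, while the function and key names are illustrative and are not part of any cryo-EM package.

```python
# Particle counts quoted in the text.
PICKED = 712_198       # particles picked by Warp
AFTER_2D = 694_180     # particles kept after removing bad 2D classes
MAJOR_CLASS = 323_408  # particles in the major ab initio 3D class


def retained_fraction(kept: int, total: int) -> float:
    """Return kept/total, rejecting a non-positive total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total


def summarize() -> dict:
    """Collect the attrition at each processing step (names are illustrative)."""
    return {
        "removed_by_2d": PICKED - AFTER_2D,
        "kept_after_2d": retained_fraction(AFTER_2D, PICKED),
        "major_class_of_2d": retained_fraction(MAJOR_CLASS, AFTER_2D),
        "major_class_of_picked": retained_fraction(MAJOR_CLASS, PICKED),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Under these numbers, roughly 97% of picked particles survive 2D cleanup and a little under half of those fall into the major 3D class.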
Partial signal subtraction was performed to generate the particle sets for RbBP5-NCP and MLL1 RWS-NCP. Further 3D classifications without alignment (5 classes, 35 cycles, T=40 for RbBP5-NCP and 10 classes, 35 cycles, T=40 for MLL1 RWS-NCP) were performed, and the best maps based on the resolution and occupancy of the RbBP5 and MLL1 RWS densities were selected for further refinement and post-processing (Figure S2).
The reported final resolution of each cryo-EM structure was estimated by RELION using the Fourier shell correlation (FSC) criterion of 0.143 (Figure S3A-C).
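For readers unfamiliar with the FSC criterion, the sketch below illustrates the general idea: two independently refined half-maps are correlated shell by shell in Fourier space, and the resolution is read off where the curve drops below 0.143. This is a simplified NumPy illustration, not RELION's implementation. It assumes cubic, unmasked half-maps, uses no interpolation between shells, and the function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray):
    """Return (shell_index, fsc) for two cubic half-maps of equal shape."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("half-maps must be cubic and of equal shape")
    n = half1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    # Integer radius of every Fourier voxel measured from the shifted origin.
    axis = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    radius = np.rint(np.sqrt(xx ** 2 + yy ** 2 + zz ** 2)).astype(int)
    n_shells = n // 2
    mask = radius < n_shells
    r = radius[mask]
    # Per-shell sums of the cross term and of each map's power.
    cross = np.bincount(r, weights=(f1 * np.conj(f2)).real[mask], minlength=n_shells)
    p1 = np.bincount(r, weights=(np.abs(f1) ** 2)[mask], minlength=n_shells)
    p2 = np.bincount(r, weights=(np.abs(f2) ** 2)[mask], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    fsc = np.divide(cross, denom, out=np.zeros(n_shells), where=denom > 0)
    return np.arange(n_shells), fsc


def resolution_at_threshold(fsc, box_size_angstrom, threshold=0.143):
    """Resolution (in Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(np.asarray(fsc)[1:] < threshold)[0]
    if below.size == 0:
        return None  # the curve never crosses the threshold
    shell = below[0] + 1  # +1 because shell 0 (the DC term) was skipped
    return box_size_angstrom / shell
```

Here shell k corresponds to a spatial frequency of k divided by the box edge length, so the resolution is the box edge divided by the shell index. Production software additionally applies masking and phase-randomization corrections, which this sketch omits.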

Modeling, rigid body fitting, and model refinement
We built a 3D atomic model of the human ASH2L protein by I-TASSER (Roy et al., 2010; Wei Zheng, 2019; Zhang, 2008) assisted by deep-learning-based contact-map prediction (Li et al., 2019b). The fragment-guided molecular dynamics refinement software, FG-MD, was used to remove the steric clashes between the ASH2L model and other molecules and to further refine the local structures (Figure S6B).
Finally, our in-house EM-fitting software, EM-Ref (Zhang et al, in preparation), was used to fit the ASH2L model and other parts of human MLL1 core complex to the density maps to get final atomic models.
I-TASSER utilized LOMETS, which consisted of 16 individual threading programs, to generate templates as the initial conformation. The human ASH2L protein consisted of three domains. The second and third domains (Linker-IDR and ASH2L SPRY) and the C-terminal SDI motif were covered by templates (PDB ID: 6E2H and 6CHG, B chain, crystal structure of the yeast SET1 H3K4 methyltransferase catalytic module (Hsu et al., 2018)) in most of the top threading alignments. The first domain (PHD-WH domain) was covered by another template (PDB ID: 3S32, A chain, the crystal structure of the ASH2L N-terminal domain) (Sarvan et al., 2011). Therefore, these three proteins were used as the main templates for building the full-length ASH2L model, where the structural assembly simulation was guided by the contact maps from the deep-learning program ResPRE. Finally, the first I-TASSER model was selected as the potential ASH2L model; the estimated TM-score (Xu and Zhang, 2010) for the C-terminal domain was 0.71±0.12, suggesting that the confidence of the I-TASSER model is high. A superposition of the ASH2L model (Linker-IDR and ASH2L SPRY) with the experimental structure (ASH2L SPRY) is shown in Figure 4A.
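To put the quoted TM-score in context, the sketch below evaluates the standard TM-score definition for a set of aligned residue-pair distances. It illustrates only the definition of the score. I-TASSER reports an estimated TM-score for a model without a reference structure, and that estimation procedure is not reproduced here. The function names are hypothetical, and the superposition that yields the distances is assumed to have been done elsewhere.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else 0.0
    if d0 <= 0:
        raise ValueError("target too short for the d0 formula used in this sketch")
    return d0


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition.

    distances: distances (in Å) between aligned residue pairs after superposition.
    target_length: number of residues in the target, used for normalization,
    so unaligned residues contribute zero.
    """
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Illustrative input only: 100 residues, all aligned pairs exactly d0 apart.
    print(tm_score([tm_d0(100)] * 100, 100))
```

A score of 1 corresponds to a perfect match, and values above about 0.5 are generally taken to indicate the same overall fold, which is the context for reading 0.71±0.12 as a confident prediction.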
Monte Carlo (MC) simulation was employed to fit and refine the complex model structures based on the experimental density map. During the MC simulations, individual domain structures were kept as rigid bodies, and global translations and rotations of the domains were proposed and then accepted or rejected based on the Metropolis algorithm (Binder et al., 1993). The total number of translation and rotation moves was 50,000 in the MC simulation. The MC energy function used in the simulation was a linear combination of the correlation coefficient (CC) between the structural models and the density map data and the steric clashes between the atomic structures. In this energy function, ρ_c(y) was the calculated density map on the grid (DiMaio et al., 2009) and ρ_o(y) was obtained from the experimental density map. The averages of the calculated and experimental density maps were subtracted from ρ_c(y) and ρ_o(y), respectively, when computing the CC. DM and L represented the density map and the length of the protein, respectively. d_ij was the distance between the two atoms i and j, r_ij was the sum of their van der Waals atomic radii, and ε_ij was the combined well-depth parameter for atoms i and j, which were all taken from the CHARMM force field (MacKerell Jr et al., 1998). w_1 = 100 and w_2 = 1 were the weights for the correlation coefficient term and the clash term, respectively.
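The fitting procedure described above can be illustrated with a generic Metropolis rigid-body search. The sketch below is not the EM-Ref implementation. The source states only that the energy is a linear combination of a CC term and a clash term with weights 100 and 1, so the specific form `w1 * (1 - CC) + w2 * clash`, the step sizes, the temperature, and all function names are assumptions made for illustration.

```python
import numpy as np

W1, W2 = 100.0, 1.0  # weights of the CC term and the clash term (from the text)


def correlation_coefficient(rho_c: np.ndarray, rho_o: np.ndarray) -> float:
    """Pearson correlation between calculated and observed density grids."""
    dc = rho_c - rho_c.mean()
    do = rho_o - rho_o.mean()
    denom = np.sqrt((dc ** 2).sum() * (do ** 2).sum())
    return float((dc * do).sum() / denom) if denom > 0 else 0.0


def total_energy(cc: float, clash: float, w1: float = W1, w2: float = W2) -> float:
    """Assumed form: a better fit (higher CC) and fewer clashes lower the energy."""
    return w1 * (1.0 - cc) + w2 * clash


def random_rotation(max_angle: float, rng) -> np.ndarray:
    """Small random rotation matrix (Rodrigues formula) about a random axis."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle, max_angle)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def metropolis_accept(delta_e: float, temperature: float, rng) -> bool:
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < np.exp(-delta_e / temperature)


def fit_rigid_body(coords, energy_fn, n_steps=50_000, max_shift=1.0,
                   max_angle=0.1, temperature=1.0, seed=0):
    """Metropolis search over rigid-body moves; returns (best_coords, best_energy).

    energy_fn maps an (N, 3) coordinate array to a scalar energy, for example
    by computing a model density, its CC to the map, and a clash penalty.
    """
    rng = np.random.default_rng(seed)
    current = np.asarray(coords, dtype=float)
    e_current = energy_fn(current)
    best, e_best = current.copy(), e_current
    for _ in range(n_steps):
        center = current.mean(axis=0)
        rot = random_rotation(max_angle, rng)
        shift = rng.uniform(-max_shift, max_shift, size=3)
        # Rotate about the centroid, then translate: internal geometry is kept.
        trial = (current - center) @ rot.T + center + shift
        e_trial = energy_fn(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current.copy(), e_current
    return best, e_best
```

In practice, `energy_fn` would wrap `total_energy` around a density calculation and a clash evaluation for each domain. Keeping the energy as a callable leaves those domain-specific parts outside the sampler.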
For the nucleosome model, the crystal structure of the nucleosome (PDB ID: 3MVD) (Makde et al., 2010) was used for rigid-body fitting. In the cryo-EM structure of RbBP5-NCP, the histone H4 tail region was manually rebuilt where the density allowed using the program COOT (Emsley et al., 2010). The three model structures of MLL1 RWSAD-NCP, RbBP5-NCP, and MLL1 RWS-NCP were subjected to real-space refinement using PHENIX after rigid-body fitting. Validation of the three model structures was performed with MolProbity. The final structures were further validated by calculating map-model FSC curves using phenix.mtriage in the PHENIX program package (Figure S3D) (Afonine et al., 2018). The computed FSC between the model and the map is shown in Figure S3D. Statistics for data collection, refinement, and validation are summarized in Table 1.