Mechanism of replication origin melting nucleated by CMG helicase assembly

The activation of eukaryotic origins of replication occurs in temporally separated steps to ensure that chromosomes are copied only once per cell cycle. First, the MCM helicase is loaded onto duplex DNA as an inactive double hexamer. Activation occurs after the recruitment of a set of firing factors that assemble two Cdc45–MCM–GINS (CMG) holo-helicases. CMG formation leads to the underwinding of DNA on the path to the establishment of the replication fork, but whether DNA becomes melted at this stage is unknown 1 . Here we use cryo-electron microscopy to image ATP-dependent CMG assembly on a chromatinized origin, reconstituted in vitro with purified yeast proteins. We find that CMG formation disrupts the double hexamer interface and thereby exposes duplex DNA in between the two CMGs. The two helicases remain tethered, which gives rise to a splayed dimer, with implications for origin activation and replisome integrity. Inside each MCM ring, the double helix becomes untwisted and base pairing is broken. This comes as the result of ATP-triggered conformational changes in MCM that involve DNA stretching and protein-mediated stabilization of three orphan bases. Mcm2 pore-loop residues that engage DNA in our structure are dispensable for double hexamer loading and CMG formation, but are essential to untwist the DNA and promote replication. Our results explain how ATP binding nucleates origin DNA melting by the CMG and maintains replisome stability at initiation.

Since the discovery of the double helix, molecular biologists have been asking how the separation of two DNA strands is nucleated after the initiation of chromosome replication. In vitro reconstitution of bacterial 2 , viral 3 and eukaryotic DNA replication 1 have started to address this question. By studying these systems, a universal role has been identified for ATP binding by multimeric enzymes that untwist the double helix at the start of replication. Existing atomic models of initiators and helicases bound to single-stranded DNA mimic the structure of origin DNA immediately after melting 2,4,5 . However, to understand the mechanism of the ATP-triggered opening of duplex DNA at the molecular level, the structure of an origin duplex caught in the act of nucleating a replication bubble must be obtained. To achieve the opening of origin DNA, bacteria 2 and eukaryotic viruses 3,6 use one single protein that oligomerizes around the double helix and causes its deformation, but such melting intermediates have not to our knowledge been structurally characterized so far. Origin opening in Saccharomyces cerevisiae requires not one, but thirty-two distinct polypeptides that act sequentially. First, the origin recognition complex (ORC) together with loading factors Cdc6 and Cdt1 recruit a set of two ring-shaped MCM helicases that form an inactive double hexamer around duplex DNA 7,8 . Activation requires the recruitment of two firing factors, Cdc45 and Go-Ichi-Nii-San (GINS) 1,[9][10][11] . To achieve this, the double hexamer is first phosphorylated by the Dbf4-dependent kinase (DDK). These changes are recognized by the Sld3-7 phosphoreader, which recruits Cdc45 to the double hexamer [11][12][13][14][15] . Sld3 is in turn phosphorylated by the Clb5-Cdc28 (CDK) kinase, which also phosphorylates the firing factor Sld2. Phospho-Sld2 and phospho-Sld3 bind Dpb11, which engages Pol ε and GINS to mediate their origin recruitment 11,12,16 . 
After ADP release and ATP binding by MCM, GINS and Cdc45 stably engage MCM, forming two distinct CMG assemblies that disrupt the double hexamer interface through an unknown mechanism. Topology footprint assays indicate that CMG formation leads to partial DNA untwisting, but whether base pairing is broken at this stage in origin activation remains to be determined. After the recruitment of Mcm10, the lagging-strand template is ejected from the MCM ring pore, which leads to the establishment of the replication fork and the ATPase-powered translocation along single-stranded DNA 1 . How the CMG selects the translocation strand in this context is unknown. Assembly of two CMGs at an origin disrupts the double hexamer interface 1 . Mapping the relative orientation of the two separated CMGs on the origin DNA is important to understand how replication forks are established bidirectionally and how replisome stability is maintained in the early stages of replication initiation [17][18][19][20][21][22] .

CMG assembly on chromatinized origin DNA
To understand how ATP-dependent CMG formation leads to double hexamer separation and DNA untwisting, we used electron microscopy to image origin-dependent CMG formation reconstituted in vitro with purified yeast proteins and in a near-native chromatin environment. To this end, we reconstituted CMG on ARS1 origin DNA flanked at both ends by a nucleosome assembled on strong Widom positioning sequences. Nucleosome capping of the naked, AT-rich ARS1 DNA recapitulates the architecture of chromatinized origins that is found in cells 23 and serves to trap double hexamers on duplex DNA, preventing dissociation by sliding 24 . Double hexamers were (i) loaded onto origin DNA using MCM-Cdt1, ORC, Cdc6 and ATP; (ii) phosphorylated with DDK in solution; and (iii) isolated using Strep-TactinXT-coated paramagnetic beads that capture a twin-strep tag on histone H3 of the nucleosome. After a high-salt wash that removes helicase loading intermediates and DDK, DNA-bound phosphorylated double hexamers were biotin-eluted and incubated with Sld3-7, Cdc45, Sld2, Dpb11, GINS, Pol ε, CDK and ATP to promote CMG formation (Fig. 1a and Extended Data Fig. 1a,b). We analysed the full reaction by negative-stain electron microscopy (NS-EM) single-particle two-dimensional (2D) averaging, and found that on average 32% of double hexamers were converted to CMG. Of these, 70% were homo-dimeric and Pol-ε-engaged (dCMGE) and 19% were single CMGs. In most of the dCMGE particles, GINS-Cdc45 and Pol ε mapped on opposite sides around the MCM ring, giving rise to a trans configuration (Fig. 1b and Extended Data Fig. 1c). The remaining dCMGE particles (11%) were in a cis configuration, with GINS-Cdc45-Pol ε located on the same side (Fig. 1b and Extended Data Fig. 1c).
In silico reconstitution (ReconSil), performed by overlaying nucleosome and MCM-containing 2D averages onto the corresponding particles in the raw micrographs, revealed that dCMGEs were tightly packed between the two flanking nucleosomes (Fig. 1b). A measured inter-nucleosome distance of 136 bp ± 23 bp (s.d.) (Fig. 1c) matches the expected 136 bp, supporting the notion that these represent bona fide reconstituted origins and not neighbouring particles bound to different DNA molecules. Similar results were obtained for chromatinized origins trapping a double hexamer (133 bp ± 18 bp (s.d.); Fig. 1c). We did not observe single CMGs trapped in between nucleosomes, suggesting that these may represent helicases that fell off the DNA (Fig. 1b). It is established that ARS1 DNA substrates capped by nucleosomes, or by covalently linked HpaII methyltransferase (MH) roadblocks, support double hexamer loading 24 (Fig. 1b,d). Given the tighter protein packing that is caused by the formation of dCMGEs on the ARS1 origin, we asked whether enough space is available between nucleosomes for two activated CMGE particles to cross paths during the establishment of the replication fork. To address this question, we reconstituted roadblocked origin replication in vitro using a minimal set of replisome factors (Fig. 1d,e), matching established conditions that support the replication of an ARS-containing 10.6-kb supercoiled plasmid 11 (Fig. 1f). DNA products separated by alkaline agarose gel electrophoresis showed that the nucleosome-flanked ARS1 substrate is copied in full (Fig. 1e). This is evident from the size of duplicated DNA, which is longer for the Widom-flanked ARS1, compared to a shorter construct in which nucleosomes are swapped for MH caps (Extended Data Fig. 1b and Fig. 1e). 
The tight packing of a dCMGE particle between two flanking nucleosomes raises the question of whether the CMGE dimer is a stable complex or whether it is formed as a result of the spatial constraints imposed by the two roadblocks that prevent dissociation. We reasoned that CMG assembly on longer, less crowded DNA substrates might allow enough space for the dCMGE complex to dissociate into two discrete CMGE particles. To test this hypothesis, we performed origin-dependent CMG assembly reactions on MH-capped 864-bp DNA substrates that contain an array of 6 consecutive ARS1 sequences separated by 40 bp of linker DNA. We only observed dCMGE particles and not separated CMGs on the array substrate, irrespective of the efficiency of double hexamer loading ( Fig. 1g and Extended Data Fig. 1d-g). Thus, stability of the dCMGE complex assembled during ATP-dependent double hexamer activation is independent of nucleosomes and independent of the position of flanking roadblocks, as well as the level of protein saturation of DNA.

Cryo-EM structure of the CMGE dimer
Our previous NS-EM work on origin-dependent CMG reconstitution in vitro involved high-salt treatment of protein-DNA tethered to streptavidin-coated paramagnetic beads, followed by elution using DNA digestion 1 . This procedure disrupted CMGE dimers but not the double hexamer, which suggests that the conversion of double hexamer to dCMGE reconfigures and weakens the MCM dimerization interface. To understand the conformational transitions that occur upon CMG formation, we determined the cryo-electron microscopy (cryo-EM) structure of a dCMGE complex assembled on chromatinized ARS1 DNA (Extended Data Fig. 2a-c). Both C2 symmetry and asymmetric refinement yielded a structure with limited resolution (Extended Data Figs. 2d-f and 3). However, symmetry expansion approaches revealed that the two rings in the dimer are identical. Combined with three-dimensional (3D) classification, variability analysis and refinement, followed by iterative cycles of contrast transfer function (CTF) refinement and Bayesian polishing, this process yielded a structure at 3.5 Å resolution 25 (Fig. 2d). In fact, partial digestion with the restriction enzyme MseI promotes the disassembly of the dCMGE complex into separated CMGs (Fig. 2e and Extended Data Fig. 1g). provide most of the leading-strand contacts, which explains how the double-hexamer-to-dCMGE transition leads to selection of the translocation strand (Fig. 3c). This leading-strand binding mode in dCMGE is consistent with previously reported structures of single- and duplex-DNA-engaged recombinant CMG (refs. 17,31-34 ), which indicates that interaction with the leading strand is conserved from initiation to termination (Fig. 3d). Given the sparse pore-loop contacts, the lagging-strand template appears to be in turn poised for ejection from the MCM ring pore, which is required to achieve the establishment of the replication fork.
How nucleotide-triggered conformational changes in MCM affect the structure of duplex DNA will be discussed in the next paragraph.

Mechanism of DNA-bubble nucleation
Captured between the ATPase and the N-terminal pore loops in the dCMGE complex, a stretch of seven base pairs is underwound.
Fig. 3 legend (fragment): The density for the selected translocation strand (red on the right) has been extracted from the duplex DNA density (grey on the left). d, The leading-strand template extracted from the dCMGE structure superposed on the yeast CMG translocating on a DNA fork reconstituted on an artificial DNA fork (PDB 6U0M), bound to the fork stabilization complex (PDB 6SKL) or bound to SCF Dia2 and duplex DNA (PDB 7PMK).
lagging strand, three flipped-out bases can be confidently built (Fig. 4a), which are stabilized by two conserved residues (T423 and R424) located within the Mcm6-specific insertion of the N-terminal pore loop ('Mcm6 wedge'; Fig. 4b and Extended Data Fig. 5b,c,e). Together, our data show that the double-hexamer-to-dCMGE transition not only promotes the untwisting of duplex DNA but also the disruption of at least three consecutive base pairs, with the resulting orphan bases being stabilized by an Mcm6 pore-loop element. Two separate bubbles are nucleated inside the two MCM rings across the dCMGE, which remain separated by 1.5 turns of exposed duplex DNA.
Structural changes in the ATPase pore loops explain how the double-hexamer-to-dCMGE transition leads to DNA untwisting. Amongst several MCM-DNA interactions that are summarized in Extended Data Fig. 5a-d, we identified K587 on the Mcm2 helix-2 insert (h2i) pore loop as one of a few elements that maintain the same DNA contact in both the double hexamer and the dCMGE complex. As shown in Supplementary Video 4, this element appears to push on the lagging-strand template, contributing to the deformation of duplex DNA. Additional DNA contacts involve five conserved residues on Mcm2 that pull on the leading-strand template (V580, K582, P584 and W589 in h2i, as well as K633 in the pre-sensor 1 β-hairpin, PS1BH; Extended Data Figs. 5f and 6a). These contacts are found in the dCMGE complex but are absent in the double hexamer and widen the minor groove, which decreases the superhelical DNA density (Fig. 4c). An Mcm2 variant (Mcm2 6A), which targets all of the ATPase-DNA contacts that are observed in the dCMGE complex, can load double hexamers to wild-type levels but is completely defective for replication (Fig. 1d,f). To establish whether this defect is due to the inability of Mcm2 6A to untwist DNA upon CMG assembly, we loaded double hexamers on a 616-bp circular DNA that contains ARS1 and added a full complement of wild-type firing factors, or selected dropout controls, in the presence of TopoI (Fig. 4d and Extended Data Fig. 6b,c). As previously described, three topoisomers, α, α−1 and α+1, were visible in the absence of DDK, indicating that there was no MCM-dependent change in DNA topology 1 . Omission of Mcm10 with wild-type MCM led to an additional accumulation of negatively supercoiled topoisomers −2 and −3, which are indicative of initial DNA untwisting 1 consistent with that seen in the dCMGE structure. When all firing factors were present with wild-type MCM, a robust accumulation of −2 through to −6 negative supercoils was detected (lane 3), indicating additional DNA untwisting that arises after the ejection of the lagging strand from CMG. None of the topoisomers that are associated with either initial untwisting or full activation appeared when MCM containing the Mcm2 6A variant was assayed. This indicates that DNA engagement by Mcm2, as observed in our dCMGE structure, is essential for the initial untwisting of DNA and the subsequent ejection of the lagging strand from CMG. NS-EM analysis of the same Mcm2 6A mutant revealed that double hexamers can efficiently be converted to CMG (Fig. 4e-f); however, most fail to homo-dimerize and form complete dCMGE complexes (Fig. 4g). Hence, DNA binding by CMG is important for the stability of the dCMGE structure, in agreement with our observation that partial DNA digestion disrupts CMGE dimerization (Fig. 2e). The discovery that an MCM mutant that is unable to untwist DNA and support origin-dependent replication is competent in single but not double CMG formation supports a functional role for CMG dimerization during replication initiation.
Fig. 4 legend (fragment): Fig. 1c) to reduce background single CMG particles that are present owing to incomplete roadblocking of DNA. dCMGE is the product of CMG assembly using wild-type proteins, whereas Mcm2 6A primarily forms single CMGs. A minority of particles are compatible with dCMG or dCMGE formation (although in the latter, Pol ε occupancy is only partial (dCMG(E))). f, Mcm2 6A MCMs are converted to CMG at wild-type levels. P values were determined by two-tailed Welch's t-test; NS, not significant. This experiment was performed three times. Error bars, mean ± s.d. g, Mcm2 6A disrupts the dCMGE dimer mostly into single isolated CMGs. P values were determined by two-tailed Welch's t-test; **P = 0.0030. This experiment was performed three times. Error bars, mean ± s.d.
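The topoisomer labels in the topology assay can be related to superhelical density with a short calculation. The sketch below is illustrative arithmetic rather than the authors' analysis: it assumes a relaxed helical repeat of 10.5 bp per turn (a textbook value for B-form DNA that is not stated in this article), and the function names are hypothetical.

```python
# Illustrative arithmetic only; not the authors' analysis pipeline.
# ASSUMPTION: relaxed B-form DNA has about 10.5 bp per helical turn.
HELICAL_REPEAT_BP = 10.5


def relaxed_linking_number(circle_bp: int) -> float:
    """Approximate linking number (Lk0) of a relaxed circle of this length."""
    return circle_bp / HELICAL_REPEAT_BP


def superhelical_density(delta_lk: float, circle_bp: int) -> float:
    """Superhelical density sigma = delta_Lk / Lk0."""
    return delta_lk / relaxed_linking_number(circle_bp)


def turns_to_bp(turns: float) -> float:
    """Convert a number of helical turns into base pairs."""
    return turns * HELICAL_REPEAT_BP


if __name__ == "__main__":
    circle_bp = 616  # length of the ARS1 circle used in the topology assay
    for delta_lk in (-2, -3, -6):
        sigma = superhelical_density(delta_lk, circle_bp)
        print(f"delta_Lk = {delta_lk}: sigma = {sigma:.3f}")
    # The 1.5 turns of duplex between the two bubbles, expressed in bp.
    print(f"1.5 turns = {turns_to_bp(1.5)} bp")
```

Under this assumption, the −2 to −6 topoisomers of the 616-bp circle correspond to superhelical densities of roughly −0.03 to −0.10, and 1.5 turns of exposed duplex corresponds to about 16 bp.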

Open DNA stabilized as the MCM dimer splits
CMG assembly leads to the disruption of the double hexamer interface, but how this is linked to the ATPase state of MCM is unclear 1 . When comparing the double hexamer and the dCMGE structures, we observed that the Mcm6 wedge insertion, which stabilizes the lagging-strand orphan bases in the dCMGE complex, is retracted and contributes to stabilizing the dimerization interface in the double hexamer (Fig. 5a). Residues T423 and R424 in the wedge insertion are surface-exposed and face the outer perimeter of MCM in the double hexamer. As the DNA becomes untwisted, the Mcm6 wedge insertion disengages from the double hexamer interface and enters the helicase ring lumen in the dCMGE complex (Fig. 5b,c and Supplementary Video 5). In agreement with this observation, a combined T423E/R424E mutation (Mcm6 2E) supports MCM loading onto origin DNA to wild-type levels but negatively affects replication (Fig. 1d,f). By combining our comparative structural and mutagenesis analyses, we propose a model whereby changes in the ATPase state promote dCMGE complex formation, which in turn couples melting of duplex DNA and splitting of the double hexamer. ADP release and ATP binding in the ATPase tier promote the concerted movement of the h2i pore loops. Amongst these, Mcm2 h2i pulls on the leading-strand template and pushes on the lagging-strand template, promoting duplex DNA untwisting, whereas Mcm4 h2i releases a latch that pins the N-terminal Mcm6 wedge insertion that is packed at the double hexamer homo-dimerization interface. As the latch is released, the Mcm6 wedge can swing inside the MCM pore and stabilize the orphan bases that become exposed after the disruption of DNA base pairing in the untwisted DNA duplex. Supplementary Video 5 describes these structural transitions.

Discussion
To ensure that replication occurs bidirectionally, several steps along the origin activation pathway in eukaryotic cells occur symmetrically. Symmetry is first established in G1 with the concerted and sequential loading of an inactive double hexamer 7,8 . During this process, origin loading of a first MCM hexamer creates a binding site for the symmetric loading of the second hexamer 24 . After entry into S phase, recruitment of firing factors that activate MCM depends on the phosphorylation of MCM by DDK (refs. 11,[35][36][37][38]), which selectively targets fully loaded helicases by recognizing the symmetric structure of the double hexamer 39 . After the establishment of the replication fork, one of the two strands of the origin duplex is ejected from each CMG, and becomes the translocation strand of the opposing CMG. This symmetric fail-safe mechanism ensures that replication starts only if both helicases have been fully activated 1,40 . By imaging CMG caught in the act of nucleating origin DNA melting, we have identified yet another symmetric event on the path to origin activation. In fact, although CMGE formation disrupts the double hexamer interface, we found that the complex maintains a two-fold symmetric character, by forming a CMG dimer that is stabilized by both protein-protein interactions and DNA gripping. The CMGE dimer provides a head-to-head roadblock that limits ATPase-powered unidirectional translocation before the lagging strand is ejected, thus explaining previous observations that CMG formation and DNA untwisting at origins require ATP binding but not hydrolysis 1 . A CMGE dimer also explains why the CMG assembled around the origin DNA duplex during initiation is protected from disassembly before lagging-strand ejection 22 .
In fact, whenever CMG transitions from engaging a fork to a DNA duplex, an MCM-binding site becomes accessible for the E3 ubiquitin ligase, SCF Dia2 (in yeast, or CUL2 LRR1 in Metazoa), which sends the replisome to Cdc48-mediated disassembly. MCM ubiquitylation in duplex-engaged CMG occurs either upon termination of DNA replication or when the replisome engaged in fork progression encounters a nick on the lagging strand 17,19,20,41,42 . However, SCF Dia2 cannot target the duplex-engaged replisome at initiation 22 . We now understand that this is because the dCMGE sterically impedes the docking of the E3 ligase onto MCM. In fact, when CMGE-SCF Dia2 is superposed to our dCMGE structure, an extensive steric clash can be identified between the E3 ligase engaged to one ring and the Mcm3 subunit from the opposed ring in the CMG dimer (Extended Data Fig. 7).
The dCMGE nucleates two DNA bubbles inside each MCM ring, separated by 1.5 turns of duplex DNA, which might serve for the concerted recruitment of fork establishment factors, including Mcm10, RPA and Pol α. Mcm10 is known to trigger the ejection of the lagging strand and the ATPase-powered unwinding of the replication fork 1 . Although the mechanism of origin activation remains unknown, we note that Mcm10 engages the same N-terminal MCM elements 43,44 that mediate CMGE dimerization in our structural intermediate. A model for origin activation is presented in Extended Data Fig. 8 and further discussed in the Supplementary Discussion. Studies will be needed to establish whether Mcm10 engagement further disrupts the CMGE dimer interface, thereby releasing the inhibitory interaction that impairs ATPase-powered DNA translocation and allowing helicase bypass.
DNA replication, transcription and recombination all require the untwisting and opening of the double helix. Recent studies have described these processes in the transcription pre-initiation complex that supports RNA synthesis 45,46 and in the recombinases that promote strand exchange 46,47 . By contrast, the mechanism for the nucleation of DNA melting at an origin of replication has remained, to our knowledge, unknown for decades. Our work fills this gap. We describe the structure of the CMG replicative helicase assembled sequentially onto the ARS1 origin, by reconstituting a multistep cellular process that involves 32 polypeptides 1 . Base-pair disruption involves ATP-triggered changes in MCM that promote pulling of the leading-strand and pushing of the lagging-strand template DNA. Our findings provide a framework in which to study replication initiation.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-04829-4.

Cloning, expression and purification of Mcm2-7-Cdt1 mutants.
Designed DNA fragments (Supplementary Table 1 Table 3) used to overexpress Mcm2-7-Cdt1 mutants. The oMG25 DNA fragment was subcloned from pMG39 to pAM38 using MluI and XbaI restriction sites to obtain pMG69, which was integrated into the yJF21 yeast strain, thus generating the yAE164 strain that was used to overexpress the Mcm2 6A mutant (Mcm2 V580A/K582A/P584A/K587A/W589A/K633A). The oMG27 DNA fragment was subcloned from pMG43 to pJF4 using BsiWI and SphI restriction sites to obtain pMG53, followed by the integration of pMG53 into the yAM20 strain, yielding the yAE160 strain, which was used for overexpression of the Mcm6 2E mutant (Mcm6 T423E/R424E). The oMG28 DNA fragment was subcloned from plasmid pMG44 to pJF4 using BsiWI and SphI restriction sites, thus obtaining plasmid pMG54. The pMG54 plasmid was integrated into the yAM20 strain, yielding the yAE161 strain that was used to overexpress the Mcm6 5E mutant (Mcm6 T408E/Q409E/L410E/G411E/L412E). All Mcm2-7-Cdt1 mutants were purified essentially as wild type 50 .
Table 1. BL21(DE3)-CodonPlus-RIL cells (Agilent) were transformed with GINS expression plasmid (pJL003). Transformant colonies were inoculated into a 250-ml LB culture containing kanamycin (50 µg ml −1 ) and chloramphenicol (35 µg ml −1 ), which was grown overnight at 37 °C with shaking at 200 rpm. The following morning, the culture was diluted 100-fold into 6× 1 l of LB with kanamycin (100 µg ml −1 ) and chloramphenicol (35 µg ml −1 ). The cultures were left to grow at 37 °C until an optical density at 600 nm (OD 600 nm ) of 0.5 was reached; 0.5 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) was added to induce expression and cells were left shaking for 3 h. Cells were collected by centrifugation at 4,000 rpm for 20 min in a JS.

Cloning, expression and purification of MH.
The codon-optimized expression sequence for MH containing an HRV 3C protease cleavage site followed by a twin-strep tag was synthesized and cloned into pET302 by GeneWiz Synthesis (pJL004). T7 express cells (NEB) were transformed with pJL004. Transformant colonies were inoculated into a 250-ml LB culture with ampicillin (100 µg ml −1 ), which was grown overnight at 37 °C with shaking at 200 rpm. The following morning, the culture was diluted 100-fold into 6× 1 l of LB with ampicillin (100 µg ml −1 ). The cultures were left to grow at 37 °C until an OD 600 nm of 0.5 was reached; 0.5 mM IPTG was added to induce expression and cells were left shaking for 3 h. Cells were collected by centrifugation at 4,000 rpm for 20 min in a JS.4.2 rotor (Beckman). For lysis, cell pellets were resuspended in 80 ml of lysis buffer (20 mM Tris-HCl pH 8.5, 10% glycerol, 0.5 mM EDTA, 500 mM KCl, Roche protease inhibitor tablets and 2 mM tris(2-carboxyethyl) phosphine (TCEP)) + 0.7 mM PMSF. The lysate was sonicated for 120 s (5 s on, 5 s off) at 40% on a Sonics Vibra-Cell sonicator. Insoluble material was removed by centrifugation at 20,000 rpm for 30 min in a JS.25.50 rotor (Beckman). The supernatant was loaded by gravity onto a 5-ml Strep-TactinXT column (IBA). The resin was washed extensively with lysis buffer. MH was eluted by the addition of 12 ml of 1× BXT (IBA) supplemented with 10% glycerol and 1 mM DTT. The MH-containing fractions were pooled and loaded onto a HiLoad 16/600 Superdex 75 equilibrated in gel filtration buffer (20 mM Tris-HCl pH 8.5, 10% glycerol, 0.5 mM EDTA, 100 mM KCl and 0.5 mM TCEP). MH-containing fractions were pooled, aliquoted and snap-frozen in liquid N 2 . About 36 mg MH was purified from a 6-litre culture.

DNA templates
The native ARS1 origin of replication flanked by Widom 601 and 603 sites or MH-flanked was amplified by PCR and purified as previously described 24 . The 6× ARS1 array (pSSH005) was assembled by inserting an array of 6 ARS1 origins with 40-bp spacing flanked by MH sites using NEBuilder HiFi assembly. The 6× ARS1 origin array was amplified from pSSH005 using primer oSSH038 and concentrated by ethanol precipitation. A list of primers and DNAs used is included in Supplementary Table 1.
Preparation and purification of chromatinized origin DNA. Soluble yeast nucleosomes were reconstituted from octamers and DNA by salt gradient dialysis in several steps from 2 to 0.2 M NaCl as previously described 24 . Following nucleosome refolding, a final dialysis step was performed into loading buffer (25 mM HEPES-KOH pH 7.6, 80 mM KCl, 100 mM sodium acetate, 0.5 mM TCEP) and loaded onto a Superose 6 Increase 3.2/300 column equilibrated in the same buffer. Fractions containing ARS1 origin DNA bound by 2 nucleosomes were pooled, concentrated, and stored at 4 °C. Reconstitution conditions were optimized by small-scale titration and nucleosomes checked by 6% native PAGE. Fractions containing MH-conjugated array DNA were pooled, concentrated and stored at 4 °C. Conjugations were checked by 6% native PAGE.

616-bp ARS1 circles.
The 616-bp ARS1 circles were assembled and prepared as previously described 1 with the following modifications. The dephosphorylation step was performed with the use of quickCIP, instead of Antarctic phosphatase, for 30 min at 37 °C followed by enzyme inactivation at 80 °C for 2 min. After the ligation step, the DNA was concentrated as described and incubated with T5 exonuclease (NEB; 37 °C for 1 h) to eliminate non-ligated DNA. Ethanol precipitation, agarose electrophoresis and electroelution were omitted; instead, phenol/chloroform/isoamyl-alcohol extraction was performed, followed by ethanol precipitation using sodium acetate (pH 5.1) and the neutral carrier GeneElute Linear Polymer (LPA, MERCK). For experiments in which DNA was partially digested after the CMG formation reaction, MseI (NEB) was added at a concentration of 0.1 U diluted in 1× loading buffer. Incubation was performed for 10 min at 30 °C before applying to EM grids.

In vitro DNA replication assays
Replication assays were performed as described previously 52 . For replication reactions with linear DNA (Fig. 1), Pol ε exo− was used instead of wild-type Pol ε to reduce end labelling, and the concentration of deoxynucleotides was modified (that is, 30 µM dCTP, 30 µM dGTP, 30 µM dTTP, 30 µM dATP and 100 nM α-32P-dCTP). The reactions were stopped by EDTA after 15 and 30 min for reactions with 10.6-kb supercoiled DNA or after 20 min for reactions with short linear DNA substrates and processed as described 51,52 . The replication products were separated using 0.8% agarose alkaline gel for 17 h at 25 V for reactions with 10.6-kb supercoiled DNA. For reactions with short DNA substrates, samples were separated using 2% agarose alkaline gel for 4 h at 38 V. The image signal from Fig. 1e was background-subtracted using the subtract background algorithm in Fiji v.2.0.0 (ref. 53 ).

DNA topology assay
The experiment was performed as described previously 1 . After processing the reactions as described previously 1 , Ficoll 400 (final concentration 2.5%) and Orange G were used to load the sample onto a native 3.5% bis-polyacrylamide gel (1× TBE), and separation was carried out for 21 h at 90 V using a Protean II XL Cell apparatus (Bio-Rad) at room temperature. The 0.7-mm gel was dried (without fixation) at 80 °C for 105 min, exposed to a phosphor screen and scanned using a Typhoon phosphor imager.

Sample preparation and data collection for NS-EM
NS-EM sample preparation was performed on 400-mesh copper grids with carbon film (Agar Scientific). Grids were glow-discharged for 30 s at 45 mA using a K100X glow discharge unit (Electron Microscopy Sciences) before a 4-µl sample was applied to the grids and incubated for 2 min. Grids were stained by two successive applications of 4 µl 2% (w/v) uranyl acetate with blotting between the first and second application. Stained grids were blotted after 20 s to remove excess stain. Unless described otherwise, data collection was carried out on a Tecnai LaB6 G2 Spirit transmission electron microscope (FEI) operating at 120 keV. A 2K × 2K GATAN Ultrascan 100 camera was used to collect micrographs at a nominal magnification of 30,000 (with a physical pixel size of 3.45 Å per pixel) within a −0.5 to −2.0 µm defocus range.

NS-EM image processing
A subset of particles was manually picked using RELION-3.1 (ref. 26 ) and used as a training dataset for Topaz training 53 . Subsequent image processing was performed using RELION-3.1. The CTF of each micrograph was estimated using Gctf (ref. 54 ) and particles were extracted and subjected to reference-free 2D classification in RELION-3.1.

ReconSil image processing
For ReconSil experiments, image processing was carried out as detailed above. Reference-free 2D classification in RELION generates both 2D class averages and star files detailing the class assignment, particle coordinates and transformations (translations and rotations) applied to the raw particles for alignment. 2D averages are superposed on the raw micrographs, overlaid on the particles that contributed to their generation. This yielded signal-enhanced 'ReconSiled' micrographs reconstituting the context of complete origins of replication. ReconSiled micrographs were used for the selection and rejection of origin nucleoproteins for further analysis.
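The overlay step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' ReconSil implementation: star-file parsing is omitted, the `Particle` fields and the `overlay_averages` function are hypothetical names, rotation uses nearest-neighbour sampling, and the sign convention for the in-plane angle is chosen for the example and may differ from the one RELION uses.

```python
import math
from typing import Dict, List, NamedTuple, Tuple

Image = List[List[float]]  # row-major grid of pixel values


class Particle(NamedTuple):
    """Hypothetical per-particle record; in practice read from a star file."""
    x: float          # particle centre (column) in micrograph pixels
    y: float          # particle centre (row) in micrograph pixels
    psi_deg: float    # in-plane rotation relating particle and class average
    class_id: int     # index of the 2D class the particle was assigned to


def overlay_averages(shape: Tuple[int, int],
                     averages: Dict[int, Image],
                     particles: List[Particle]) -> Image:
    """Paste each particle's class average onto a blank micrograph-sized canvas."""
    rows, cols = shape
    canvas = [[0.0] * cols for _ in range(rows)]
    for p in particles:
        avg = averages[p.class_id]
        box = len(avg)               # class averages are assumed square
        half = (box - 1) / 2.0       # centre of the average, in pixels
        theta = math.radians(p.psi_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        row0 = int(round(p.y - half))
        col0 = int(round(p.x - half))
        for r in range(row0, row0 + box):
            for c in range(col0, col0 + box):
                if not (0 <= r < rows and 0 <= c < cols):
                    continue  # skip pixels that fall off the micrograph
                dy, dx = r - p.y, c - p.x
                # Map the micrograph offset back into the average's frame
                # (assumed sign convention) and sample the nearest pixel.
                ax = cos_t * dx - sin_t * dy + half
                ay = sin_t * dx + cos_t * dy + half
                i, j = int(round(ay)), int(round(ax))
                if 0 <= i < box and 0 <= j < box:
                    canvas[r][c] = avg[i][j]
    return canvas


if __name__ == "__main__":
    average = [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 0.0]]
    picture = overlay_averages((10, 10), {0: average}, [Particle(5, 5, 0.0, 0)])
    print(picture[5][4:7])
```

A production version would interpolate rather than sample nearest neighbours and would blend overlapping particles instead of overwriting them; the sketch only shows how class assignment, coordinates and in-plane rotation combine to place an average back into its micrograph context.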

ReconSil data analysis and statistics
ReconSiled origins were analysed as previously described 24 . In brief, ReconSiled micrographs were used to re-extract particles of interest in RELION. Selected particles were manually classified for statistical analysis. Measurements of ReconSiled origins were performed manually using Fiji 55 and plotted in GraphPad Prism v.9.2.0.

Sample preparation and data collection for cryo-EM
CMG assembly reactions (reconstituted as described in 'In vitro CMG assembly on short chromatinized origins') were frozen on 400-mesh lacey grids with a layer of ultra-thin carbon (Agar Scientific). All grids were freshly glow-discharged for 1 min at 45 mA using a K100X glow discharge unit (Electron Microscopy Sciences) before plunge freezing. Samples were prepared by applying 4 µl of undiluted CMG assembly reactions for 2 min on a grid equilibrated to 25 °C in 90% humidity. The grid was blotted for 4.5 s and plunged into liquid ethane. Data collection was performed on an in-house Thermo Fisher Scientific Titan Krios transmission electron microscope operated at 300 kV, equipped with a Gatan K2 direct electron detector camera (Gatan) and a GIF Quantum energy filter (Gatan). Images were collected automatically using the EPU software (Thermo Fisher Scientific) in counting mode with a physical pixel size of 1.08 Å per pixel, with a total electron dose of 51.4 electrons per Å² during a total exposure time of 10 s dose-fractionated into 32 movie frames (Extended Data Table 1). We used a slit width of 20 eV on the energy filter and a defocus range of −2.0 to −4.4 µm. A total of 65,286 micrographs were collected from two separate sessions.
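The acquisition parameters above imply a per-frame dose, a frame time and a dose rate at the detector. The following Python snippet is a minimal sketch of that arithmetic; the function names are illustrative and the derived values are not reported in the text, they only follow from the quoted total dose, exposure time, frame count and pixel size.

```python
# Acquisition values quoted in the text above.
TOTAL_DOSE_E_PER_A2 = 51.4  # total electron dose, electrons per square angstrom
EXPOSURE_S = 10.0           # total exposure time, seconds
N_FRAMES = 32               # movie frames per exposure
PIXEL_SIZE_A = 1.08         # physical pixel size, angstrom per pixel


def dose_per_frame(total_dose, n_frames):
    """Dose delivered in each movie frame (electrons per square angstrom)."""
    return total_dose / n_frames


def frame_time(exposure_s, n_frames):
    """Duration of one movie frame in seconds."""
    return exposure_s / n_frames


def dose_rate_per_pixel(total_dose, exposure_s, pixel_size_a):
    """Dose rate at the specimen expressed per detector pixel per second.

    One pixel covers pixel_size_a**2 square angstrom of the specimen.
    """
    return total_dose * pixel_size_a ** 2 / exposure_s


if __name__ == "__main__":
    print(f"dose per frame: {dose_per_frame(TOTAL_DOSE_E_PER_A2, N_FRAMES):.3f} e/A^2")
    print(f"frame time: {frame_time(EXPOSURE_S, N_FRAMES):.4f} s")
    rate = dose_rate_per_pixel(TOTAL_DOSE_E_PER_A2, EXPOSURE_S, PIXEL_SIZE_A)
    print(f"dose rate: {rate:.2f} e/pixel/s")
```

With the values given, this corresponds to roughly 1.6 electrons per Å² per frame and about 6 electrons per pixel per second.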

Cryo-EM image processing
Data processing was performed using RELION-3.1 (ref. 26 ) and cryoSPARC v.3.2 (ref. 56 ) (Extended Data Fig. 3). The movies for each micrograph were first corrected for drift and dose-weighted using MotionCor2 (ref. 57 ). CTF parameters were estimated for the drift-corrected micrographs using Gctf within RELION-3.1 (ref. 54 ). Dataset one was first processed separately and combined with dataset two at a later stage.
For the first dataset, particles were picked using a manually curated particle set as a template in crYOLO v.1.7.5 (ref. 58 ). These particles were binned by 2 and extracted with a box size of 360 pixels for 2D and 3D classification. A subset of 1,600 representative particles across the entire defocus range was selected. Picks in areas of obvious particle aggregation were removed along with particles located on the carbon lace. A Topaz 53 model was then iteratively trained on the remaining particles. All particles were re-picked with the Topaz model with the default score threshold of 0 for particle prediction. The two datasets were combined and a total of 927,109 particles were picked, binned by 2 and extracted with a box size of 360 pixels. We carried out 2D classification to remove remaining smaller particles and contaminants. We subjected the remaining particles to 3D multi-reference classification with four sub-classes, an angular sampling of 7.5° and a regularization parameter T of 5, using low-pass-filtered initial models from previous ab initio and processing steps on dataset 1 of dCMGE complexes, and a double hexamer model generated from EMD-3960 (Extended Data Fig. 3). The resulting 133,262 (trans-dCMGE) and 46,049 (cis-dCMGE) particles with density corresponding to Pol ε on both CMG molecules were un-binned and refined to yield maps with resolutions of 7.7 and 14.4 Å, respectively. C2 symmetry imposition did not improve the quality of the maps. The 133,262 trans-dCMGE particles were imported into cryoSPARC and subjected to multiple rounds of non-uniform refinement, heterogeneous 3D classification and non-uniform local refinement, yielding a map at approximately 8 Å (Extended Data Fig. 3). Attempts to improve cis-dCMGE were unsuccessful given the limited particle numbers. As expected, these reconstructions do not show secondary structural features owing to the conformational heterogeneity between the two CMGE molecules bound by flexible DNA.
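Binning by 2 doubles the effective pixel size and therefore limits the resolution attainable during classification. The Python sketch below works this out from the 1.08 Å physical pixel size and the 360-pixel box; the function names are illustrative, and the derived numbers are simple consequences of the quoted values rather than figures reported in the text.

```python
def binned_pixel_size(pixel_size_a, bin_factor):
    """Effective pixel size (angstrom per pixel) after binning."""
    return pixel_size_a * bin_factor


def nyquist_resolution(pixel_size_a):
    """Finest resolvable spacing (angstrom) at a given pixel size."""
    return 2.0 * pixel_size_a


def box_extent(box_px, pixel_size_a):
    """Physical edge length of the extraction box in angstrom."""
    return box_px * pixel_size_a


if __name__ == "__main__":
    px = binned_pixel_size(1.08, 2)  # 2.16 angstrom per pixel
    print(f"binned pixel size: {px:.2f} A/pixel")
    print(f"Nyquist limit when binned: {nyquist_resolution(px):.2f} A")
    print(f"box edge: {box_extent(360, px):.1f} A")
```

The binned data are thus limited to about 4.3 Å, which is sufficient for 2D and 3D classification, whereas the un-binned particles are needed for the higher-resolution refinements.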
We applied a C2 symmetry expansion procedure to both trans- and cis-dCMGE particles (179,311) with re-centring on one CMGE in RELION and combined all particles. We also reduced the box size to 512 pixels during this process to speed up downstream processing. Following this, the centred single CMGE (358,622 particles) was subjected to masked 3D refinement with local searches in C1, yielding a map at 4.2-Å resolution. These particles were subjected to several rounds of CTF refinement and two rounds of Bayesian polishing. The CTF-refined and polished particles were then refined with local searches in C1, using a mask encompassing the entire CMGE density, to 3.6-Å resolution. To better resolve the DNA inside the MCM central channel, densities corresponding to Cdc45, GINS and Pol ε were subtracted in RELION. Signal-subtracted particles were analysed by 3D variability analysis in cryoSPARC (ref. 56 ). A subset of 71,348 particles was selected based on the quality of DNA density. These signal-subtracted particles were subsequently reverted to the original particles and refined with local searches in C1 to 3.5-Å resolution.
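The particle counts in this workflow can be cross-checked with a short Python sketch. It only tracks bookkeeping: each dimer particle is duplicated once per symmetry operator, so that each copy can be re-centred on a different CMGE. The function name and the particle identifiers are hypothetical, and no orientation handling is shown.

```python
TRANS_DCMGE = 133_262  # trans-dCMGE particles
CIS_DCMGE = 46_049     # cis-dCMGE particles
DNA_SUBSET = 71_348    # particles kept after 3D variability analysis


def symmetry_expand(particle_ids, symmetry_order):
    """Return one (particle_id, operator_index) copy per symmetry operator."""
    return [(pid, k) for pid in particle_ids for k in range(symmetry_order)]


if __name__ == "__main__":
    n_dimers = TRANS_DCMGE + CIS_DCMGE
    n_monomers = n_dimers * 2  # C2 expansion: two CMGE copies per dimer
    print(n_dimers, n_monomers)
    print(f"fraction retained for the final map: {DNA_SUBSET / n_monomers:.3f}")
    # Toy illustration with three hypothetical particle identifiers.
    print(symmetry_expand(["p1", "p2", "p3"], 2))
```

The sums reproduce the 179,311 dimer particles and the 358,622 symmetry-expanded CMGE particles, of which roughly one fifth contribute to the final reconstruction.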
All refinements were performed using fully independent data half-sets and resolutions are reported based on the Fourier shell correlation (FSC) = 0.143 criterion (Extended Data Fig. 2). FSCs were calculated with a soft mask. Maps were corrected for the modulation transfer function of the detector and sharpened by applying a negative B-factor as determined by the post-processing function of RELION or in cryoSPARC. The final RELION half-maps were used to produce a density-modified map using PHENIX Resolve CryoEM (refs. 28,59 ). This 3.4-Å map showed significant improvements for side chain and DNA density as well as for overall interpretability. Local-resolution estimates were determined using PHENIX or cryoSPARC (Extended Data Fig. 2f,j). The conversions between cryoSPARC and RELION files were performed using the UCSF pyem v.0.5 package 60 .
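As an illustration of the FSC = 0.143 criterion, the Python sketch below finds the spatial frequency at which a half-map FSC curve first drops below the threshold and converts it to a resolution. This is a simplified stand-in for what RELION and cryoSPARC report, not their implementation; the linear interpolation between shells is an assumption of this sketch, and the curve in the usage example is made-up toy data, not data from this study.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (angstrom) at which the FSC first falls below the threshold.

    spatial_freqs: shell frequencies in 1/angstrom, in increasing order.
    fsc_values: FSC between the two half-maps for each shell.
    Returns None if the curve never crosses the threshold.
    """
    shells = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(shells, shells[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two shells bracketing the crossing.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


if __name__ == "__main__":
    # Toy FSC curve (hypothetical values for illustration only).
    freqs = [0.1, 0.2, 0.3, 0.4]
    fsc = [1.0, 0.8, 0.243, 0.043]
    print(resolution_at_threshold(freqs, fsc))
```

In the toy curve the crossing falls midway between the last two shells, at 0.35 Å⁻¹, which corresponds to a resolution of about 2.86 Å.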

Model building and refinement
CMG (from PDB 6SKL) 31 , Pol2 subunit (from PDB 6HV9) 33 and a homology model of the N-terminal domain of Dpb2 obtained from the Phyre2 server 61 were docked initially into the cryo-EM map produced from Resolve CryoEM, using UCSF Chimera, and refined against the map using Namdinator 62 as a starting point for modelling with Coot v.0.9.1 (ref. 63 ). The DNA and the MCM5 winged helix domain were built de novo. The register of origin DNA engagement of dCMGE is heterogeneous because MCM double hexamers can slide along duplex DNA before dCMGE is formed. For this reason, we could not build the origin DNA sequence with certainty and modelled polyA:polyT DNA instead. The resulting model was then subjected to an iterative process of real-space refinement using phenix.real_space_refine 64 with geometry and secondary structure restraints, and base-pairing and base-stacking restraints where appropriate, followed by manual inspection and adjustments in Coot. The geometries of the atomic model were evaluated by the MolProbity webserver 65 .

Map and model visualization
Maps were visualized in UCSF Chimera 66 and ChimeraX 67 and all model illustrations and morphs were prepared using ChimeraX or PyMOL.

Statistics and reproducibility
Statistical analysis was performed using a two-tailed Welch's t-test in GraphPad Prism v.9.2.0. No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Data supporting the findings of this study are available within the paper and its Supplementary Information files. Cryo-EM density maps of the CMGE dimer complex have been deposited in the Electron Microscopy Data Bank (EMDB) under the accession code EMD-13988. The cryo-EM density map of the symmetry-expanded CMGE monomer has been deposited in the EMDB under the accession code EMD-13978. Atomic coordinates have been deposited in the PDB with the accession codes 7QHS (symmetry-expanded CMGE monomer) and 7Z13 (monomer docked into the CMGE dimer map).