Core architecture of a bacterial type II secretion system

Bacterial type II secretion systems (T2SSs) translocate virulence factors, toxins and enzymes across the cell outer membrane. Here we use negative stain and cryo-electron microscopy to reveal the core architecture of an assembled T2SS from the pathogen Klebsiella pneumoniae. We show that 7 proteins form a ~2.4 MDa complex that spans the cell envelope. The outer membrane complex includes the secretin PulD, with all domains modelled, and the pilotin PulS. The inner membrane assembly platform components PulC, PulE, PulL, PulM and PulN have a relative stoichiometric ratio of 2:1:1:1:1. The PulE ATPase, PulL and PulM combine to form a flexible hexameric hub. Symmetry mismatch between the outer membrane complex and assembly platform is overcome by PulC linkers spanning the periplasm, with PulC HR domains binding independently at the secretin base. Our results show that the T2SS has a highly dynamic modular architecture, with implication for pseudo-pilus assembly and substrate loading.

T he bacterial T2SS is found in human pathogens such as Acinetobacter baumannii, Chlamydia trachomatis, Escherichia coli and Vibrio cholerae 1 . It secretes a broad repertoire of substrates including digestive enzymes and infective agents like the cholera and heat-labile (LT) toxins 2 . Between 12 and 15 genes in a single operon usually encode the majority of T2SS components. Whilst the soluble domains for many of these proteins have been solved by X-ray crystallography 3,4 , their relative stoichiometry, mode of association with binding partners, and temporal coordination for assembling a functional secretion apparatus is still poorly understood.
The protein GspD forms a 15-fold rotationally symmetric pore termed the secretin that inserts into the outer membrane (OM) and provides a conduit for substrate into the external environment. OM insertion is usually dependent on a lipidated pilotin 5 , which binds to the GspD C-terminal S-domain with 1:1 stoichiometry 6,7 . The pilotin gene is often chromosomally discrete from the main T2SS operon. Multiple recent highresolution cryo-EM structures report the partial secretin architecture [8][9][10] , and in complex with the pilotin 7 . However, the entire secretin has not yet been fully resolved due to disorder in the periplasmic N0 and N1 domains.
Within the inner membrane (IM), GspL and GspM are bitopic and monotopic membrane proteins, respectively, that together form homo-and hetero-dimers 11 . Combined with the polytopic membrane protein GspF 12 and the ATPase GspE, these proteins form an assembly platform (AP) for the pseudo-pilus 13 . The pseudo-pilus constitutes a helical filament that extrudes fully folded substrate through the secretin channel. The relative stoichiometry and overall ultrastructure of the AP is unknown. GspE is a cytoplasmic AAA+ ATPase that energises the T2SS and drives pilin assembly and pseudo-pilus formation. The active state is considered to be a hexameric ring as ATP turnover is significantly upregulated in an artificially oligomerized GspE-Hcp1 fusion 14 . The homologous ATPases PilB and PilT in the closely related type IV pilus (T4P) system also function as hexamers 15,16 . Ultimately, the functional oligomeric state and stoichiometry of GspE within the T2SS apparatus has not yet been determined. The GspE N-terminal N1E domain connects to the N2E domain with an extended linker. A shorter but known flexible linker connects the N2E domain to the C-terminal CTE ATPase domain 17 . Such inherent flexibility within GspE is predicted to facilitate large-scale conformational changes. The N1E domain of GspE forms a 1:1 stoichiometric complex with the cytoplasmic domain of GspL 17,18 . These two proteins contact GspF 13,19 , which is predicted to reside centrally within the AP. Concerted interplay between the GspE, GspL and GspF complex are thought crucial for coupling GspE conformational changes to the mechanical loading of pilin subunits within the pseudo-pilus assembly 20,21 . As GspL is a bitopic membrane protein, the direct contact between GspE and GspL also represents a mechanism for enabling cross-talk across the IM to other periplasmic components such as GspM. The coupling of the AP and outer membrane complex (OMC) across the cell envelope is mediated by GspC, where the GspC N-terminus associates with GspL and GspM within the IM 11 . The C-terminus of GspC must then span the periplasm as the GspC HR domain binds the GspD N0 domain 22 . Despite the known interaction between GspC HR domain and GspD N0 domain, the precise arrangement of GspC HR domain at the base of the secretin remains unclear 22 . Similarly, understanding how GspC HR domains bind to the secretin is important for determining how the likely symmetry mismatch between AP and OMC components is overcome.
Here we isolate an assembled T2SS so that both OM and IM components are captured together. Using a fusion of cryo and negative stain EM, as well as stoichiometry measurements, we provide a reconstruction of the entire OMC and a model for the cytoplasmic components of the AP. Combined they reveal a glimpse at the core ultrastructure of this cell envelope spanning nanomachine.

Results
Purification, EM and stoichiometry of Pul CDELMNS . The T2SS from the human pathogen K. pneumoniae HS11286 strain comprises 13 genes in a single unidirectional operon termed PulC through to PulO (Fig. 1a). Note that Pul and Gsp nomenclature relate to equivalent proteins in homologous T2SS systems. The pilotin PulS is located in a separate position within the chromosome. These 14 genes were cloned and over-expressed in E. coli. Using affinity chromatography tags positioned on the cytoplasmic ATPase PulE and the periplasmic pilotin PulS, a complex containing seven components was purified by two successive pulldowns. Glutaraldehyde stabilisation was included after the initial pulldown. The complex comprised PulC, PulD, PulE, PulL, PulM, PulN, and PulS, and is here termed Pul CDELMNS (Fig. 1b). SDS-PAGE band identification was confirmed by LC-MS/MS. It is unclear why PulF and pseudo-pilus components were not copurified with Pul CDELMNS . Their inclusion may require intact membrane for stabilisation. Co-expression of PulA substrate or pulG knockout did not promote their inclusion. Trace quantities of GspJ and GspK were identified on the gel by LC-MS/MS suggesting the pseudo-pilins were at least partially expressed. Visualisation of Pul CDELMNS by negative stain EM yielded parti-cles~40 nm long and 17-22 nm wide (Fig. 1c). The OMC PulD secretin was readily identifiable within 2D class averages. Hanging beneath the OMC and separated by a 5-10 nm gap the IM AP was observed. Highly flexible linkers connect the OMC and AP so that these two assemblies effectively constitute independent particles tethered together. The Pul CDELMNS complex was vitrified on thin carbon film and imaged by cryo-EM (Fig. 1c). 2D class averages of Pul CDELMNS yielded well resolved side views of the OMC. All domains of PulD were identifiable with additional densities observed at the base of the secretin where the PulC HR domain was expected to bind to the N0 domain 22 , and where PulS decorates the exterior of the secretin core 7 (Fig. 1c). The AP was not resolved here due to high flexibility and averaging effects. In the absence of glutaraldehyde, the OMC sometimes separated from the AP and yielded top views, which confirmed PulD and PulS in 1:1 stoichiometric ratio 7 with C15 symmetry (Fig. 1c). The relative stoichiometry of the IM AP components were determined by SDS-PAGE densitometry and quantification of fluorescent emission using both Coomassie R250 and Sypro Ruby dyes 23,24 (Fig. 1b). Pul C:E:L:M:N relative mean ratios standardised around PulE were observed as 2.13:1.00:0.97:0.99:0.95, which is consistent with an overall relative ratio of 2:1:1:1:1 for these components. The relative ratios of PulE and PulL were tightly correlated (sigma = 0.07), which is important as it reflects their known 1:1 interaction 17,18 and acts as an internal control for the gel densitometry. PulC stoichiometry was at least twice that of the other AP components with a relatively broad variance (sigma = 0.19). The 15-fold copy numbers of PulD or PulS were not used as a reference to determine an overall copy number for the AP components within the Pul CDELMNS complex. PulD remained partially multimerized despite phenol treatment so that it failed to consistently enter and migrate through the gel fully. PulS quantity was significantly enriched as a consequence of the PulD secretin decoupling from the AP in the absence of glutaraldehyde stabilisation during purification.
PulD secretin structure determination. Focused refinement of the OMC yielded a reconstruction with an overall resolution of 4.3 Å ( Fig. 2 and Supplementary Fig. 1). All domains of PulD were resolved with sufficient map quality ( Supplementary Fig. 2a) to build a complete model of the monomer and the secretin, excluding the amino acids (aa) in loops 288-303, 462-470 and 632-637 (Figs 3 and 4a, and Supplementary Fig. 3). The PulD fold is similar to partial E. coli K12 and H10407 GspD models (RMSD Cα = 3.2 Å and 3.3 Å) where the secretin core and N3 domain have been described, along with homology modelled N2 and N1 domains 7,10 (Fig. 4b). The entire secretin is 20 nm long with an external diameter of 15 nm at the base (Figs 2b and 3). It includes an occluding central gate and N3 domain constriction sites within the secretin channel. It lacks a V. cholerae cap gate 10 . The N1, N2 and N3 domains pack tightly (Fig. 5) with a diagonal offset of 36°. N0 is positioned almost directly below the N1 domain and does not maintain the diagonal offset. The N0 fold is similar to that described in multiple crystal structures 22,25,26 with a core of two helices flanked on each side by β-sheets. However, its position relative to the N1 domain within the secretin is significantly different to these crystal structures where crystal contacts appear to have dominated domain arrangement ( Supplementary Fig. 2b). The N0 and N1 domains are connected by loop 7, which constitutes a substantial 26 aa linker. The Nterminus of loop 7 forms a wedge that packs between neighbouring N1 domains. Its C-terminus partially envelops the proximal N1 domain whilst making additional secondary contacts with N0 domain helix 2 (Figs 3a and 5d). The N0 domains form a tightly packed ring with alternating stacked β-sheets sandwiched between helices 2 and 4 ( Fig. 5d). Failure to stabilise the N0 domain and to promote formation of the loop 7 wedge likely accounts for the previously reported N0, N1 and N2 domain flexibility in other systems [7][8][9][10]27 . Overall, a single PulD monomer has a radial twist around the secretin long axis of 130° (  Fig. 3c).
PulC HR domain binds at the secretin base. Hanging beneath the N0 domains in the OMC map, additional globular densities at 7-8 Å resolution protrude from the secretin base ( Fig. 2 and Supplementary Fig. 1b). Focused refinements 28 failed to markedly improve resolution. These densities were predicted to be the PulC HR domain given its known interaction with the PulD N0 domain in homologous systems 22,29 . Using the GspC HR domain and GspD N0 domain crystal structure 22 as a reference, a homology model of PulC HR domain was fitted as a rigid body ( Supplementary Fig. 2c). The model closely follows the surface envelope of the map in this region, with a pair of triple β-sheets opposed around a central cavity (Fig. 2c). Based on the quality of this fit, the PulC HR domain was assigned to each globular protrusion. Importantly, each PulC HR domain binds to only a single PulD N0 domain and has no contact with neighbouring PulC HR domains. Overall, the binding of the PulC HR domain to the PulD N0 domain appears to be important for the correct positioning of the N0 domain within the secretin and the subsequent stabilisation of the N1 and N2 domains. Additional stabilisation is derived from a plug that occludes the lumen of the secretin at the level of the N0-N1 domains (Fig. 2b). The plug was observed in both the 2D class averages (Fig. 1b) and the 3D reconstruction. The high-resolution plug ultrastructure was not resolved due to a likely symmetry mismatch with the C15 averaged OMC. Attempts to resolve the plug structure through refinement using lower symmetries yielded reconstructions of low quality and resolution. Further studies will be required to resolve the source of the plug although the PulC PDZ domain is a speculative candidate given the position of the PulC HR domain at the base of the secretin.
The PulS pilotin decorates the secretin core. Decorating the outside of the secretin core proximal to the PulD S-domain in the map, globular densities were observed in a position consistent with the pilotin AspS relative to GspD in enterotoxigenic E. coli (ETEC) 7 (Fig. 2). For these densities, map resolution was limited to~7 Å ( Supplementary Fig. 1) and focused refinements 28 did not markedly improve resolution. A homology model of the PulS   No additional contacts were observed in contrast to AspS-GspD where the secretin core helix α11 forms extensive secondary contact with the pilotin 7 . The lack of equivalent secondary contacts between PulS and PulD likely accounts for the apparent flexibility between these proteins and may be a distinguishing feature between the structurally discrete Klebsiella-type and Vibrio-type pilotins.
PulC links the OMC to the inner membrane AP. Whilst the PulC HR domain binds to the base of the secretin, its N-terminus is located within the IM AP 11 so that PulC is predicted to span the periplasm and link the AP and OMC. To verify the presence and positioning of the PulC N-terminus within the AP, a hexahistidine tag was inserted after aa 61 where PulC was predicted to exit the IM and enter the periplasm. Ni-NTA gold labelling showed beads localise exclusively to the AP and not the OMC ( Fig. 6a and Supplementary Fig. 4). Given the PulC HR domains bind to the base of the secretin, PulC therefore spans the periplasmic gap between the OMC and AP (Fig. 1c).
PulE, PulL and PulM form a flexible hexameric hub. Cryo-EM 2D class averages of the Pul CDELMNS complex revealed the ultrastructure of the AP positioned beneath the OMC. A 20-22 nm outer ring is coupled to a 10-12 nm inner ring by six radial linkers (Fig. 6b). Focused alignments of the AP where the OMC was masked out show the outer ring to be comprised of weakly associating non-contiguous globular densities ( Fig. 6c and Supplementary Fig. 5a). This concentric ring structure is highly flexible and represents the preferred single orientation of the AP so that 3D structure determination was impeded. The addition of  Fig. 5b). The resultant Pul ELM complex was analysed by cryo and negative stain EM on continuous carbon film (Fig. 6d, e and Supplementary  Fig. 5b, c). Under vitreous conditions, Pul ELM yielded a preferred orientation concentric ring structure that was similar to the AP within the Pul CDELMNS complex, with equivalent dimensions and overall C6 symmetry. By negative stain, the same ultrastructure was observed although the sample was compacted so that the outer and inner rings have dimensions of 16-18 and 8-9 nm, respectively. Compaction was likely a consequence of sample flexibility and drying effects during the negative stain procedure. PulE alone, and PulE in complex with the cytoplasmic domain of PulL (aa 1-235) or full-length PulL were also purified from the membrane fraction. However, EM analysis yielded heterogeneous and flexible particles that did not further resolve AP ultrastructure under conditions tested.

Discussion
Negative stain and cryo-EM studies of the Pul CDELMNS complex revealed the ultrastructure of the AP positioned beneath the OMC to be a highly flexible hexameric hub comprising two concentric rings. In vitreous ice, the inner ring had a diameter of 11 nm and overall C6 symmetry. The outer ring constituted non-contiguous globular densities with~22 nm diameter. Similar ultrastructure was observed with purified Pul ELM complex meaning just these components alone are sufficient to generate the bulk of the hexameric hub structure. In both systems, the single preferred orientation captured was considered to represent an end view projection of the AP. In the T2SS, the V. cholerae PulE homologue GspE forms a quasi C6 ring with diameter N2-N3 domain interface H8 S12 S12 H8 H6 S9 H5 S10 S11 H6 H5 H5 H5 H S10 S11 S9 S13 S14  14 . The related T4P system is energised by the ATPases PilB and PilT, which are known to operate as highly dynamic hexameric rings. PilB has dimensions of 13.5 and 9 nm when in an elliptical C2 conformation 15 whilst the PilT ring has a diameter of 11.5 nm 33 . For PilB and GspE these dimensions are specific to the N2E and CTE ATPase domains and do not include their N-terminal N1E domains. In situ reconstruction of the T4P system in Myxococcus xanthus indicates the position of PilB and PilT to be located centrally at the base of the AP 34 . Collectively, this data is consistent with a model for the Pul CDELMNS AP and Pul ELM complex in which the observed inner ring constitutes a PulE hexamer comprising the N2E and CTE domains similar to that observed in V. cholerae GspE 14 (Fig. 6f).
The V. cholerae GspE N1E domain forms a direct 1:1 stoichiometric complex with the cytoplasmic domain of GspL 17,18 . The equivalent domains in K. pneumoniae also form a 1:1 stoichiometric complex termed here Pul E-N1E/Lcyto ( Supplementary   Fig. 5d). In the T4P system, the PulL cyto homologue PilM forms a ring around the central PilB/PilT hexamer. This architecture supports a model for the Pul CDELMNS AP and Pul ELM complex where the bulk of the hexameric hub outer ring constitutes Pul E-N1E/Lcyto complexes (Fig. 6f). PulM, the periplasmic domain of PulL (termed PulL peri ), and PulN and PulC when present, may also contribute to the observed outer ring densities if ordered. Certainly PulL peri , PulM and PulN do not form a periplasmic ring here as might be expected based on the T4P system 34 . The retention of membrane or inclusion of other T2SS components such as the pseudo-pilus may be required for the stabilisation and visualisation of any such ring.
Within the Pul CDELMNS complex, the mean relative ratio of PulC: PulE: PulL:PulM:PulN was measured to be 2. 13  where independent or weakly associated sub-complexes containing two copies of PulC and a single copy of PulM, PulN and Pul E-N1E in complex with PulL decorate each subunit of the central PulE hexamer (Fig. 6g). This model does not preclude the possible formation of a Pul E-N1E/Lcyto ring positioned more centrally directly above the PulE N2E/CTE hexamer in vivo. Such an arrangement would be similar to that observed for the N1 domains of Thermus thermophilus PilF 35 and may promote the close contact and encircling of the putative pilin spooling protein PulF 13,19 .
PulC spans the periplasm to bind the OMC with the PulC Nterminus located within the IM and the HR domain bound to the PulD N0 domain at the secretin base. Based on six PulE subunits within the AP, stoichiometry measurements support a model where the PulC copy number within the AP is 12. PulC therefore constitutes a cage that spans the periplasm (Fig. 6g) and is reminiscent of the virB10 N-terminus which spans the periplasm in the type IV secretion system 36 . Substrate may have the potential to gain access to the secretin lumen via PulL and PulM 37 in positions where PulC is absent. The OMC structure shows that each PulC HR domain binds a single PulD N0 domain independently with no lateral contacts between neighbouring HR domains. This is important as it provides a natural mechanism for overcoming symmetry mismatch between the 12 PulC subunits located within the hexameric AP and the 15-fold symmetric OMC. In this model on average three secretin N0 domains remain unoccupied without bound PulC HR domain. Whether full secretin N0 domain binding occupancy occurs with up to 15 bound PulC HR domains cannot be entirely excluded as each of the N0 domains has at least the potential to bind a PulC HR domain. T2SS assembly and secretion may ultimately occur in a dynamic equilibrium with a range of N0 domain binding occupancies.
Multiple T2SS secretin structures have been solved including Klebsiella-types from E. coli K12 10 and Pseudomonas aeruginosa 9 , and Vibrio-types from ETEC 7 , enteropathogenic E. coli 8 and V. cholerae 10 . Despite high quality 3-3.5 Å resolutions achieved for the secretin core and N3 domains, all the structures have poorly resolved densities towards the PulD/GspD N-termini with homology modelling required for the N1 domains and entirely absent N0 domains. Docking of N0 and N1 domain co-crystal structures 22,[25][26][27] into the secretin maps failed to resolve the possible position and orientation of the N0 domain due to significant steric clashes. Consequently, the PulC HR domain orientation relative to the N0 domain remained unclear and it was modelled decorating the inside of the secretin base 22 . To account for the high flexibility of the secretin N0 and N1 domains and the symmetry mismatch between the sixfold symmetric AP and 15-fold OMC, a pseudo-sixfold arrangement comprising a hexamer of dimers had been proposed for the N0-N2 domains 9 . In such a model the N0-N2 domains of three secretin subunits would be displaced in a metastable arrangement. The OMC structure derived from the Pul CDELMNS complex resolves now how the PulD N0 domains are organised within the secretin relative to the N1 domain ring. The N-terminus of loop 7 wedges and stabilises neighbouring N1 domains whilst the N0 domains form an intimate and stable C15 ring. At least when PulC HR domain is bound, the N0 ring does not immediately support a model where the secretin incorporates multiple symmetries along its length. Instead the secretin forms a well-ordered barrel, which combined with the PulC HR domains bound at the base protrudes from the outer membrane~20 nm into the periplasm.
In this study, the T2SS outer membrane components PulD and PulS are co-purified with the core components of the IM AP including PulC, PulE, PulL, PulM and PulN. Overall, our results reveal a glimpse at the core architecture of an assembled T2SS (Fig. 6g) and show it to be different to other known secretion systems ( Supplementary Fig. 6). The T2SS AP does not constitute a stack of conjoined rings that seal directly to the secretin base forming an enclosed central channel as observed in the T3SS 38 . Instead, the T2SS has a highly dynamic and modular architecture where the OMC and IM AP have weakly associating and limited inter-connection via a PulC cage. This observation is consistent with the requirement for substrate to gain access to the secretin lumen via the periplasm. Whilst the resolution of the hexameric hub observed in the Pul CDELMNS and Pul ELM complexes is limited it shows the T2SS to have a sixfold arrangement for assembled AP components. The observed AP architecture is consistent with that described in the T4P system 34 providing evidence that the IM ultrastructure is broadly conserved between these different but related secretion systems. Although the connection between inner and OMCs was expected to be transient and dynamic 3 , we show that it is possible to isolate an almost fully assembled T2SS in vitro. Our work suggests that the purification of an entire T2SS with the pseudo-pilus, PulF, and ultimately substrate incorporated is in reach. These additional components may be necessary to help stabilise the IM AP so that ultimately full structural dissection and mechanistic understanding for the T2SS may be achieved.

Methods
Cloning, protein expression and purification. All clones were generated using a modified version of the Gibson isothermal DNA assembly protocol 39 where the Taq DNA ligase was omitted from the one-step isothermal DNA assembly. To obtain the Pul CDELMNS complex, the Klebsiella pneumoniae T2SS operon encoding genes from pulC to pulO was cloned into pASK3c vector (IBA-GO) using primer pair 1 and 2 (for all primers used see Supplementary Table 2). A StrepII tag was added at the C-terminus of PulE using primer pair 3. The pulS gene was cloned into pCDF-duet vector using primer pair 4 and 5. A C-terminal Flag tag was inserted using primer pair 6. Uniprot codes for individual genes are as follows: pulC KPHS_08880, pulD KPHS_08870, pulE KPHS_08860, pulF KPHS_08850, pulG KPHS_08840, pulH KPHS_08830, pulI KPHS_08820, pulJ KPHS_08810, pulK KPHS_08800, pulL KPHS_08790, pulM KPHS_08780, pulN KPHS_08770, pulO KPHS_08760 and pulS KPHS_08940. The clones were co-transformed into E. coli C43 (DE3) electro-competent cells (Lucigen) modified here to incorporate a pspA gene knockout using a Lambda Red recombinase strategy 40 (PspA is a common contaminant induced by PulD over-expression). Cells were grown on selective LBagarose plates with chloramphenicol (30 µg/ml) and spectinomycin (50 µg/ml). 2xYT media was inoculated and cells grown at 37°C until induction at OD 600 = 0.5-0.6 with anhydrotetracycline (AHT, 0.2 mg/L) and isopropyl β-D-1thiogalactopyranoside (IPTG, 0.24 g/L). Cells were grown for~15 h at 19°C and processed immediately. Pellets were re-suspended in ice-cold buffer 50 mM Tris-HCl pH 7.5, 5 mM EDTA, treated with DNase I, lysozyme and sonicated on ice. The lysate was clarified by centrifugation at 16,000 × g for 20 min. The membrane fraction was collected by centrifugation at 142,000 × g for 45 min. Membranes were mechanically homogenised and solubilized in 50 mM Hepes-NaOH pH 7.5, 150 mM NaCl, 1 % w/v DDM (Anatrace) and 5 mM EDTA at room temperature for 30-40 min. The suspension was clarified by centrifugation at 132,000 × g for 15 min. The supernatant was loaded onto a StrepTrap HP column (GE Healthcare) and washed with 50 mM Hepes-NaOH pH 7.5, 150 mM NaCl, 0.06 % w/v DDM and 5 mM EDTA (Buffer W) at 4°C. All prior buffers were supplemented with EDTA-free complete protease inhibitor tablets (Roche). The protein sample was eluted in Buffer W supplemented with 2.5 mM desthiobiotin (IBA) but with protease inhibitors removed. Peak fractions were pooled, 0.05% glutaraldehyde (Sigma-EM grade) added and incubated on ice for 10 min before quenching with 100 mM Tris-HCl pH 7.5. The sample was batch incubated with Flag resin (Sigma) for 1 h. Flag resin was washed with Buffer W and then eluted with the same buffer supplemented with 3xFlag peptide. Peak fractions were collected and used immediately. LC-MS/MS confirmed the identity of bands identified by SDS-PAGE.
To obtain purified Pul ELMN complex, the full-length pulE gene was cloned into pASK3c vector to include a C-terminal StrepII tag (primer pair 7 and 8). The pulL, pulM and pulN region of the T2SS operon was cloned into pCDF-duet vector with a PulN C-terminal Flag tag (primer pair 9 and 10). These vectors were cotransformed and the same initial purification strategy was then followed as for the Pul CDELMNS complex with the exception that no glutaraldehyde or Tris quenching buffer were added subsequent to elution from the Strep column. Protease inhibitor tablets were included in all buffers. After elution from the Flag column, due to sample heterogeneity as judged by negative stain EM, GraFix 32 was undertaken. Using Beckman Ultra-Clear 4.2 ml 11 × 60 mm ultracentrifugation tubes 2.1 ml of 50 mM Hepes-NaOH, 150 mM NaCl, 0.06% w/v DDM, 30% v/v glycerol, 5 mM EDTA and 0.1% glutaraldehyde was loaded under 2.1 ml of the equivalent but with 10% v/v glycerol. A continuous gradient was made using a BioComp Gradient Master cycle set for 66 s at 83°tilt and 22 rpm. An equivalent gradient was also made but omitting glutaraldehyde so that the sample could be analysed by SDS-PAGE after centrifugation. The Pul ELMN sample was split in half and loaded onto the gradients ± glutaraldehyde. Samples were spun at 71,000 g for 16 h at 4°C using a Beckman Sw 60 Ti swing rotor. 150 µl fractions were collected manually and analysed. Note that PulN was generally lost from the Pul ELMN complex during GraFix yielding just Pul ELM .
To obtain purified Pul E-N1E/Lcyto complex, the N1E domain comprising aa 1-108 from pulE was cloned into pASK3c vector to include a C-terminal StrepII tag (primer pair 8 and 11). For Pul Lcyto , aa 1-235 relating to the cytoplasmic domain of PulL were cloned into pCDF-duet vector with a C-terminal Flag tag (primer pair 10 and 12). The same two-step affinity chromatography purification strategy was then followed as for PulELMN but excluding Grafix. Note that the Pul E-N1E/Lcyto complex readily purifies from the membrane fraction despite the removal of the PulL trans-membrane domain.
Gold labelling. This was performed on the Pul CDELMNS complex modified to include a hexahistidine tag within PulC after aa 61 (primer pair 13). Purification was the same as for the Pul CDELMNS complex but without the addition of fixative. Protease inhibitor tablets were included in all buffers. Ni-NTA-Nanogold (Nanoprobes) pre-washed in Buffer W was added to 10 µl of the protein sample and incubated for 30 min at 4°C. A homemade continuous carbon grid was deposited on the 10 µl sample for 3 min, blotted and washed two times in Buffer W supplemented with 10 mM imidazole, then two times in Buffer W before being stained with three drops of 2% uranyl acetate.
Stoichiometry determination. Pul CDELMNS complex was purified as described above with the exception that no glutaraldehyde or Tris quenching buffer were added subsequent to elution from the Strep column. Protease inhibitor tablets were included in all buffers. Six independent purifications were extracted using phenol to disrupt PulD multimerization 41 . Before extraction samples were divided and phenol treated in duplicate. The samples were precipitated with an equal volume of phenol and immediately vortexed. Four volumes of ice-cold acetone were then added and vortexed again. The mixture was kept overnight at −20°C. The precipitate was pelleted in a bench-top centrifuge at 14,000 × g for 30 min at 4°C, washed once with ice-cold acetone, dried under vacuum and re-suspended in loading buffer supplemented with 10 mM DTT. Both duplicates for each independent purification were analysed by SDS-PAGE densitometry and quantification of fluorescent emission using both Coomassie Blue R250 and Sypro Ruby dyes. The Coomassie R250 procedure was as described 24 but with extended stain and destaining times. Briefly, gels were rinsed with water and then stained with 0.01% Coomassie Blue R250 in 50% methanol and 10% acetic acid for 20 min. Gels were rinsed with 40% methanol and 7% acetic acid and then destained twice with the same solution for 20 min each time. Finally, gels were soaked 2-3 times in water until fully destained, usually 1-2 h. Gels were imaged on a Licor Odyssey Fc at 680 nm. Sypro Ruby gel staining was undertaken following the manufacturer's instructions (Bio-Rad). Gels were imaged on a Bio-Rad Chemidoc XRS imaging system at 302 nm. Image Lab software (Bio-Rad) and ImageJ were used to calculate band intensities. Uncropped gels are provided as a Source Data file.
To determine the stoichiometry of the Pul E-N1E/Lcyto complex, two independent purifications were undertaken. Samples were run in duplicate on gels and then stained using the Coomassie Blue R250 procedure as outlined above. Gels were imaged on a Licor Odyssey Fc at 680 nm.
Electron microscopy sample preparation and data collection. For OMC structure determination, 4 µl of purified Pul CDELMNS complex solution was incubated for 30 s on glow discharged homemade continuous thin carbon grids before vitrification in liquid ethane using a Vitrobot Mark IV (FEI). Data were collected at 300 kV on a Titan Krios (M02 beamline at eBIC Diamond, UK) equipped with a Gatan Quantum K2 Summit detector. Images were acquired at a magnification of ×28,090 yielding 1.78 Å/pixel using EPU software. Images were dose-weighted over 40 frames with 12 s exposures. Total dose was~50 e/Å 2 .
All other cryo and negative stain datasets were collected in-house at 200 kV on a Tecnai F20 microscope equipped with Falcon II direct electron detector. For cryo data, Pul CDELMNS complex (Fig. 6b, c and Supplementary Fig. 5a) was vitrified as described above. The Pul ELM GraFix treated sample required glycerol removal so that 4 µl of sample was loaded onto the glow discharged continuous carbon EM grid and after 1-min incubation was washed four times in Buffer W before plunge freezing. Images were acquired at a magnification of 90,909 yielding 1.65 Å/pixel using EPU software. Images were collected over 54 frames with 3 s exposures. Total dose was~50 e/Å 2 . For negative stain data, 4 µl of Pul CDELMNS complex was loaded onto the glow discharged continuous carbon EM grid, after 40 s the grid was washed with 3 drops of distilled water and stained with 3 drops of 2 % uranyl acetate. Pul ELM GraFix treated sample was similarly incubated on EM grids, washed iteratively with 4 drops of 15 µl Buffer W and then negatively stained as above. Images were acquired at a magnification of 90,909 using EPU software. Single frames were collected with 1 s exposure and total dose~15 e/Å 2 .
Image processing. For OMC structure determination, individual movie image frames were aligned with MotionCor2 42 and the contrast transfer function estimated using Gctf 1.06 43 . Low quality images were discarded and 3427 micrographs used for subsequent reconstruction in Relion 2.1 44 . Initial manual particle picking was focused on the OMC/secretin region of the Pul CDELMNS complex. For particle extraction a box and mask diameter were chosen so that contributions from the IM AP were excluded. In this way, low-resolution 2D class averages of just the OMC were used as a template for autopicking. OMC side views only were prevalent in this dataset, which provided a sufficiently even equatorial band distribution for a reliable reconstruction 45 . Low quality particles were removed by four rounds of 2D classification resulting in a stack of 36,240 particles. A single round of 3D classification was undertaken generating ten classes. The V. cholerae GspD reconstruction EMDB-1763 was used as an initial model filtered to 40 Å 46 . C15 symmetry was applied based on top views of the OMC (obtained in an alternative Pul CDELMNS complex purification) and an unambiguous 15 peaks observed from the rotation auto-correlation function calculation (Fig. 1c). A single class containing 7284 particles was used for the final refinement, which attained 4.4 Å resolution. Post-processing yielded 4.3 Å resolution with an auto-estimated B-factor 47 of −142 Å 2 applied to sharpen the final 3D map for model building. A locally sharpened map was also generated using LocScale 48 once initial models were built. Further particle polishing and 3D refinement did not yield a marked increase in resolution. Resolutions reported are based on gold standard Fourier shell correlations (FSC) = 0.143. Statistics for data collection and 3D refinement are included in Supplementary Table 1. Local 3D refinements with various particle subtraction strategies focusing on PulS or PulC HR domain did not markedly improve resolution.
To generate all other 2D class averages both in cryo and negative stain conditions as for Pul CDELMNS and Pul ELM complexes, the following protocol was followed. Working initially within Relion 2.0 or 2.1, Gctf 1.06 was used for estimating the CTF. Negative stain micrographs were phase flipped. Low quality micrographs were discarded. Initial 2D class averages were generated from a manually picked stack to yield templates for autopicking. A total of 3-4 rounds of 2D classification were then undertaken to remove low quality particles. Using the 'relion_stack_create' a cleaned image stack was generated for further processing. For cryo-EM images the stack was created from phase flipped particles. In Imagic 49 , particles were normalised, band pass filtered, centred and subjected to reference-free MSA and classified. The best classes, typically judged by lowest variance, were used as references for multi-reference alignment (MRA) in Spider 50 followed by MSA and classification in Imagic. This cycle of MRA and MSA was typically iterated a further 2-3 times. For negative stain Pul ELM data, 2746 micrographs yielded 81,727 extracted particles and a cleaned stack of 63,014 particles. For cryo Pul ELM data, 467 micrographs yielded 4937 hand-picked particles, and all particles were used for subsequent MSA and MRA. For Pul CDELMNS , 3059 micrographs yielded 89,381 extracted particles and a cleaned stack of 66,855 particles for subsequent MSA and MRA (Fig. 6b). From this same dataset, 1148 micrographs were then used to hand-pick 4020 AP-focused particles ( Fig. 6c and Supplementary Fig. 5a). Rotation auto-correlation functions were calculated using Imagic.
Model building. A PulD homology model was generated with I-Tasser 51 using the E. coli GspD PDB 5WQ7 as a template. This yielded a starting model for the secretin core, and N3, N2 and N1 domains. E. coli GspD shares 57% sequence identity with PulD. A homology model for the N0 domain was generated using Swissmodel 52 with the relevant part of 3OSS as a template (51% sequence identity). Homology models were rigid body fitted into the map using Chimera 53 Fit in Map function. Using these models as a starting guide and the side chain detail from bulky residues to confirm sequence register, Coot 54 was used to manually build a complete model for PulD aa 27-652 excluding aa 288-303, 462-470 and 632-637. The model was further refined using real-space refinement in Phenix 55 with secondary structure, geometry and NCS restraints applied. For the low-resolution regions specific to the PulC HR domain, a homology model based on ETEC GspC HR domain PDB 3OSS (27% sequence identity) was generated using Swissmodel. For fitting this homology model, the PDB 3OSS which includes the ETEC GspD N0 domain was first superimposed onto the PulD N0 domain. The PulC HR domain homology model was then superimposed onto the ETEC GspC HR domain from PDB 3OSS resulting in a near perfect fit within the map. The Chimera Fit in Map function was then applied to PulC HR domain resulting in a minor shift so that the PulC HR-PulD N0 domain complex has a RMSD Cα = 1.4 Å when aligned to PDB 3OSS (Supplementary Fig. 2c). For the low-resolution regions specific to the PulD S-domain C-terminus (aa 638-652) in complex with the PulS pilotin, a homology model was generated using Swissmodel based on the equivalent structure from Dickeya dadantii PDB 4K0U (>50% sequence identity for both chains). The map was low pass filtered to 8 Å and the homology model initially fitted manually so that the PulS lipidated N-terminus orientated towards the membrane. The Chimera Fit in Map function was then used for final positioning. Cross-validations were carried out as previously described 10,56 using the auto-estimated B-factor sharpened map. Briefly, the PulD secretin model was displaced randomly by 0.2 Å and then refined against a map reconstructed from one of the independent data halves (Half map 1). FSC curves were then calculated using the resulting model and Half map 1 (FSC work ). FSC curves were also calculated between this same model and another reconstruction generated from the other independent data half (Half Map 2 and FSC free ). The similarity between FSC work and FSC free curves indicates an absence of overfitting within the PulD secretin model (Supplementary Fig. 1d). The final models were assessed using Molprobity 57 and statistics outlined in Supplementary Table 1.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.