Continuous, Topologically Guided Protein Crystallization Controls Bacterial Surface Layer Self-Assembly

Bacteria assemble the cell envelope using localized enzymes to account for growth and division of a topologically complicated surface 1-3. However, a regulatory pathway has not been identified for assembly and maintenance of the surface layer (S-layer), a 2D crystalline protein coat surrounding the curved 3D surface of a variety of bacteria 4,5. By specifically labeling, imaging, and tracking native and purified RsaA, the S-layer protein (SLP) from C. crescentus, we show that protein self-assembly alone is sufficient to assemble and maintain the S-layer in vivo. By monitoring the location of newly produced S-layer on the surface of living bacteria, we find that S-layer assembly occurs independently of the site of RsaA secretion and that localized production of new cell wall surface area alone is insufficient to explain S-layer assembly patterns. When the cell surface is devoid of a pre-existing S-layer, the location of S-layer assembly depends on the nucleation characteristics of SLP crystals, which grow by capturing RsaA molecules freely diffusing on the outer bacterial surface. Based on these observations, we propose a model of S-layer assembly whereby RsaA monomers are secreted randomly and diffuse on the lipopolysaccharide (LPS) outer membrane until incorporated into growing 2D S-layer crystals. The complicated topology of the cell surface enables formation of defects, gaps, and grain boundaries within the S-layer lattice, thereby guiding the location of S-layer assembly without enzymatic assistance. This unsupervised mechanism poses unique challenges and advantages for designing treatments targeting cell surface structures or utilizing S-layers as self-assembling macromolecular nanomaterials. As an evolutionary driver, 2D protein self-assembly rationalizes the exceptional S-layer subunit sequence and species diversity 6.


Main Text
Assembling a macromolecular structure on the micron scale often requires input energy and spatial coordination by enzymes and other cellular processes 1-3. S-layers, however, exist outside the cell envelope and lack access to many cytosolic components, including ATP 4-6. How do microbes continuously assemble a crystalline macromolecular structure on a highly curved cell surface undergoing drastic changes during normal cell growth? To answer this question, we

Figure 1. Localized S-layer assembly occurs primarily at the poles and division plane, independent of secretion. a) Schematic of the C. crescentus cell envelope with the S-layer crystal lattice (red/orange) anchored to the outer membrane (OM) via an 18 nm thick LPS layer (yellow), peptidoglycan (PG), and inner membrane (IM). b) Model of the RsaA S-layer structure (EMD-3604) applied to the surface of a 100 nm diameter cylinder. c) STED fluorescence microscopy image of a CysRsaA cell labeled with STAR RED. d) 3D mesh representation of predivisional C. crescentus topology with absolute value of Gaussian curvature projected onto the surface (gray shading). e) Schematic of 2-color pulse-chase experiment to image the sites of S-layer assembly where fluorophores DY-480XL and STAR RED were added to CysRsaA cells 30 min apart, washing in between. f) Confocal (top) and STED images (bottom) show localized assembly of natively produced S-layer at the cell poles (arrows), division plane (triangles), and crack-like features on the cell body (asterisks). g) Schematic of pulse-chase experiment where saturating quantities of purified, fluorescently labeled CysRsaA are added to the media of ΔRsaA cells. h) Endpoint STED images show localization of newly incorporated CysRsaA protein at the cell poles (arrows), division plane (triangle), and other locations. i) Overview of analysis method. Top: Cells were horizontally aligned with the stalk on the left side and the cell boundary was identified (non-gray region). Middle: S-layer was identified in both channels via intensity thresholding, and the binary image is projected along the normalized cell axis. Bottom: The projected image created two binary profiles of the S-layer for the upper and lower half of the cell in each color channel (red and green lines). j) S-layers produced natively (red, representative cell in red boxed image from Fig 1f) or by exogenous protein addition (blue, representative cell in blue boxed image from Fig 1h) show preferential incorporation at the cell poles and division plane, while control cells (black, representative cell in black boxed image from Fig 1c) show uniform labeling. Shaded regions show 95% confidence intervals for n=81, 75, and 57 cells for native, exogenous, and control cells, respectively. Scale bars = 1 µm except where noted.
to the N-terminus of RsaA, which includes a single cysteine residue 13 (Supp Fig. 1, Methods). CysRsaA-producing cells divide normally and create a stable S-layer at a rate similar to that of WT cells, and protein can be extracted and purified as a monomeric species (Supp Fig. 1). Covalently modifying CysRsaA with membrane-impermeable fluorophores via maleimide chemistry is a robust, highly specific labeling scheme for RsaA and enables live-cell

We sought to determine the factors that contribute to S-layer assembly by examining how RsaA secretion, cell wall growth, and the presence of an existing S-layer structure affect the location of S-layer assembly in living cells. To determine whether RsaA secretion is necessary to localize S-layer assembly, we added purified, fluorescently labeled CysRsaA to cultures of cells with a genomic deletion of the rsaA gene (∆RsaA). Saturating quantities (600 nM) of DY-480XL labeled CysRsaA were introduced as a pulse, followed by a wash, 30 min of cell growth, and another saturating quantity of STAR RED labeled CysRsaA as chase (Figure 1g). This experiment revealed that exogenously added CysRsaA preferentially incorporates at the poles and division plane (Figure 1h), in addition to increased incorporation along the cell body (Figure 1j, blue) compared to native S-layer assembly (Figure 1j, red). Previous immuno-gold staining and electron microscopy of RsaF, the outermost component of the RsaA secretion apparatus, indicated diffuse localization 23,26. Therefore, localized S-layer assembly occurs independent of RsaA secretion.

To investigate the effect of changing cell wall surface area on S-layer assembly, we manipulated peptidoglycan insertion using the specific MreB perturbing compound, A22 27.
Under normal conditions, the cell wall grows mainly at the polar-proximal base of the stalk and around the middle of the cell body, which eventually becomes the division plane 2. These regions alone are insufficient to explain localized S-layer assembly due to the additional appearance of signal at the pole opposite the stalk (Figure 1j). At 2 µg/mL, A22 delocalizes MreB but still allows some peptidoglycan addition along the cell body, leading to lemon-shaped cells that divide slowly 27 (Supp Fig. 4). At 25 µg/mL, A22 fully inhibits peptidoglycan insertion and surface area addition stops completely except at division planes where constriction began before drug exposure 27 (Supp Fig. 4). Using a pulse-chase labeling scheme after drug treatment (Figure 2a), we found that cells treated with 2 µg/mL A22 maintain bipolar localized S-layer assembly,

S-layer assembly on the cell body appears to correlate with localized cell wall growth while bipolar localization is not driven by PG insertion alone.

Physically removing the S-layer from a CysRsaA cell provides a clean surface with which to observe the cell replacing its own S-layer, which we term de novo assembly. Calcium

In the absence of an existing S-layer structure, RsaA crystallizes on the cell surface with localization dependent on pre-existing crystal size. a) Coomassie-stained SDS-PAGE of protein samples extracted from the surface of CysRsaA cells grown in low calcium M2G (50 µM CaCl2, t < 0 min) and then switched to 500 µM CaCl2 (0 < t < 180 min) show de novo accumulation of the S-layer. b) Schematic of de novo S-layer assembly experiment where CysRsaA cells are resuspended in M2G with 500 µM CaCl2 and then their S-layer is labeled with STAR RED. c) A STED imaging time course of STAR RED labeled CysRsaA cells shows that de novo assembly of a new S-layer occurs at discrete patches that grow larger over time.
d) Schematic of 2-color pulse-chase labeling between 30 and 60 minutes after calcium addition. e) STED imaging reveals growth of S-layer patches from their perimeter (red signal at the edges of green patch) and is confirmed by line profiles of the dashed line (inset). f) Schematic of experiment where low concentrations of STAR RED-labeled CysRsaA are incubated with ΔRsaA cells for 15 min. g) STED images show exogenous CysRsaA nucleates small puncta of S-layer. h) Schematic of 2-color stepwise addition of 5 nM DY-480XL labeled CysRsaA followed by 10 nM STAR RED labeled CysRsaA. i) STED imaging shows that puncta on the cell surface formed by exogenous addition of purified CysRsaA grow from their perimeter, confirmed by line profiles of the dashed line (inset). j) Quantitation of S-layer assembly localization for native S-layer assembly (red), de novo native S-layer assembly (black), and S-layer assembly by exogenous addition of labeled CysRsaA (blue). Shaded regions show 95% confidence intervals for n=100, 50, and 65 cells for native, de novo, and exogenous protein addition, respectively. k) Quantitation of the number of S-layer puncta and average number of RsaA molecules per punctum on ΔRsaA cells upon addition of purified CysRsaA protein. A model of an RsaA crystal of about the measured size is shown (inset). Scale bars = 1 µm (panels c, e, g, i).

If de novo S-layer assembly occurs through nucleation and 2D crystallization of RsaA, then protein self-assembly could also be responsible for continuous S-layer growth. Since protein crystallization is concentration-dependent, we predict that once secreted, RsaA monomers should be able to diffuse while non-covalently anchored to the LPS outer membrane until incorporated into a nucleating or growing S-layer crystal. Indeed, at 1 nM exogenous CysRsaA, cells exhibit weak diffuse fluorescence suggestive of diffusing molecules (Supp Fig. 7).
Therefore, we employed SMT to dynamically track the location of individual CysRsaA monomers anchored to the LPS outer membrane. ∆RsaA cells were first pre-treated with 2.5 nM Cy3-CysRsaA to form sparse, immobile S-layer crystalline patches or "seeds" on the LPS outer membrane (Figure 4a

Scale bar = 1 µm. c) Upper: The RMSD from a nearby seed of an example track shows binding of the molecule from ~16-43 s. Binding is defined as RMSD < 57.3 nm (red shaded region). Lower: Distance from the nearest nucleation seed (d_NS) for the same track shows the molecule binds ~200 nm from the center of a crystal patch. Green shaded region shows the 300 nm threshold used for determining crystal patch proximity. d) A quantitative Venn diagram displaying the total time (grey area), and the fractions of time molecules spend near a crystal patch (green area), bound (yellow area), and bound but not near a crystal patch (red area), shows that when molecules cease diffusing, they are almost always in close proximity to an existing crystal patch. e) MSD analysis from 3D SMT of AlexaFluor647 labeled CysRsaA molecules revealing a diffusion coefficient of D = 0.077 µm²/s. Shaded regions represent SEM. f) A generalizable model for S-layer assembly including secretion, diffusion, and incorporation at gaps within the S-layer lattice. g) A model for S-layer crystal boundaries/defects (white) mapped onto the C. crescentus surface (red), suggested by Gaussian curvature calculations (gray colorbar, right).

Our results imply that continuous S-layer crystallization occurs at gaps, defects, and grain boundaries within the S-layer structure caused by localized cell wall growth or the inherent topology of the cell surface (Figure 4g).
Increasing localized cell wall growth along the cell body (treatment with 2 µg/mL A22) increases S-layer assembly at that location; however, preventing cell elongation (treatment with 25 µg/mL A22) disrupts S-layer assembly at both poles, indicating that an additional factor is coordinating this process (Figure 2c

(Figure 1f, 2b,e).

In C. crescentus, RsaA deletion and the subsequent loss of an S-layer disrupt normal cell growth, suggesting a connection between the S-layer structure and cellular fitness 30. Given the central role of RsaA crystallization in S-layer assembly and this connection to fitness, the protein's ability to self-assemble may drive natural selection of the RsaA amino acid sequence. Remarkably small but stable RsaA protein crystals consist of only ~50 molecules (Figure 3g,k), suggesting that efficient nucleation at low concentrations may be another selectable trait supporting protein self-assembly. SLPs are exceptionally diverse in sequence, varying widely in size (40-200 kDa) and fold 4. Functional convergence of diverse crystalline structures can be rationalized by selection driven by protein self-assembly, which can occur independently of overall fold and instead requires just a few key surface residues making symmetric, planar crystal contacts 34,35. Similarly, diverse SLPs in archaea prefer charged (acidic or basic) amino acids to facilitate nutrient uptake through the nanoporous S-layer, a function that evolves independently of protein fold 36.

The mechanisms by which bacteria build, maintain, and evolve their S-layers are important to human health and our ability to treat and respond to bacterial pathogens such as C.

Three strains were used in this study and are available from the corresponding author S.W. upon request. C.
crescentus NA1000, which is referred to as wild type (WT) throughout the text, was used as a control for fluorescent labeling, S-layer protein production, and drug treatment response. An RsaA-negative strain of NA1000, referred to as ∆RsaA, was generated

S-layer Protein Purification
Purified RsaA in the absence of CaCl2 was previously shown to partially unfold at 28°C 30. Therefore, CysRsaA samples were kept cold (<4°C) at all times unless otherwise noted. CysRsaA protein was purified similarly to previously reported methods 30,44. CysRsaA-producing C. crescentus cells were grown to early stationary phase at 30°C in PYE medium, shaking at 200 rpm. The culture was then pelleted by centrifugation and stored at -80°C. Approximately 1 g of cell pellet was thawed on ice, re-suspended with 10 mL of ice-cold 10 mM HEPES buffer pH 7.0, and centrifuged for 4 min at 18,000 rcf. This washing step was performed three times. The

M2G once if another fluorophore was to be added next or twice if the next step was imaging. Complete labeling was evidenced by highly spatially complementary fluorescent images in pulse-chase labeled cells (Fig. 1b).

For in vitro experiments, purified CysRsaA was buffer exchanged into 50 mM HEPES pH 7.0 and 150 mM NaCl using a 30 kDa MWCO centrifugal concentrator (Sartorius). Overnight labeling of CysRsaA protein (>20 µM) was performed on ice with the addition of 1 mM TCEP and at least 5-fold stoichiometric excess of maleimide-derivatized STAR RED, DY-480XL, Cy3 (Lumiprobe), or AlexaFluor647 (ThermoFisher). The next day, three successive 1:30 dilutions were performed to remove unbound dye molecules using a 30 kDa MWCO centrifugal concentrator (Sartorius) and buffer containing 50 mM Tris/HCl pH 8.0 and 150 mM NaCl. Absorbance measurements at 280 nm and the known absorbance peak for each fluorophore determined labeling efficiency, which varied from 50-90%.
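As a rough illustration of how a labeling efficiency can be estimated from the two absorbance readings, the sketch below applies the Beer-Lambert law to the protein band and the dye band. The function name, the extinction coefficients, and the 280 nm correction factor for the dye are hypothetical placeholders, not values from this study, and the correction step is a common convention assumed here rather than something stated in the text.

```python
def labeling_efficiency(a280, a_dye, eps_protein, eps_dye, cf280=0.0, path_cm=1.0):
    """Estimate the dye-to-protein molar ratio from absorbance (Beer-Lambert law).

    a280        : absorbance of the labeled sample at 280 nm
    a_dye       : absorbance at the fluorophore's peak wavelength
    eps_protein : protein molar extinction coefficient at 280 nm (M^-1 cm^-1)
    eps_dye     : dye molar extinction coefficient at its peak (M^-1 cm^-1)
    cf280       : assumed fraction of the dye's peak absorbance seen at 280 nm
    """
    dye_molar = a_dye / (eps_dye * path_cm)
    # Subtract the dye's own contribution at 280 nm before computing protein.
    protein_molar = (a280 - cf280 * a_dye) / (eps_protein * path_cm)
    if protein_molar <= 0:
        raise ValueError("corrected A280 must be positive")
    return dye_molar / protein_molar


# Hypothetical readings and coefficients, for illustration only.
ratio = labeling_efficiency(a280=0.50, a_dye=0.60,
                            eps_protein=50_000, eps_dye=120_000, cf280=0.05)
print(f"labeling efficiency ~ {ratio:.0%}")
```

Because CysRsaA carries a single cysteine, the dye-to-protein ratio computed this way can be read directly as the labeled fraction.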

Growth Curves
For growth curve analysis of WT and ΔRsaA C. crescentus strains, 10 µL of mid-log phase cultures (OD600nm = 0.5) were added to 90 µL of M2G containing varying amounts of A22 (Cayman Chemical) or cephalexin (Frontier Scientific) in a sterile, black-walled, 96-well transparent plate (Corning). While incubating the plate at 29°C and shaking at 600 rpm between readings, OD600nm was measured every 5 minutes for up to 23 hours using an Infinite M1000 microplate reader (Tecan).

Confocal and STED Microscopy
Images were acquired on a bespoke 2-color fast scanning STED microscope described

cell outline was defined. For conditions with a mostly complete S-layer (Fig 1d and 2c), first the outer cell boundary was determined by thresholding a normalized sum of the two-color channels, and then this outer boundary was eroded to form the 200 nm cell outline. For analysis including conditions with an incomplete S-layer (Fig 3e)

During the first ~10 seconds of acquisition, the intensity of the 641 nm laser is briefly increased to a higher intensity to allow the fluorescent dyes to bleach down approximately to the single-molecule regime before reducing the intensity for optimal tracking. Spatial correlation between the two cameras is determined by detecting Tetraspeck beads (Invitrogen), which appear in both channels. 3D tracking was performed with a double-helix phase mask placed at the Fourier plane in the detection side of our microscope. Imaging conditions are similar to our 2D single-particle tracking experiments, except the 641 nm laser was used for detecting labeled RsaA molecules with an intensity of 885 W/cm². All image acquisition was performed through software made by Andor. Pixel size is 163 nm. All tracks shown (Figure 4b and Supp Fig. 9) are down-sampled 5X for clarity.

3D Calibration with Fiducials
In

calibrations were acquired over a 3 µm range along the z axis with a 50 nm step-size with 30 frames measured at each z-height. This calibration step produces template images of the DH-PSF, which are used for the identification of single-molecule signals during post-processing of the raw data. All imaging was performed at 25°C. Using the easyDHPSF MATLAB program 46, a z-axis calibration over a 3-µm range is obtained via a 2D Double-Gaussian fit, which provides us with xy positions, width, amplitudes, and offset levels of each lobe of the fluorescent bead.

Mobility Analysis of 2D RsaA Tracking Data
Images of single molecules and beads were analyzed using custom-built MATLAB code. Within one C. crescentus cell, a 2D symmetric Gaussian fit is applied to a single labeled RsaA molecule of interest, which provides an estimate of its xy position at that point in time. Linking together the positions of the same molecule over time generates tracks. In order to determine whether the detected molecules were bound or not, we first calculated the root mean squared deviation (RMSD) from the mean position over a 1 s sliding window (20 frames). Bound molecules will have a relatively low RMSD (near the localization precision) compared to a molecule freely diffusing within the cell. By performing this for every track (9 tracks) within the same cell, we were able to generate a histogram of all RMSD values (6625 RMSD values), which shows a clear peak with a tail (Supp Fig. 8), where the left-most peak arises from the localization precision error for bound molecules. The Gaussian fit of the lowest RMSD population was utilized to determine a binding threshold (RMSD < 57.3 nm) corresponding to 2σ above the mean. Molecules were classified as bound if RMSD < threshold or unbound if RMSD ≥ threshold.
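The bound/unbound classification described above can be sketched as follows. This is a minimal illustration, not the MATLAB code used in the study: the function names are hypothetical, positions are assumed to be in nm, and only the 20-frame window, the 2σ rule, and the 57.3 nm threshold come from the text.

```python
import numpy as np

def sliding_rmsd(xy, window=20):
    """RMSD from the window-mean position for every full window of an (N, 2) track."""
    xy = np.asarray(xy, dtype=float)
    n_windows = len(xy) - window + 1
    if n_windows < 1:
        return np.empty(0)
    rmsd = np.empty(n_windows)
    for i in range(n_windows):
        segment = xy[i:i + window]
        deviations = segment - segment.mean(axis=0)
        rmsd[i] = np.sqrt(np.mean(np.sum(deviations ** 2, axis=1)))
    return rmsd

def binding_threshold(mu, sigma, n_sigma=2.0):
    """Threshold set n_sigma above the mean of the fitted low-RMSD Gaussian."""
    return mu + n_sigma * sigma

def is_bound(rmsd, threshold=57.3):
    """True where the windowed RMSD falls below the binding threshold (nm)."""
    return np.asarray(rmsd) < threshold

# Synthetic data (illustrative values only): an immobile emitter with 20 nm
# localization noise per axis, and a random walk with 90 nm steps per axis.
rng = np.random.default_rng(0)
bound_track = rng.normal(0.0, 20.0, size=(100, 2))
free_track = np.cumsum(rng.normal(0.0, 90.0, size=(100, 2)), axis=0)

bound_fraction = is_bound(sliding_rmsd(bound_track)).mean()
free_fraction = is_bound(sliding_rmsd(free_track)).mean()
```

The sketch returns one RMSD value per window position, so a 100-frame track yields 81 values; how those window values are assigned back to individual frames is a choice left open here.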

2-Color Analysis of 2D RsaA/Nucleation Site Tracking Data
The xy location of each nucleation site, labeled with Cy3, was separately determined by fitting a 2D symmetric Gaussian. The locations of these nucleation sites were found to be stationary over the ~15 minute imaging period. Using the 2D RsaA trajectories analyzed earlier, we calculated d_NS, which is defined as the distance between the RsaA molecule and the nearest nucleation site. By calculating both the RMSD and d_NS, we can categorize each frame as one of the following: (a) bound only, (b) bound and close to a nucleation site, or (c) close to a nucleation site only.
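The per-frame categorization above can be sketched in a few lines. The function names and the extra "neither" label are assumptions made for illustration; the 300 nm proximity cutoff is the one quoted in the Figure 4 caption, and the bound/unbound flag is assumed to be already available for each frame.

```python
import numpy as np

def nearest_seed_distance(xy, seeds):
    """Distance from each (N, 2) track position to the nearest of (M, 2) seed centers."""
    xy = np.asarray(xy, dtype=float)
    seeds = np.asarray(seeds, dtype=float)
    diffs = xy[:, None, :] - seeds[None, :, :]      # shape (N, M, 2)
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def categorize_frames(bound, d_ns, proximity=300.0):
    """Label each frame from its bound flag and its distance to the nearest seed (nm)."""
    bound = np.asarray(bound, dtype=bool)
    near = np.asarray(d_ns) <= proximity
    labels = np.full(bound.shape, "neither", dtype=object)
    labels[bound & ~near] = "bound only"
    labels[bound & near] = "bound and near seed"
    labels[~bound & near] = "near seed only"
    return labels

# Toy coordinates in nm, for illustration only.
track = [[0.0, 0.0], [1000.0, 0.0], [150.0, 0.0]]
seeds = [[100.0, 0.0], [5000.0, 0.0]]
d_ns = nearest_seed_distance(track, seeds)
labels = categorize_frames([True, True, False], d_ns)
```

Summing the frames in each category and multiplying by the frame time gives the time fractions of the kind summarized in a Venn diagram.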

3D Mean Square Displacement (MSD) Analysis
3D images of RsaA were analyzed using custom-built MATLAB code for analyzing DH-PSF data. A 2D Double-Gaussian fit was applied to each emitter in the field-of-view, which provides us with x, y, and θ information from the tilt of the two lobes of the double-helix. We use the calibration obtained earlier to convert our estimates to xyz values. For each individual track, the MSD is computed over a series of time lags starting from 50 ms. We then pool the data over all 30 trajectories to obtain a 3D MSD plot. The diffusion coefficient D is extracted by fitting the first 4 time lags with an equation whose terms are the time lag, the exposure time of the camera (50 ms), and the localization error in each dimension (σx,y,z = 93 nm, 93 nm, 91 nm).

package 47. Gaussian curvature analysis was performed as previously described 22.
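The MSD fit described above can be illustrated with a simplified sketch. It relies only on the fact that, for free Brownian motion in three dimensions, the MSD grows linearly with time lag with slope 6D; the exposure-time and localization-error terms are lumped into a free intercept, which is an assumption of this sketch rather than the exact expression fitted in the study. The function names, the equal-weight pooling, and the simulated data are likewise hypothetical.

```python
import numpy as np

def msd_3d(track, n_lags):
    """Time-averaged MSD of one (N, 3) track for lags of 1..n_lags frames."""
    track = np.asarray(track, dtype=float)
    return np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in range(1, n_lags + 1)
    ])

def fit_diffusion_coefficient(tracks, dt, n_lags=4):
    """Pool per-track MSDs (equal weights) and fit a straight line.

    For 3D Brownian motion the slope is 6*D; the intercept is left free
    so that it absorbs the constant offset terms.
    """
    pooled = np.mean([msd_3d(t, n_lags) for t in tracks], axis=0)
    lags = dt * np.arange(1, n_lags + 1)
    slope, intercept = np.polyfit(lags, pooled, 1)
    return slope / 6.0, intercept

def simulate_tracks(n_tracks, n_steps, d_coeff, dt, sigma, seed=0):
    """Synthetic 3D random walks (µm) with Gaussian localization noise."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, np.sqrt(2 * d_coeff * dt), size=(n_tracks, n_steps, 3))
    positions = np.cumsum(steps, axis=1)
    return positions + rng.normal(0.0, sigma, size=positions.shape)

# Simulated data only: 30 tracks at 50 ms per frame, D = 0.077 µm²/s,
# and roughly 90 nm localization noise per axis.
tracks = simulate_tracks(n_tracks=30, n_steps=2000, d_coeff=0.077, dt=0.05, sigma=0.09)
d_est, offset = fit_diffusion_coefficient(tracks, dt=0.05)
```

Leaving the intercept free keeps the slope estimate independent of how the static offset is modeled; a fit with fixed, independently measured localization errors would constrain D more tightly when few time lags are used.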

Code Availability
The code used in this study is either open access 46,47 or is described above and available upon request.