Cooperative folding of intrinsically disordered domains drives assembly of a strong elongated protein

Bacteria exploit surface proteins to adhere to other bacteria, surfaces and host cells. Such proteins need to project away from the bacterial surface and resist significant mechanical forces. SasG is a protein that forms extended fibrils on the surface of Staphylococcus aureus and promotes host adherence and biofilm formation. Here we show that although monomeric and lacking covalent cross-links, SasG maintains a highly extended conformation in solution. This extension is mediated through obligate folding cooperativity of the intrinsically disordered E domains that couple non-adjacent G5 domains thermodynamically, forming interfaces that are more stable than the domains themselves. Thus, counterintuitively, the elongation of the protein appears to be dependent on the inherent instability of its domains. The remarkable mechanical strength of SasG arises from tandemly arrayed ‘clamp’ motifs within the folded domains. Our findings reveal an elegant minimal solution for the assembly of monomeric mechano-resistant tethers of variable length.


Supplementary
| Forced unfolding of SasG probed by AFM. Data were collected in triplicate (different surfaces and cantilevers) for a SasG construct containing seven G5 and six E domains (G5₁-G5₇) at retraction rates of 200 nm s⁻¹ (a), 800 nm s⁻¹ (b), 1500 nm s⁻¹ (c), 3000 nm s⁻¹ (d) and 5000 nm s⁻¹ (e). Data from the triplicates were pooled and analyzed as described in the Methods section. For each retraction rate, a scatter plot overlaid with a smoothed density histogram is presented (left panel); the density histogram legend is shown in panel f. The corresponding one-dimensional histograms of force (central panel) and contour length gain, ΔLc (right panel), are also shown for each retraction rate. Red lines indicate Gaussian fits to the histograms, and the modal force and ΔLc values are shown.
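This caption does not state how the Gaussian fits were performed. As a minimal sketch only, the Python below estimates the modal value and width of a single-peaked histogram (force in pN, or ΔLc in nm) with Caruana's log-parabola method, a least-squares fit of a quadratic to ln(counts). This is a swapped-in technique, not necessarily the procedure used for these figures; the function names are hypothetical and the bins in the example are synthetic.

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def gaussian_peak(centers, counts):
    """Return (mode, sigma) of a Gaussian fitted to histogram bins.

    Fits ln(count) = c0 + c1*u + c2*u**2 by least squares, where u is the bin
    centre minus the mean centre (centring improves numerical conditioning).
    Empty bins are skipped because ln(0) is undefined.
    """
    pts = [(x, math.log(y)) for x, y in zip(centers, counts) if y > 0]
    x0 = sum(x for x, _ in pts) / len(pts)
    pts = [(x - x0, ly) for x, ly in pts]
    s = [sum(u ** k for u, _ in pts) for k in range(5)]
    t = [sum(ly * u ** k for u, ly in pts) for k in range(3)]
    _, c1, c2 = _solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t)
    if c2 >= 0:
        raise ValueError("no peak: ln(counts) is not concave")
    return x0 - c1 / (2 * c2), math.sqrt(-1 / (2 * c2))


# Synthetic, noise-free bins centred on 200 pN with sigma 30 pN.
centers = [140 + 10 * i for i in range(13)]
counts = [50 * math.exp(-((x - 200) ** 2) / (2 * 30 ** 2)) for x in centers]
mode, sigma = gaussian_peak(centers, counts)
```

Because sparsely populated bins carry large relative noise in ln(count), a weighted or direct non-linear Gaussian fit is usually preferable for real AFM data.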

Supplementary
I(0) is the extrapolated zero-angle scattering intensity. Rg is the particle radius of gyration. Rc is the radius of gyration of the cross-section. Dmax is the maximum linear particle dimension. Dc is the maximum dimension of the cross-section. The normalized spatial discrepancy (NSD) is a measure of similarity between three-dimensional SAXS ab initio models.

Supplementary Discussion
The structure of G5₂-G5₃
The Matthews coefficient¹ indicated that the asymmetric unit contained two molecules with 46.9% solvent content, which was confirmed after phasing by molecular replacement (MR) (data collection and refinement statistics are summarized in Supplementary Table 1). Whereas repeats G5₂-G5₇ have near-identical sequences, G5₁ is less conserved, sharing 85% sequence identity with G5₂. The X-ray crystal structures of G5₂-G5₃ (Supplementary Fig. 1a-c) and G5₁-G5₂ (PDB accession: 3TIQ) were superposed by secondary structure matching, revealing only modest divergence, localized to the N-terminal G5 domain (Supplementary Fig. 1e). Superposition of G5₂ alone from chains A and B of G5₂-G5₃ (residue ranges 549-626 and 552-626, respectively) showed some differences in β-strand length, but the two structures are highly similar overall (Cα root mean square deviation (RMSD) 1.3 Å over 74 aligned residues). When both chains were aligned with G5₂ from G5₁-G5₂ (3TIQ), a clear deviation at the N-terminus of G5₂-G5₃ was apparent (Supplementary Fig. 1f). Comparison of G5₂ from G5₂-G5₃ with G5₁ from G5₁-G5₂ (residue range 420-498) showed that these N-terminal G5 domains share a common deviation from linearity (Supplementary Fig. 1g), suggesting that the presence of an E segment at the N-terminus of a G5 domain may affect the folding of the G5 residues at the interface. To test this hypothesis, G5₃ from G5₂-E₂-G5₃ (residue range 676-754) and G5₂ from G5₁-E₁-G5₂ (97% sequence identity) were superposed; they are highly similar and linear (Supplementary Fig. 1h), supporting a role for E in maintaining the linearity of the C-terminal G5 domain.
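For reference, the Matthews analysis cited above relates the unit-cell volume to the protein mass it contains. The sketch below uses Matthews' standard relations, VM = Vcell/(Z·M), where Z is the number of molecules per unit cell, and solvent fraction ≈ 1 - 1.23/VM (which assumes a protein density of about 1.35 g cm⁻³). The cell volume, molecular mass and symmetry values in the example are hypothetical, since they are not given in this passage.

```python
def matthews_coefficient(cell_volume_a3, mass_da, n_per_asu, n_sym_ops):
    """V_M in A^3 per Da: cell volume divided by the total protein mass in the cell."""
    molecules_per_cell = n_per_asu * n_sym_ops
    return cell_volume_a3 / (mass_da * molecules_per_cell)


def solvent_fraction(vm):
    """Matthews' solvent-content estimate, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


# Inverting the relation: 46.9% solvent corresponds to V_M = 1.23 / (1 - 0.469).
vm_for_reported_solvent = 1.23 / (1 - 0.469)

# Hypothetical cell: 400,000 A^3, 20 kDa chains, 2 per ASU, 4 symmetry operators.
vm_example = matthews_coefficient(400_000, 20_000, 2, 4)
```

In practice, candidate numbers of molecules per asymmetric unit are compared by their VM values, and the choice is then confirmed by phasing, as described above.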
To generate a model of the contiguous repeat region, G5₂-G5₃ and G5₁-G5₂ were superposed by alignment of their shared G5₂ domain, and G5₂-G5₃ was then aligned iteratively to build a 71 nm-long model of G5₁-G5₇ with 98.5% sequence identity to the wild-type sequence (Supplementary Fig. 1i).

SAXS analysis of the SasG repeat region
To investigate the solution structure of long contiguous SasG repeats, we purified G5₁-G5₂, G5₁-G5₃, G5₁-G5₄, G5₁-G5₅, G5₁-G5₆ and G5₁-G5₇. Their molecular masses (MM) were confirmed by electrospray ionization mass spectrometry (ESI-MS) (Supplementary Table 2). SEC-MALLS-QELS was used to assess the homogeneity, monodispersity, solution MM and hydrodynamic radius (Rh) of the purified proteins (Supplementary Table 2). All proteins eluted as single peaks with MM close to that predicted from sequence and uniform across each peak, indicating monodispersity (Supplementary Fig. 2a). The ratio of the weight-average MM (Mw) to the number-average MM (Mn, total mass/number of molecules) provides a measure of molar mass dispersity (ĐM). In all cases ĐM = 1, confirming that the particles are monodisperse and therefore suitable for SAXS analysis of particle shape. Initial analysis of the SAXS data showed a slight concentration dependence of particle size, indicative of weak interparticle repulsion at high concentration (Supplementary Fig. 2b). For each sample, the highest-concentration data set showing a minimal concentration-dependent decrease in radius of gyration (Rg) was selected for further analysis (Supplementary Table 2). Guinier plots (ln I(s) vs s²) were linear at low angles (sRg ≤ 1.1), confirming monodispersity (Supplementary Fig. 2c). Rg values estimated by the Guinier approximation were consistently close to, but slightly lower than, those calculated from the distance distribution function P(r) (Supplementary Table 2). The Guinier region became progressively truncated with increasing construct size (Supplementary Fig. 2c), which is characteristic of scattering by highly anisotropic particles²,³. For G5₁-G5₂ and G5₁-G5₃, MM values calculated from the absolute forward intensity I(0) derived from P(r), from calibration to a BSA standard and by the method of Fischer et al.⁴ agreed with those calculated from sequence. For G5₁-G5₄, G5₁-G5₅, G5₁-G5₆ and G5₁-G5₇, MM was underestimated by both absolute scattering and BSA calibration, but the Fischer et al.⁴ method gave MMs close to those expected (Supplementary Table 2).
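The Guinier and dispersity checks described above can be sketched as follows. This is a minimal, standard-library illustration assuming intensities I(s) on a common scale and, for SEC-MALLS, per-slice concentrations cᵢ and molar masses Mᵢ; the function names are hypothetical and the data are synthetic. Mn is computed as Σcᵢ/Σ(cᵢ/Mᵢ), i.e. total mass divided by the number of molecules, and Mw as Σcᵢ·Mᵢ/Σcᵢ.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def guinier(s, intensity):
    """Fit ln I(s) = ln I(0) - (Rg**2 / 3) s**2; return (Rg, I0, s_max * Rg)."""
    slope, intercept = linear_fit([q * q for q in s], [math.log(i) for i in intensity])
    rg = math.sqrt(-3.0 * slope)
    return rg, math.exp(intercept), max(s) * rg


def dispersity(conc, molar_mass):
    """Mw / Mn over elution slices with concentrations c_i and molar masses M_i."""
    mw = sum(c * m for c, m in zip(conc, molar_mass)) / sum(conc)
    mn = sum(conc) / sum(c / m for c, m in zip(conc, molar_mass))
    return mw / mn


# Synthetic Guinier data for Rg = 3 nm, restricted to s*Rg <= 1.1 (s in nm^-1).
s = [0.02 * k for k in range(1, 19)]
i_s = [100.0 * math.exp(-(3.0 ** 2) * q * q / 3.0) for q in s]
rg, i0, srg_max = guinier(s, i_s)
```

In practice the upper limit of the Guinier range is chosen iteratively, because the sRg ≤ 1.1 criterion depends on the Rg being fitted; for strongly elongated particles the usable range shrinks, as noted above.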
SAXS profiles of all constructs were relatively featureless (Fig. 2e), a property associated with scattering by highly anisotropic particles⁵. P(r) showed a clearly skewed distribution with a maximum at short interatomic distances (Fig. 2c), also characteristic of scattering by rod-like particles⁶. A modified Guinier plot² (ln[s·I(s)] vs s²) showed a maximum followed by a linear region at higher angles, characteristic of rod-like particle scattering (Supplementary Fig. 2d), from which the radius of gyration of the cross-section (Rc) was derived (Supplementary Table 2). The slight increase in Rc with increasing construct length suggests that the rod may incorporate a slight curve or coil. The cross-sectional distance distribution Pc(r) (Fig. 2d) is similar for all particles, indicating a width of 2.05-2.30 nm (Supplementary Table 2), consistent with the width of the X-ray crystal structures of G5₁-G5₂ (PDB accession: 3TIQ) and … (Supplementary Table 2). Particle shape can also be assessed from the shape factor ρ = Rg/Rh, where ρ ≈ 0.77 indicates a spherical particle and ρ ≥ 1.73 a rod-like species. The shape factors of all particles fall in the range 1.6-1.9, again suggesting that SasG repeats form contiguous rod-like shapes in solution (Supplementary Table 2). The length l of an extended rod can be derived⁸ from l² = 12(Rg² − Rc²), giving lengths of 17.5, 28.5, 36.8, 46.2, 53.5 and 60.4 nm for G5₁-G5₂, G5₁-G5₃, G5₁-G5₄, G5₁-G5₅, G5₁-G5₆ and G5₁-G5₇, respectively. To determine the real-space maximum dimension Dmax, P(r) functions were calculated for all particles (Fig. 2c); the particle lengths obtained were similar to those estimated from Rg and Rc (Supplementary Table 2).
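The rod analysis in this paragraph can be illustrated with a short sketch, assuming the standard thin-rod approximations: the cross-sectional Guinier relation ln[s·I(s)] ≈ const − (Rc²/2)s² and the uniform-rod relation l² = 12(Rg² − Rc²). The function names are hypothetical, and the input values are synthetic rather than the SasG measurements.

```python
import math


def cross_section_rg(s, intensity):
    """Rc from the slope of ln[s*I(s)] against s**2 (slope = -Rc**2 / 2)."""
    xs = [q * q for q in s]
    ys = [math.log(q * i) for q, i in zip(s, intensity)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.sqrt(-2.0 * slope)


def rod_length(rg, rc):
    """Length of a uniform rod from its overall and cross-sectional radii of gyration."""
    return math.sqrt(12.0 * (rg ** 2 - rc ** 2))


def shape_factor(rg, rh):
    """rho = Rg / Rh: about 0.77 for a compact sphere, larger for elongated particles."""
    return rg / rh


# Synthetic rod-regime data with Rc = 0.8 nm (s in nm^-1).
s = [0.5 + 0.05 * k for k in range(15)]
i_s = [10.0 * math.exp(-(0.8 ** 2) * q * q / 2.0) / q for q in s]
rc = cross_section_rg(s, i_s)
length = rod_length(5.2, rc)
```

The rod-length formula treats the particle as a straight cylinder of uniform density, so a slight curvature, as suggested by the increase in Rc, would make it a modest underestimate of the contour length.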
P(r) functions were also used to derive Rg and I(0), overcoming the limitations of Guinier analysis caused by the truncation of the Guinier region inherent to rod-like particle scattering (Supplementary Table 2). Ab initio modeling with Gasbor was repeated five times for each construct, and the models were aligned and averaged (Fig. 2b; fits to the data in Fig. 2e; mean model normalized spatial discrepancy (NSD) in Supplementary Table 2). The crystal structure of G5₁-G5₂ aligns well with the filtered average shape derived from SAXS (Supplementary Fig. 2e). Rigid body modeling was performed using a modified version of SASREF⁹ employing 50 spherical harmonics to account for the anisotropy of the particles. The calculation (total time, 2 weeks) was run once, in parallel, against all datasets (Supplementary Fig. 2f,g). The rigid body models are consistent with the extended shapes from ab initio Gasbor modeling and describe highly extended rod-like structures that incorporate a slight coil or bend (goodness-of-fit χ² values⁹ are listed in Supplementary Table 2; see also Fig. 2b and Supplementary Fig. 2f,g).

TIRF-microscopy analysis of particle length
To measure the macromolecular length, we analyzed Cys-G5₁-G5₇-Cys fluorescently labeled at the N- and C-termini using SHRImP-TIRFM (Fig. 3a). Fluorophores attached to the protein were localized individually by sequential photobleaching, and inter-fluorophore distances were calculated to estimate the molecular end-to-end distance. The quartz microscope slides used for TIRFM were treated with different concentrations of poly-D-lysine (20, 100 and 500 µg ml⁻¹), generating surfaces with increasingly positive charge density. We obtained a range of end-to-end distances for G5₁-G5₇ under these three conditions (mean ± s.e. = 59 ± 5 nm; Fig. 3b and Supplementary Fig. 3a). The fractional residual charge on the polyelectrolyte can be calculated as (Nξ)⁻¹, where ξ is the Manning charge density parameter (the ratio of the Bjerrum length to the mean axial spacing between charges) and N is the absolute value of the counterion valence (N = 1 for Na⁺ here). SasG and DNA are predicted to have fractional residual charges of 0.71 and 0.24, respectively (for N = 1). SasG therefore retains more unscreened negative surface charge than DNA, because it condenses fewer counterions at its surface. As the counterion valence increases, e.g. for poly-D-lysine, SasG will retain a larger fractional residual charge relative to DNA. This will likely favor adsorption of SasG to a positively charged imaging surface in an equilibrated form that reflects its elongated solution conformation; the mean end-to-end distances observed for SasG (Fig. 3b and Supplementary Fig. 3a,b) are consistent with this prediction. Our results also suggest that the G5₁-G5₇ structure is intrinsically more rigid than double-stranded DNA of comparable length under identical conditions (compare the mean end-to-end distances for DNA and SasG in Fig. 3a and Supplementary Fig. 3a,b).
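The fractional residual charge (Nξ)⁻¹ follows from Manning counterion condensation. The sketch below assumes a Bjerrum length of about 0.71 nm (water near 25 °C) and the textbook axial charge spacing of 0.17 nm for B-DNA, which reproduces the quoted DNA value of about 0.24. The SasG charge spacing is not given here, so the sketch only back-calculates the ξ implied by the quoted SasG fraction of 0.71; the function names are hypothetical.

```python
BJERRUM_NM = 0.71  # assumed Bjerrum length in water near 25 degrees C


def manning_xi(charge_spacing_nm, bjerrum_nm=BJERRUM_NM):
    """Manning charge density parameter: Bjerrum length / mean axial charge spacing."""
    return bjerrum_nm / charge_spacing_nm


def residual_charge_fraction(xi, valence=1):
    """Uncondensed fraction of the bare charge, (N*xi)^-1 when N*xi > 1; else no condensation."""
    n_xi = abs(valence) * xi
    return 1.0 / n_xi if n_xi > 1.0 else 1.0


dna_fraction = residual_charge_fraction(manning_xi(0.17))  # B-DNA, one charge per 0.17 nm
sasg_xi = 1.0 / 0.71  # xi implied by the quoted SasG fraction for N = 1
```

Below the threshold Nξ = 1 no counterions condense in this model, which is why the function returns the full bare charge in that regime.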
For DNA, the mean end-to-end distances (49, 51 and 54 nm) were shorter at all imaging surface charge densities than the length expected for crystallographic B-form DNA (67 nm). This has been observed previously for DNA adsorbed on a positively charged surface¹¹ and is consistent with surface adsorption reducing the electrostatic tension in DNA through charge neutralization, which in turn can reduce the apparent persistence length, i.e. increase the bending flexibility. The end-to-end distance distributions observed here for DNA (198 bp) are consistent with molecular equilibration on the imaging surface and an approximately 2-fold reduction in the apparent persistence length.
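To make the persistence-length argument concrete, the sketch below uses the worm-like-chain expression for a chain equilibrated in two dimensions, ⟨R²⟩ = 4PL[1 − (2P/L)(1 − e^(−L/2P))]. This standard model for surface-adsorbed DNA is assumed here rather than taken from this section. It takes L = 198 bp × 0.34 nm and compares an assumed bulk persistence length of 50 nm with a halved value. It returns root-mean-square distances, which are somewhat larger than the mean distances reported above.

```python
import math

RISE_NM_PER_BP = 0.34  # B-form DNA helical rise


def rms_end_to_end_2d(contour_nm, persistence_nm):
    """RMS end-to-end distance of a worm-like chain equilibrated on a surface (2D)."""
    p, l = persistence_nm, contour_nm
    return math.sqrt(4 * p * l * (1 - (2 * p / l) * (1 - math.exp(-l / (2 * p)))))


contour = 198 * RISE_NM_PER_BP                 # about 67 nm, the B-form value
rms_bulk = rms_end_to_end_2d(contour, 50.0)    # assumed bulk persistence length
rms_halved = rms_end_to_end_2d(contour, 25.0)  # ~2-fold reduced persistence length
```

Under these assumptions, halving P lowers the RMS distance from about 60.5 to about 55 nm, in the direction of the shorter distances measured; a quantitative estimate would require fitting the measured distributions.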
Thermodynamics of the SasG system
G5₂ in isolation has a free energy of folding (ΔG_N-D) of −2.8 kcal mol⁻¹ (from equilibrium denaturation studies), whereas E-G5₂ is more stable (ΔG_N-D = −6.3 kcal mol⁻¹), a difference of 3.5 kcal mol⁻¹ (Fig. 4 and Table 1). Hence, the folded E domain stabilizes G5₂ by 3.5 kcal mol⁻¹. However, we have shown previously by NMR spectroscopy that the E domain is disordered in isolation⁷. If we make the conservative assumption that fewer than 15% of E molecules are folded in isolation (otherwise we would detect them in the spectra), then ΔG_N-D of E must be ≥ +1 kcal mol⁻¹. Thus, from our studies of G5₂ and E-G5₂, we estimate that the stability conferred by the E-G5₂ interface is at least 4.5 kcal mol⁻¹ (3.5 kcal mol⁻¹ to stabilize G5₂ plus at least 1 kcal mol⁻¹ to fold E).
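The ≥ +1 kcal mol⁻¹ bound follows from ΔG_N-D = −RT ln K, with K = [N]/[D] = 0.15/0.85 at the detection limit. The sketch below assumes 25 °C (298.15 K), since the temperature is not stated in this passage; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def dg_folding(fraction_folded, temperature_k=298.15):
    """Delta G(N-D) = -RT ln([N]/[D]); negative values favour the folded state."""
    k = fraction_folded / (1.0 - fraction_folded)
    return -R_KCAL * temperature_k * math.log(k)


dg_e_min = dg_folding(0.15)               # E at most 15% folded in isolation
stabilization_g52 = -2.8 - (-6.3)          # G5_2 versus E-G5_2, kcal mol^-1
interface_min = stabilization_g52 + dg_e_min
```

A lower detection limit (fewer folded molecules tolerated) would raise this bound, since ΔG_N-D grows as ln([D]/[N]).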
Importantly, however, E is unfolded⁷ in G5₁-E, so we can infer that the stability conferred by the G5₁-E interface (−1.5 kcal mol⁻¹) is insufficient to fold E. This allows us to place a minimal estimate of +2.5 kcal mol⁻¹ on the free energy of folding of E, i.e. at least 1 kcal mol⁻¹ greater than the stability conferred by the interface; otherwise, folded E would be significantly populated at equilibrium in G5₁-E and would be detected in the NMR spectra⁷. We must therefore revise our estimate of the stability conferred by the E-G5₂ interface, which not only enables E to fold (requiring at least 2.5 kcal mol⁻¹) but also stabilizes G5₂ (by 3.5 kcal mol⁻¹); the E-G5₂ interface therefore confers a stability of at least 6 kcal mol⁻¹ (ΔG ≤ −6 kcal mol⁻¹).
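The revised bound is simple bookkeeping of the values quoted above, sketched below with magnitudes in kcal mol⁻¹. The 1 kcal mol⁻¹ margin is the NMR detection-limit argument from the preceding paragraph, and the function names are hypothetical.

```python
DETECTION_MARGIN = 1.0  # kcal/mol: <15% folded is undetectable, i.e. about +1 kcal/mol


def min_fold_e(g51_e_interface_kcal):
    """E stays unfolded in G5_1-E, so its folding cost exceeds that interface by the margin."""
    return abs(g51_e_interface_kcal) + DETECTION_MARGIN


def min_e_g52_interface(fold_e_min_kcal, g52_stabilization_kcal):
    """The E-G5_2 interface must pay for folding E and also stabilize G5_2."""
    return fold_e_min_kcal + g52_stabilization_kcal


fold_e = min_fold_e(-1.5)                      # at least 2.5
interface = min_e_g52_interface(fold_e, 3.5)   # at least 6.0
```

Both quantities are lower bounds, so any additional folded-state population below the NMR detection limit can only increase them.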