The self-association of proteins into symmetric complexes is ubiquitous in all kingdoms of life1,2,3,4,5,6. Symmetric complexes possess unique geometric and functional properties, but their internal symmetry can pose a risk. In sickle-cell disease, the symmetry of haemoglobin exacerbates the effect of a mutation, triggering assembly into harmful fibrils7. Here we examine the universality of this mechanism and its relation to protein structure geometry. We introduced point mutations solely designed to increase surface hydrophobicity among 12 distinct symmetric complexes from Escherichia coli. Notably, all responded by forming supramolecular assemblies in vitro, as well as in vivo upon heterologous expression in Saccharomyces cerevisiae. Remarkably, in four cases, micrometre-long fibrils formed in vivo in response to a single point mutation. Biophysical measurements and electron microscopy revealed that mutants self-assembled in their folded states and so were not amyloid-like. Structural examination of 73 mutants identified supramolecular assembly hot spots predictable by geometry. A subsequent structural analysis of 7,471 symmetric complexes showed that geometric hot spots were buffered chemically by hydrophilic residues, suggesting a mechanism preventing mis-assembly of these regions. Thus, point mutations can frequently trigger folded proteins to self-assemble into higher-order structures. This potential is counterbalanced by negative selection and can be exploited to design nanomaterials in living cells.
The ubiquity of symmetry in biological systems testifies to the unique properties it enables1,2,3,4,5,6. Symmetry allows compact genetic encoding of large protein assemblies such as viral capsids and cytoskeleton tubules and filaments. Symmetric oligomers also allow cooperative, switch-like transitions, as in the conformational changes promoting oxygen capture and release by haemoglobin. Additionally, the repetition of subunits within a protein oligomer introduces multivalence, a central element of polymer and supramolecular chemistry (Fig. 1a). The potential of symmetric oligomers to form supramolecular assemblies has in fact been demonstrated in vitro by synthetic design of protein fibres8, nanotubes9, cages10,11,12,13, and lattices14 with remarkable mechanical properties15. Whether evolution frequently samples such supramolecular assemblies is, however, unknown.
To trigger a new mode of self-assembly, a new interaction of the protein oligomer with itself must be created. In sickle-cell disease, a new interaction is created by a single glutamate-to-valine mutation7. This minor surface modification, repeated on opposite sides of the haemoglobin oligomer owing to its internal symmetry, results in the formation of supramolecular fibres. Is the power of a single point mutation to generate supramolecular assemblies unique to haemoglobin, or does it reflect a general property of protein surfaces? Notably, an analysis of protein surface characteristics has shown that solvent-exposed faces of proteins are, on average, only a few mutations away from resembling protein–protein interaction interfaces16. As a result, it can be predicted that random mutations at protein surfaces have the potential to create new interaction sites.
We investigated the capacity of surface mutations to trigger new interactions leading to supramolecular assembly. We used a strategy consisting of an increase in surface hydrophobicity with no regard for other factors such as geometrical or charge complementarity. In contrast to classic protein engineering experiments in which de novo properties are selected from libraries containing 10⁶–10⁹ mutants, we created only a few mutants (fewer than ten) per protein studied. The simplicity of our strategy makes our results uniquely amenable to evolutionary interpretation, as any phenotype we observe should often be sampled during evolution.
We initially examined a peptidase from E. coli that assembles into a homo-octamer. We introduced point mutations increasing the hydrophobicity at the head of the ring, reasoning that a new interaction between rings could trigger stacking into fibres (Fig. 1b). To monitor the self-assembly state in vivo, we fused a yellow fluorescent protein (YFP) to the subunit forming the octamer (Fig. 1c). To increase the octamer’s surface hydrophobicity, we mutated a cluster of solvent-exposed and charged residues (E239/E243/K247) to leucine (Fig. 1d). The wild-type octamer showed homogeneous fluorescence when expressed in yeast cells, but the triple leucine mutant self-assembled into micrometre-long fibres spanning entire cells, as did a triple tyrosine mutant (Fig. 1d). Mutating in turn each amino acid of the triplet to leucine did not show a noticeable effect. Remarkably, however, a point mutant to tyrosine (E239Y) sufficed to trigger the supramolecular assembly of the enzyme (Extended Data Fig. 1 and Supplementary Video 1).
The fact that we did not rely on any prediction tool for choosing these mutations suggests that the phenotype we observed is naturally frequent and that this enzyme evolves on the edge of supramolecular self-assembly. To assess the generality of this notion, we performed similar experiments for 11 additional homomers (Extended Data Table 1 and Supplementary Table 1), mutating charged surface residues either to leucine or tyrosine. These mutations triggered the formation of fibres in five additional structures, among which three were single point mutants (Fig. 2a and Supplementary Table 2). In the six remaining structures, we observed punctate foci rather than fibres. In all cases, self-assembly took place at concentrations within a physiological range (Extended Data Fig. 2).
Formation of protein fibres is typically associated with misfolding and amyloids17,18, so we examined the biophysical and structural properties of the mutant proteins. Notably, the peptidase supramolecular assembly was reversible by lowering the ionic strength (Fig. 2b), a property inconsistent with amyloid fibrils. Furthermore, we purified nine mutants exhibiting a super-assembly phenotype and compared their secondary structure contents with those of the wild-type proteins by circular dichroism (Supplementary Table 3). In all cases, the circular dichroism spectra indicated that the mutants were folded (Fig. 2c). Lastly, five mutants selected for such analysis showed high thermostability (melting temperature Tm > 70 °C) like their wild-type counterparts (Extended Data Fig. 3). Together, these biophysical properties indicate that the mutations were not detrimental to protein structure or stability, despite their dramatic impact on the assembly phenotype in vivo.
To make the proteins amenable to the biophysical characterization described above, we inhibited supramolecular assembly using a mild detergent (n-dodecyl β-d-maltoside) and the amino acid l-arginine. To gain structural insights under conditions where self-assembly occurs, we used transmission electron microscopy (TEM) (Fig. 3a and Extended Data Fig. 4). Of the six structures self-assembling into fibres in vivo, five also formed fibres in vitro. Also, two of the mutants forming punctate foci in vivo assembled into fibres in vitro (Fig. 3a and Extended Data Table 2). The relatively large size of individual homomers repeated along the fibres allowed us to estimate their size from TEM images (Extended Data Fig. 4). We observed an excellent match between the size of fibre-forming mutants and their wild-type counterparts (R² = 0.94, P = 3.6 × 10⁻⁴; Fig. 3b), indicating that self-assembly occurs through the folded state. We also verified that fusion of the YFP tag did not affect the formation of fibres (Extended Data Fig. 4). To capture the molecular details of fibre formation, we obtained a density map of a fibre by cryo-TEM single-particle reconstruction (Electron Microscopy Data Bank accession number EMD-4094; Extended Data Fig. 5). The map had a local resolution of 7–8 Å in the central octamer and at the interface, which included the α-helix where the E239Y mutation was introduced (Fig. 3c). Analysis of side chains at the interface suggested several interactions that may stabilize the assembly (Extended Data Fig. 6).
Altogether, we created 73 mutants, 30 of which triggered supramolecular assemblies in vivo, in the form of punctate foci or fibres (Supplementary Table 2). These mutants allowed us to examine properties of amino acids that trigger de novo intermolecular interactions upon mutation. This analysis identified a novel structural property: the normal distance to the closest bounding plane of the homomer (henceforth nDp, Fig. 4a and Supplementary Table 5). In contrast to accessible surface area, which describes local solvent exposure, nDp depends on a residue’s position on the global quaternary structure. The lower nDp, the closer the amino acid is to the apex of the quaternary structure along a symmetry axis, and the more it enables the interaction of the quaternary structure with a copy of itself. Accordingly, we found that positions where mutations triggered fibre formation exhibited lower median nDp values (2.1 Å; Fig. 4b) than positions with no phenotypic impact (4.7 Å, P < 0.004, t-test). Positions associated with the formation of foci also showed lower values for nDp, although the difference was not statistically significant.
Thus, by geometry, surface regions with low nDp are hot spots for supramolecular assembly. In civil engineering, structures are strengthened locally, according to the load endured. In a similar manner, the greater potential of geometric hot spots to self-interact might be counterbalanced chemically by the presence of amino acids with low interaction propensity. Such negative design can protect against folding into alternative structures19,20 and might protect against non-native interactions21. We measured the interaction propensity of surface regions by their ‘stickiness’, defined by the enrichment of amino acids at protein–protein interfaces relative to solvent-exposed surfaces22 (Extended Data Fig. 7). Analysis of 1,990 dihedral complexes of known structure revealed that geometric hot spots have significantly lower stickiness compared with the rest of the protein surface (Fig. 4c, Supplementary Table 6 and Supplementary Data 1). As a control experiment, we performed the same analyses among 5,481 homomers with cyclic symmetry, which are less likely than dihedral homomers to form fibres. Low values of nDp were not associated with low stickiness in cyclic homomers (Extended Data Fig. 7), suggesting that negative selection acts against infinite assembly rather than against finite dimerization. In another control, we introduced mutations on homomer faces through which the two-fold symmetry axes pass, and did not observe the formation of fibres (Supplementary Table 4).
This analysis of protein structures indicates that mutations inducing fibre formation can act through two mechanisms. First, the residues we mutated (K, D, and E) have the lowest interaction propensities16. Thus, their mutation knocks out their protective effect against protein self-association and fibre formation. Second, mutations to hydrophobic residues provide a source for gain in solvent entropy upon binding. Considering that leucine and tyrosine side chains can bury ~100 Å², each mutation can yield 600–1,000 Å² of surface to bury in homomers with six to ten subunits, which is typical of biological interfaces23,24,25. While we focused our analysis on leucine and tyrosine, we anticipate that mutations to other hydrophobic or aromatic residues will show a similar potential to promote new interfaces.
Besides the mutated region, additional ‘passenger’ contact regions are expected in the fibre assembly, and these must be energetically neutral or favourable. Here, minimally designed point mutations sufficed to trigger new interactions, even with proteins in nanomolar concentrations (Extended Data Figs 2 and 8). This finding implies that passenger contacts have a high probability of showing geometric and chemical complementarity, confirming predictions that symmetry enhances interaction propensity of protein surfaces1,3,4.
This work demonstrates that protein surfaces are prone to interact by chance and that amplification of such interactions by symmetry can drive supramolecular assembly. This under-appreciated property, in fact, is the basis for protein X-ray crystallography, which requires proteins to self-assemble into crystals. In that respect, it is notable that aggregation-inducing mutations are dogmatically interpreted in the context of misfolding. Our work indicates that a ready pathway to aggregation is the uncontrolled assembly of folded proteins (Fig. 4d). This pathway, which may be further amplified by co-localization26, will be important to consider in future studies predicting the molecular consequences of mutations, including single nucleotide polymorphisms.
Our observations imply that supramolecular assemblies of folded proteins are frequently sampled by evolution. Consistent with this view, recent works have revealed that specific proteins can self-assemble reversibly into foci27,28 or fibres29 in response to changes in cell physiology30 (Supplementary Text 1). The ease with which they can evolve, as demonstrated here, suggests that many more exist.
Lastly, from a synthetic biology perspective, we have described a simple strategy to program protein self-assembly at length scales of several micrometres in vivo. With their composition known and the structure of one fibre characterized, these assemblies pave the way to modelling, engineering and reprogramming protein self-assembly in vivo.
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Selection of proteins used in this work
We searched the 3DComplex database31 for E. coli homo-oligomers exhibiting dihedral symmetry. The structures chosen were not flagged as being erroneous in the PiQSi database32. Structures not annotated in PiQSi underwent manual inspection to discard those with a likely erroneous quaternary state. All dihedral homo-oligomers with higher-order symmetry were retained (D4, D5), and two homomers with D3 symmetry were chosen at random. This process resulted in 16 structures that were initially selected. Details of these structures, including their PDB accession numbers, publication references, number of subunits, symmetries, cellular abundances, and sequence identifiers, are given in Extended Data Table 1.
Cloning procedures and mutagenesis
Genes encoding the 16 homomers were amplified from the strain E. coli K12. They were cloned into yeast expression vector p413 GPD (ATCC 87354)33. Genes corresponding to the structures of 1POK, 2VYC, 1M3U, and 1D7A were also cloned into the vector p413 CYC1 (ATCC 87378)33, and the gene encoding the sequence of 1POK E239Y without YFP fusion was cloned into the vector p416 GPD (ATCC 87360)33. Molecular cloning was done using the PIPE method34. To visualize the homomers in vivo, we fused them to a YFP (Venus)35 separated by a flexible linker of sequence GGGGSGGGGS. Whether the YFP sequence was appended to the N or C terminus was determined by inspection of the structure with the aim of leaving the surface corresponding to the n-fold symmetry axis accessible. Upon expression of the 16 proteins in yeast cells, three proteins did not show homogeneous cytosolic localization and were discarded from subsequent analyses (Supplementary Table 1). At this stage, further inspection of all structures revealed that one was erroneously assigned a dihedral symmetry, and was also discarded (Supplementary Table 1). The other 12 homomers underwent site-directed mutagenesis (QuikChange, Agilent). All mutations introduced were either towards leucine or tyrosine. We exclusively targeted residues with high solvent exposure to minimally impact stability: that is, 25% or more relative accessible solvent area, with the majority showing >50% relative accessible solvent area.
Yeast strains and media
Yeast cells were grown in synthetic complete medium lacking histidine (SC-His), or lacking both histidine and uracil (SC-His-Ura) for co-expression of the YFP-tagged and untagged versions of 1POK E239Y. Plasmids encoding the proteins studied were transformed into S. cerevisiae strain BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) using the LiAc method, as described previously36. Transformants were inoculated from the Petri plates into a 384-well plate containing SC-His (or SC-His-Ura for the strain co-expressing the YFP-tagged and untagged variants of 1POK E239Y) with 15% glycerol and were stored at −80 °C.
Plate-to-plate transfers of cells were performed using a pintool (FP1 pins, V&P Scientific) operated by a Tecan robot (Evo200 with MCA384 head). Cells were inoculated from glycerol stocks into a polypropylene 384-well plate containing SC-His (or SC-His-Ura for the strain co-expressing the YFP-tagged and untagged variants of 1POK E239Y), and grown until they reached saturation. Cells from the saturated cultures were inoculated into 384-well glass-bottom optical plates (Matrical) filled with media. Cells were grown for at least 6 h until they reached an absorbance of 0.5–1 before being imaged by microscopy. Imaging was done by an automated Olympus microscope X83 with a 60× oil objective (Olympus, plan apo, 1.42 numerical aperture) coupled to a spinning disk confocal scanner (Yokogawa W1). A 488 nm laser (Toptica, 100 mW) was used for fluorescence excitation, and a green LED was used for brightfield imaging. The emission filter-sets used for brightfield and fluorescence images were identical (520/28, Chroma). Images were recorded on a Hamamatsu Flash4 camera. Focus was maintained throughout the imaging experiment by hardware autofocus (Olympus Z-Drift Compensation System).
Time-lapse series were performed in a 96-well glass-bottom optical imaging plate (Matrical), which was coated with concanavalin A as described previously37. After coating, a cell culture (absorbance ≈ 0.5–1) was dispensed into the well. Cells were incubated for 15 min, and the liquid was then removed from the plate. Cells adhering to the plate were washed three times with SC-His and, to maximize growth as a monolayer of cells, 200 μl of SC-His supplemented with 7.5 mg ml−1 low melting temperature agarose was placed on the sample in the imaging well. Images were taken every 230 s for 10.5 h.
Estimation of intracellular protein concentrations
Intracellular protein concentrations were estimated by calibration against reference solutions containing known concentrations of YFP. Purified YFP was serially diluted in PBS buffer to concentrations ranging from 110 μM to 1 nM. The YFP solutions were transferred to the plate used to image cells. Images of the YFP solutions were analysed using ImageJ38 by recording their mean fluorescence intensity, which showed that the microscope signal increased linearly with the concentration of fluorescent protein. The equation inferred from linear regression enabled us to convert fluorescence arbitrary units into YFP molarity: YFP concentration (nM) = 0.998 × fluorescence (arbitrary units) − 0.16. We then used the equation so obtained to transform the intracellular fluorescence signal into homo-oligomer concentration. To measure intracellular fluorescence, we used ImageJ38 to select at least 50 cells per strain on the basis of brightfield pictures. We drew a region of interest for each cell, and subsequently extracted its median fluorescence in the fluorescence channel. For each strain, the median fluorescence across all cells was converted to YFP molarity using the equation obtained above.
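The conversion described above can be sketched in a few lines of Python. This is an illustration only: it reads the reported calibration as [YFP] (nM) = 0.998 × fluorescence (a.u.) − 0.16, the function names are hypothetical, and the division by the number of subunits (one YFP per subunit) is an assumption about how YFP molarity maps to homo-oligomer concentration.

```python
from statistics import median

# Calibration as read from the text: [YFP] (nM) = 0.998 * fluorescence (a.u.) - 0.16
SLOPE_NM_PER_AU = 0.998
INTERCEPT_NM = -0.16


def fluorescence_to_yfp_nM(fluorescence_au):
    """Convert a fluorescence value (arbitrary units) into YFP molarity (nM)."""
    return SLOPE_NM_PER_AU * fluorescence_au + INTERCEPT_NM


def strain_yfp_nM(per_cell_median_fluorescence):
    """Take the median across cells of a strain, then convert it to nM of YFP."""
    return fluorescence_to_yfp_nM(median(per_cell_median_fluorescence))


def strain_oligomer_nM(per_cell_median_fluorescence, n_subunits):
    """Assumption: one YFP per subunit, so oligomer molarity = YFP molarity / n."""
    return strain_yfp_nM(per_cell_median_fluorescence) / n_subunits


if __name__ == "__main__":
    cells = [80.0, 100.0, 120.0]  # hypothetical per-cell median fluorescence values
    print(strain_yfp_nM(cells))          # YFP molarity in nM
    print(strain_oligomer_nM(cells, 8))  # octamer molarity in nM
```

In practice the list would hold one median value per region of interest, with at least 50 cells per strain as stated above.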
Protein expression and purification
Genes encoding the wild-type proteins and mutants were cloned into the vectors pET30b+ and pET29b+, containing an N- or C-terminal 6× His-tag, respectively, and a kanamycin resistance cassette. Vectors were transformed into BL21(DE3) cells. Transformants were grown in lysogeny broth at 37 °C until they reached an absorbance of 0.6. Then, protein expression was induced by addition of isopropyl β-d-1-thiogalactopyranoside to a final concentration of 0.4 mM, and cells were grown overnight at 25 °C. Cells were harvested by centrifugation and resuspended in lysis buffer: sodium phosphate 20 mM, 500 mM NaCl, and 20 mM imidazole, pH 7.4. Lysozyme and benzonase were added to the lysis buffer at a concentration of 0.2 mg ml−1 and 50 units ml−1 respectively. Cells were lysed by sonication and centrifuged. From this point, three procedures were used to purify the different proteins, as follows.
1. Wild-type proteins, isolated YFP, and mutants D497L and K491Y of 2VYC were purified from the soluble fraction of the cell lysates. Crude lysates were loaded on His Gravitrap columns (GE Healthcare) and eluted in sodium phosphate 20 mM, 500 mM NaCl, and 500 mM imidazole, pH 7.4.
2. Other mutants were present in the insoluble fraction of the cell lysates. Therefore, pellets were washed twice with lysis buffer containing 1% Triton X-100 and one more time with lysis buffer. Solubilization of inclusion bodies was performed by resuspending the pellets in Tris 20 mM, 2 M l-arginine, pH 7.5.
3. Purification of the 1FRW D170L/D173L/K175L/K176L variant was done in denaturing conditions. After harvesting, cells were resuspended in 7 M urea, sodium phosphate 20 mM, 500 mM NaCl, and 20 mM imidazole, pH 7.4. They were lysed by sonication and loaded onto a His Gravitrap column, and elution was done in 7 M urea, sodium phosphate 20 mM, 500 mM NaCl, and 500 mM imidazole, pH 7.4. Refolding was achieved by removing urea by dialysis.
The purity of all the samples was assessed by SDS–polyacrylamide gel electrophoresis and protein concentration was determined using a NanoDrop ND-1000 by absorbance at 280 nm. Purified proteins were dialysed against suitable buffers for subsequent characterization, as described in Supplementary Table 3.
Precipitation assay of 1POK E239Y
Three solutions of the wild-type and mutant (E239Y) isoaspartyl dipeptidase (PDB 1POK) were prepared at a concentration of 1 mg ml−1 (measurement 1) in Tris 20 mM, pH 7.5. NaCl was subsequently added to a final concentration of 0.1 M to every sample. Samples were pelleted by centrifugation at 20,000g for 20 min, and the protein concentration of the soluble fractions was re-measured using a NanoDrop spectrometer (measurement 2). Pellets were re-dissolved in Tris 20 mM, pH 7.5 without NaCl this time. Re-solubilized pellets were centrifuged for 20 min at 20,000g and the protein concentration of the soluble fraction was measured again, corrected for the amount of soluble protein discarded in measurement 2, and compared with the original concentration.
Precipitation assay of 1POK E239Y tagged with YFP
Five serial dilution series of 1POK E239Y–YFP in Tris 20 mM, pH 7.5 were prepared. The concentration of octamers ranged between 6.25 μM and 9 nM. The samples were transferred to a 384-well glass-bottom plate and imaged using the confocal microscope as described in the section on Microscopy. After imaging, 10× PBS was added to four dilution series to reach a final concentration of 1× PBS (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4). The samples were incubated for 2 h at 30 °C. Samples were then centrifuged for 15 min at 2,500g, and 10 μl of the supernatant were imaged. Images were analysed with ImageJ, using the median fluorescence intensity of each image to estimate the concentration of soluble protein. The ratio of fluorescence intensity in the supernatant relative to the intensity in the sample where no PBS was added gave the fraction of soluble protein. Resulting values were multiplied by 1.11 to account for the PBS dilution.
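The soluble-fraction calculation can be illustrated as follows. This is a minimal sketch: the function name is hypothetical, and the comment that the factor 1.11 corresponds to mixing one volume of 10× PBS with nine volumes of sample (10/9 ≈ 1.11) is an inference from the stated final concentration of 1× PBS, not a detail given in the text.

```python
# Correction factor reported in the text for the dilution caused by adding PBS.
# Inference: 1 volume of 10x PBS + 9 volumes of sample gives 10/9 ~ 1.11.
PBS_DILUTION_CORRECTION = 1.11


def soluble_fraction(supernatant_intensity, no_pbs_intensity,
                     correction=PBS_DILUTION_CORRECTION):
    """Fraction of protein remaining soluble after incubation in PBS.

    supernatant_intensity: median fluorescence of the supernatant image.
    no_pbs_intensity: median fluorescence of the matching sample without PBS.
    """
    if no_pbs_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return correction * supernatant_intensity / no_pbs_intensity


if __name__ == "__main__":
    # Hypothetical intensities for one concentration in the dilution series.
    print(soluble_fraction(50.0, 100.0))
```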
Circular dichroism experiments
Circular dichroism experiments were performed on a Chirascan spectrometer (Applied Photophysics). A mild detergent, n-dodecyl β-d-maltoside, was used to increase the solubility of some of the proteins assessed. Secondary structure content and thermal stability of wild types/mutants were always compared using the same buffer for each pair (Supplementary Table 3). Protein concentrations used in all the experiments ranged between 0.1 and 0.2 mg ml−1. Thermal denaturation experiments were monitored by following ellipticity at 220 nm from 25 to 90 °C at a heating rate of 1 °C min−1. Since many of the proteins did not exhibit a thermal denaturation transition, we used guanidine hydrochloride to a concentration of 2.5 M at 90 °C to achieve complete denaturation. This last measurement was used to infer ellipticity of the unfolded state and served as a reference for normalization.
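One plausible form of the normalization is sketched below. The text states only that the ellipticity measured in 2.5 M guanidine hydrochloride at 90 °C served as the unfolded-state reference; the linear two-point rescaling, the use of the 25 °C reading as the folded reference, and the function name are all assumptions made for illustration.

```python
def fraction_unfolded(ellipticities, folded_reference, unfolded_reference):
    """Rescale ellipticity readings so that folded = 0 and unfolded = 1.

    ellipticities: readings at 220 nm along the temperature ramp.
    folded_reference: assumed here to be the reading at 25 degrees C.
    unfolded_reference: reading in 2.5 M guanidine hydrochloride at 90 degrees C.
    """
    span = unfolded_reference - folded_reference
    if span == 0:
        raise ValueError("folded and unfolded references must differ")
    return [(theta - folded_reference) / span for theta in ellipticities]


if __name__ == "__main__":
    # Hypothetical ellipticity values (arbitrary units) along a thermal ramp.
    ramp = [-20.0, -19.5, -19.0, -18.0]
    print(fraction_unfolded(ramp, folded_reference=-20.0, unfolded_reference=-5.0))
```

With this rescaling, a protein that shows no thermal transition stays close to 0 across the ramp, which matches the behaviour described for many of the proteins.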
TEM data acquisition
All proteins studied by electron microscopy were prepared in 20 mM Tris, pH 7.5, except for 1POK E239Y and 1POK E239Y–YFP for which we added 100 mM NaCl, and for 2VYC (wild type, D497L, and K491Y mutants) which were prepared in 100 mM sodium phosphate, pH 7. This last buffer was described to maintain the decameric form of 2VYC39. Ten microlitres of the protein samples at about 0.1 mg ml−1 concentration were applied to glow-discharged, carbon-coated copper TEM grids (300 mesh, EMS) for 10–20 s. Excess liquid was then blotted, and after a wash with distilled water, the grids were stained with a 2% uranyl acetate solution. Samples were visualized in an FEI Tecnai Spirit or T12 transmission electron microscope, equipped with an FEI Eagle camera or a Gatan ES500W Erlangshen camera respectively.
For single-particle cryo-electron microscopy, 3.5 μl of 1POK E239Y solution (Tris 20 mM, 100 mM NaCl, pH 7.5, 0.2 mg ml−1) was applied to glow-discharged Quantifoil holey carbon grids (R2/1, 300 mesh) coated with a thin layer of carbon. Grids were plunge-frozen in liquid ethane cooled by liquid nitrogen, using a Leica EM-GP plunger (3.5 s blotting time, 95% humidity). Grids were imaged at liquid nitrogen temperature on a FEI Tecnai TF20 electron microscope operated at 200 kV with a Gatan side entry 626 cryo-holder and a condenser aperture of 30 μm. Images were recorded on a K2 Summit direct detector (Gatan) mounted at the end of a GIF Quantum energy filter (Gatan). Images were collected in super-resolution counting mode, at a calibrated magnification of 23,657, yielding a physical pixel size of 2.11 Å. The dose rate was set to ~2 electrons per square ångström per second and a total exposure time of 8 s, fractionated into 40 subframes. Defocus range was 1.2–4 μm. All dose-fractionated images were recorded using an automated low-dose procedure implemented in SerialEM40.
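The acquisition parameters above imply a few derived quantities, worked through in the sketch below. The 5 μm physical pixel of the detector is an assumption not stated in the text (it is the value that reproduces the reported 2.11 Å pixel size), and the function names are hypothetical.

```python
def total_dose(dose_rate_e_per_A2_s, exposure_s):
    """Accumulated dose in electrons per square angstrom."""
    return dose_rate_e_per_A2_s * exposure_s


def per_frame(value, n_subframes):
    """Split an exposure-level quantity evenly across subframes."""
    return value / n_subframes


def pixel_size_angstrom(magnification, detector_pixel_um=5.0):
    """Pixel size at the specimen; detector_pixel_um = 5.0 is an assumption."""
    return detector_pixel_um * 1e4 / magnification  # 1 um = 1e4 angstrom


if __name__ == "__main__":
    dose = total_dose(2.0, 8.0)                # about 16 e/A^2 in total
    print(dose, per_frame(dose, 40))           # about 0.4 e/A^2 per subframe
    print(per_frame(8.0, 40))                  # 0.2 s per subframe
    print(round(pixel_size_angstrom(23657), 2))
```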
Single-particle cryo-electron microscopy image processing
Recorded image frames (2–40) were binned by a factor of 2 and subjected to whole-image beam-induced motion correction using MOTIONCORR41. Contrast transfer function parameters were estimated using CTFFIND3 (ref. 42). A total of 574 images were selected for processing, from which 38,786 particles (1POK E239Y octamers) were manually picked using EMAN2 e2boxer43 and extracted into 128 pixel × 128 pixel boxes. Octamers were picked only from within filaments (Extended Data Fig. 5c) so that all extracted boxes contained one octamer in the centre and two adjacent octamers from the filament. Subsequent classifications and refinements were performed using RELION44. Particles underwent two-dimensional classification with a round mask of 160 Å diameter (Extended Data Fig. 5d). The vast majority of the particles were included in good classes and retained for subsequent processing. Particles were then classified in three dimensions into four classes with sphere masks of 160 Å diameter and D4 symmetry imposed, using the octameric structure solved by X-ray crystallography and filtered to 40 Å as an initial model45. Then, particles from the two best three-dimensional classes were combined (17,277 in total) and subjected to three-dimensional refinement with a sphere mask of 136 Å diameter and D4 symmetry imposed (Extended Data Fig. 5e). Local resolution estimation using ResMap46 indicated resolution of 7–8 Å in the central octamer and resolution worse than 10 Å in the adjacent octamers (Extended Data Fig. 5f), probably because of flexibility within the filaments. Resolution at the interface-forming helices, which contained the E239Y mutation, was about 8 Å, indicating that this segment is relatively rigid. The global resolution in the final structure, estimated by the gold-standard Fourier shell correlation = 0.143 criterion (Extended Data Fig. 5g), was 10.5 Å.
Cyclic tetramers from the crystallographic structure45 were fitted in the density map using UCSF Chimera47. According to this fit, octamers stacked into a filament along the four-fold symmetry axis with 80.3 Å translation between adjacent octamers. This structure could be interpreted as a left-handed helix with 33.3° rotation between adjacent octamers, or a right-handed helix with 56.7° rotation; the two descriptions are equivalent because rotations that differ by 90° are indistinguishable under four-fold symmetry. The α-helix where the mutation was introduced showed a good fit (Fig. 3c) and was at the main contact point between octamers.
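The handedness ambiguity follows from the four-fold symmetry: rotating an octamer by 90° about the filament axis leaves it unchanged, so rotations that differ by a multiple of 90° describe the same filament. The sketch below illustrates this. The function names and the sign convention (negative values for left-handed rotation) are assumptions made for illustration and are not part of the reported analysis.

```python
def equivalent_right_handed_rotation(rotation_deg, n_fold):
    """Smallest non-negative rotation equivalent to rotation_deg under n-fold symmetry.

    Convention assumed here: positive = right-handed, negative = left-handed.
    """
    period = 360.0 / n_fold  # rotations differing by this angle are equivalent
    return rotation_deg % period


def octamer_positions(n_units, rise_A=80.3, rotation_deg=-33.3, n_fold=4):
    """Axial position (angstrom) and reduced rotation (degrees) of successive units."""
    return [
        (k * rise_A, equivalent_right_handed_rotation(k * rotation_deg, n_fold))
        for k in range(n_units)
    ]


if __name__ == "__main__":
    # A left-handed rotation of 33.3 degrees, reduced under four-fold symmetry.
    print(equivalent_right_handed_rotation(-33.3, 4))
    for z, angle in octamer_positions(3):
        print(round(z, 1), round(angle, 1))
```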
Images showing protein filaments visualized by electron microscopy after negative staining were opened in ImageJ, lines were drawn from one unit to the next along the fibre, and each line was added to the ‘region of interest manager’48. The distribution of line lengths was then recorded. We used the average of the distribution to infer the distance separating adjacent units, and the standard deviation to estimate measurement errors, which are shown in the plot as vertical bars (Fig. 3b). We compared these distances with the height (h) of the minimum bounding-box containing the crystallographic structure. Because the bounding-box was calculated on the basis of α-carbons, we added a constant value of 5 Å to h to correct for the presence of side chains. We also considered a constant uncertainty of ±5 Å, which is reflected in the figure as horizontal bars.
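The comparison between measured spacings and crystallographic heights can be sketched as below. The function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev

SIDE_CHAIN_CORRECTION_A = 5.0  # constant added to the alpha-carbon bounding-box height
HEIGHT_UNCERTAINTY_A = 5.0     # constant uncertainty, shown as horizontal bars


def fibre_spacing(line_lengths_A):
    """Mean unit-to-unit distance and its spread from lines drawn along a fibre."""
    return mean(line_lengths_A), stdev(line_lengths_A)


def expected_spacing(bounding_box_height_A):
    """Height of the crystallographic structure corrected for side chains."""
    return bounding_box_height_A + SIDE_CHAIN_CORRECTION_A, HEIGHT_UNCERTAINTY_A


if __name__ == "__main__":
    measured = [78.0, 80.0, 82.0]  # hypothetical line lengths in angstrom
    print(fibre_spacing(measured))
    print(expected_spacing(75.0))  # hypothetical bounding-box height
```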
Computational structural analyses
Proteins of known structure exhibiting either cyclic or dihedral symmetry were retrieved from the 3DComplex database, along with information on symmetry axes31. We used a non-redundant set filtered at 80% sequence identity. To avoid biases due to membrane and viral proteins when analysing surface stickiness, we discarded all structures containing one of the following chains of characters in their title or description: ‘*lipid*’, ‘*transport*’, ‘*rhodopsin*’, ‘*membran*’, ‘*virus*’, ‘*viral*’. A symmetry axis was considered as a unit (1 Å) vector s originating from the centre of mass of the structure. Similarly, the α-carbon of each residue i defined a vector r_i originating from the centre of mass. Two bounding planes orthogonal to a symmetry axis were defined, intersecting the axis at the maximal (d_max) and minimal (d_min) values of the dot product s · r_i, considering all residues i. The measure nDp for a given residue i was calculated as the minimal distance to either bounding plane: nDp_i = min(d_max − s · r_i, s · r_i − d_min). Cyclic symmetries have a single symmetry axis, but dihedral symmetries have multiple axes. For example, D2 homomers have three two-fold axes. For those, we could not distinguish a unique ‘fibre-forming axis’, so nDp was computed relative to all three axes, and each residue was assigned the lowest of the three values. For higher-order dihedral homomers (D3, D4, D5), nDp was computed in two different ways, which gave similar results, as follows.
A first approach was used in Fig. 4 and Extended Data Fig. 7, where D3, D4, and D5 homomers were processed similarly to D2 homomers. That is, nDp was computed relative to all symmetry axes, and each residue was assigned the lowest nDp value. Subsequently, we used only those residues for which nDp was lowest relative to the axis along which fibres could grow (three-, four-, and five-fold for D3, D4, or D5 respectively). Extended Data Fig. 7b illustrates nDp values projected onto the structure of 1POK, as well as residues associated with the four-fold symmetry axis used in the analyses. Our rationale for this approach was to exclude potential negative design effects associated with planar assembly along two-fold symmetry axes.
A second approach was used where D3, D4, and D5 homomers were processed as for cyclic homomers. That is, nDp was calculated exclusively according to the three-, four-, and five-fold symmetry axes, without considering the two-fold symmetry axes. Results of this calculation are shown in Extended Data Fig. 7d (green dashed line).
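The nDp calculation and the two ways of handling multiple axes can be sketched as follows. This is an illustration under stated assumptions, not the R code used for the analyses: the function names are hypothetical, and the centre of mass is approximated by the unweighted centroid of the α-carbons. Because nDp depends only on differences between projections, the choice of origin does not change the result.

```python
def _unit(v):
    """Normalize a 3-vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return tuple(x / norm for x in v)


def ndp_along_axis(ca_coords, axis):
    """nDp of every residue relative to one symmetry axis.

    ca_coords: list of (x, y, z) alpha-carbon coordinates in angstrom.
    axis: direction of the symmetry axis (any length; normalized here).
    """
    n = len(ca_coords)
    centre = tuple(sum(c[k] for c in ca_coords) / n for k in range(3))
    s = _unit(axis)
    # Projection of each residue vector r_i onto the axis: s . r_i
    proj = [sum(s[k] * (c[k] - centre[k]) for k in range(3)) for c in ca_coords]
    d_max, d_min = max(proj), min(proj)
    return [min(d_max - p, p - d_min) for p in proj]


def ndp_lowest_over_axes(ca_coords, axes):
    """First approach: lowest nDp over all axes, plus the index of that axis."""
    per_axis = [ndp_along_axis(ca_coords, a) for a in axes]
    result = []
    for values in zip(*per_axis):
        best = min(range(len(values)), key=lambda j: values[j])
        result.append((values[best], best))
    return result


def residues_for_fibre_axis(ca_coords, axes, fibre_axis_index):
    """Indices of residues whose lowest nDp is relative to the fibre-forming axis."""
    lowest = ndp_lowest_over_axes(ca_coords, axes)
    return [i for i, (_, j) in enumerate(lowest) if j == fibre_axis_index]


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0), (0.0, 0.0, 4.0)]  # toy example
    print(ndp_along_axis(coords, (0.0, 0.0, 1.0)))
```

The second approach corresponds to calling `ndp_along_axis` with only the three-, four-, or five-fold axis.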
Data and code availability
Structural data used in and resulting from the bioinformatics analyses are available in Supplementary Table 6 and Supplementary Data 1; Supplementary Data 1 is available at Figshare at http://dx.doi.org/10.6084/m9.figshare.5119861. The electron microscopy density map and associated atomic coordinates of the fitted structure have been deposited in the Electron Microscopy Data Bank (EMDB) and Protein Data Bank (PDB) under accession numbers EMD-4094 and 5LP3, respectively. The R source-code used for analysis and other data are available from the corresponding author upon reasonable request.
We thank S. Wolf and E. Shimoni for help with electron microscopy experiments, and J. Georgeson for setting up the microscope time-lapse. We thank members of the laboratory, D. Fass and A. Horovitz for discussions throughout the realization of this work, H. Weissman for discussions about electron microscopy, and D. Fass for invaluable feedback on the manuscript. This work was supported by the Israel Science Foundation and the I-CORE Program of the Planning and Budgeting Committee (grants 1775/12 and 2179/14), by the Marie Curie Career Integration Grants Program (number 711715), by the Human Frontier Science Program Career Development Award (number CDA00077/2015), and by a research grant from A.-M. Boucher. H.G.S. received support from the Koshland Foundation and a McDonald-Leapman Grant. Electron microscopy studies were supported by the Irving and Cherna Moskowitz Center for Nano and Bio-Nano Imaging. E.D.L. is incumbent of the Recanati Career Development Chair of Cancer Research.
Extended data figures
Cells divide while expressing the fibre-forming mutant of E. coli dipeptidase. Budding yeast cells grow as they express the fibre-forming mutant of E. coli dipeptidase (1POK E239Y) fused to a yellow fluorescent protein. Images were taken every 230 seconds for 10.5 hours. We overlaid the brightfield channel showing the cells (grey) onto the fluorescent channel showing the fibres (green).