Introduction

Single-stranded RNA (ssRNA) viruses account for nearly half of all plant viruses. Most of them have only one type of structural protein, a capsid (or coat) protein (CP), and form either rod-shaped or flexuous filamentous virions1. The latter are much more common2, with viruses of the genus Potyvirus (family Potyviridae) representing the largest group3. Potyviruses have a major economic impact and are responsible for more than half of the world’s viral crop damage4. Their genomic positive-sense ssRNA of about 10 kb generally encodes ten proteins3, including the CP, whose copies form a flexuous filamentous capsid with left-handed helical symmetry around the viral ssRNA, as shown by the cryo-EM structures of watermelon mosaic virus (WMV), potato virus Y (PVY), and turnip mosaic virus (TuMV)5,6,7. CP consists of a highly conserved globular core flanked by two large extended regions with a high frequency of structural disorder (Fig. 1a)5,6,7,8. The C-terminal high intrinsic disorder region (C-IDR) is partially conserved and packaged in the lumen of the virion, supported by the helical ssRNA scaffold. The N-terminal IDR (N-IDR) exhibits very low amino acid conservation. It is exposed on the outer surface of the virion and is critical for the flexible nature of virions, connecting the CP units both longitudinally and perpendicular to the filament axis5,6,7. This is in contrast to viruses from the genera of rigid rod-shaped tobamoviruses or hordeiviruses, where the relatively short C- and N-terminal structural elements are exposed on the outer surface of CP, and the connection between the CP units is established by the wedge-shaped CP cores9,10.

Fig. 1: Structural polymorphism of recombinant PVY CP.
a Top: schematic representation of CP, the marked residues delineate N-IDR, Core and C-IDR. Residues G1-T41 are not resolved in the cryo-EM maps. Bottom: flDPnn96 prediction of structural disorder in CP (threshold = 0.3). b Cryo-EM micrograph of wild type VLPs. Architecturally distinct filament types are marked: red: VLPh+RNA, blue: VLPh-RNA, green: VLPr. c Cryo-EM 2D class averages and 3D reconstructions of the three VLP types. A CP subunit within each VLP is colored, color code as in b. Orange: RNA. The percentage of each particle type is indicated above the 2D classes. The overall resolution (in Å) and diameter (in nm) of the filaments are indicated. d Superposition of CP subunits of the three VLP forms. IDRs, the S125-G130 RNA-binding loop, and RNA (sticks) are colored as in b. For RMSD values, see Supplementary Fig. 2c. Top: schematic representation of CPΔC40 (e) and CPΔC79 (f). Middle: cryo-EM 3D reconstructions of the corresponding VLPs with subunits highlighted in red (VLPΔC40:h+RNA) or green (VLPΔC40:r) (e) and pink (VLPΔC79:h) (f). Bottom: cross-section of filaments with corresponding filament widths and inner channel diameters (MOLE 2.597). g Superposition of atomic models of different CPs showing the N-IDRs of CPΔC60:h (black), CPΔC79:h (pink), CPh (blue) and CPh+RNA (red). As the focus is on different N-IDR conformations, the CPh+RNA C-IDR is not fully shown; its direction is indicated by a dotted line. The black arrow marks the RNA-binding loop. For RMSD values, see Supplementary Fig. 4c. h Packing of CP subunits connected by N-IDR in VLPh (blue), VLPΔC79:h (pink), and VLPh+RNA (red). CP-CP distances between N113-Cα atoms of the subunits in adjacent helical turns are shown and the number of CP units per helix turn. i Thermal stability. Melting temperatures (Tm) are shown as mean ± SD (N = 6) for VLPΔC60 (dark gray), VLPΔC79 (pink), VLP (gray/light gray), and PVY virus (white) at different NaCl concentrations at pH 7.0. 
Statistical significance was determined by one-way ANOVA with Tukey’s multiple comparison test (p = 0.001). Markers (a–c) indicate a statistically significant difference. The source data for panels a and i are provided in the Supplementary Data file.

Potyviral CP plays a role in virtually every step of the viral infection cycle, from transmission of the virion by aphids, virus assembly and disassembly, regulation of genome amplification, and protein translation, to cell-to-cell and long-distance movement11. The structural context in which CP acts during the different phases of the viral cycle is not yet known; however, the intrinsic structural plasticity of CP5,6,7,8, contributed mainly by both IDRs, seems to play a crucial role6,8.

The presence of different structural states or pleomorphism of viral capsids is quite common in enveloped12,13, icosahedral14 or even some helical viruses10,15,16 and has been associated with certain stages of their life cycle17,18. In addition, structural polymorphism is a well-known feature of recombinantly produced virus-like particles (VLPs) derived from icosahedral or rod-shaped viruses14,15,16. CP and its mode of self-assembly can be modulated using structural synthetic virology approaches, resulting in symmetric nanoparticles of different shapes and sizes with specific material properties that have great potential for medical, biotechnological, or smart material applications14,19,20,21.

Although the structures of several flexuous filamentous potexviruses22,23,24,25,26 and potyviruses5,6,7 have been recently determined, information on the structural diversity of these viruses and their VLPs is lacking6,7,23,26,27. In our work, we have investigated the structural landscape of self-assemblies formed by potyviral CP. While the structural analysis of natural supramolecular complexes formed by CPs during the viral life cycle is challenging due to the very complex and dynamic natural context, the successful production of recombinant potyviral VLPs has been reported for different expression systems, with plants or bacteria preferred28. Interestingly, the cryo-EM structure of PVY VLPs prepared from bacteria, determined at 4.1 Å resolution, showed a markedly different architecture of VLP filaments than the structure of PVY virions, as they consisted of stacked octameric CP rings and did not contain RNA6. On the other hand, the structure of TuMV VLPs produced by transient expression in tobacco at 8.0 Å resolution indicated an RNA-free filamentous arrangement of CP units in left-handed helical symmetry7. By contrast, the structure of VLPs determined at 2.6 Å resolution based on CP of another potyvirus, sweet potato feathery mottle virus (SPFMV), and produced in tobacco by transient expression in the presence of a replicating RNA, showed a virus-like architecture, with ssRNA directing the helical arrangement of CPs along the filament27. These studies showed that a specific potyviral CP self-assembles into filaments of a single architectural type under selected experimental conditions, but this architecture differed among the three experimental arrangements, suggesting that polymorphism may also exist within a species of CP under certain conditions. In this study, we found that the wild type PVY CP can indeed form three architecturally distinct types of VLPs simultaneously.
Furthermore, through structure-based engineering of PVY CP, we discovered that we can control the formation of a wide range of highly ordered supramolecular assemblies, their architecture, RNA encapsidation, and molecular properties. These can range from various filamentous to ring-shaped, cubic or spherical assemblies with high symmetry, most of which form without a template. To avoid spontaneous CP self-assembly in a complex bacterial environment, we have developed a system for CP self-assembly in vitro that allows the controlled formation of nanoparticles with desired properties. This remarkable structural diversity of PVY CP nanoparticles makes them great candidates for nanobiotechnological applications. Moreover, the high-resolution details about the structural plasticity of PVY CP could pave the way for a better understanding of CP polymorphism in a biological context.

Results

Recombinant PVY CP self-assembles into three architecturally distinct types of VLPs

To investigate the potential of PVY CP to form polymorphic assemblies, we produced PVY VLPs in bacteria. A comprehensive analysis of the cryo-EM data revealed two new filament architectures (Fig. 1b, c; Table 1; Supplementary Figs. 1a, 2) in addition to the predominant RNA-free stacked ring assembly (VLPr) that we had previously observed6. Of the picked particles, 25% exhibited left-handed helical symmetry with no RNA packed inside (VLPh), similar to TuMV VLPs7. The remaining 8% of picked particles also exhibited left-handed helical symmetry and encapsidated RNA (VLPh+RNA), closely resembling the PVY virion6 (Fig. 1c; Supplementary Table 1) and SPFMV VLPs27. An improved data analysis procedure seemed to play a crucial role here, as such a distribution of polymorphic filaments was also obtained by reprocessing our previous data6 (Supplementary Fig. 1b).

Table 1 Cryo-EM data collection, refinement and validation statistics.

The C-IDR is structurally defined only in VLPh+RNA filaments, i.e. in CPh+RNA, where the helical scaffold of RNA supports the cone-like organization of C-IDRs in the lumen of the filament (Fig. 1c, d). In the absence of RNA in VLPr and VLPh, the C-IDR in CPr and CPh is disordered, with no traceable cryo-EM density beyond A222 (Supplementary Fig. 2a). The fold of the CP core domain is conserved between the polymorphic filaments, except for the conserved RNA-binding loop S125-G1306, which adopts different conformations in the presence or absence of RNA (Fig. 1d). In all three structures, there was no cryo-EM density for the first 41 residues of N-IDR, exposed on the outer surface of the filaments (Supplementary Fig. 2a). Beyond H42, N-IDR in CPh+RNA folds similarly to that in PVY virus (Supplementary Fig. 2b), whereas N-IDR in CPr and CPh takes a different turn at K53 (Fig. 1d). Thus, the structural plasticity of PVY CP, in particular the two IDRs and the conserved RNA-binding loop S125-G130, enables the polymorphism of PVY VLPs produced in bacteria. In the absence of ssRNA, the N-IDRs adopt two slightly different conformations, as seen in CPr and CPh (Supplementary Fig. 2c) allowing the formation of two different types of RNA-free filaments (Supplementary Fig. 2d), with the stacked octameric ring assembly being the more stable and therefore predominant form.

C-terminal truncation of CP reduces the architectural diversity of VLPs

Potyviral CP with deleted C-IDR still forms filaments6,29. Because C-IDR is structurally defined in wild type VLPs only in the presence of ssRNA (VLPh+RNA), we examined how the absence of C-IDR affects filament architecture.

We prepared VLPs consisting of CP units lacking 40 C-terminal residues, i.e. the whole C-IDR (CPΔC40), and analyzed them by cryo-EM (Fig. 1e). Interestingly, we detected only two different architectures of filaments: 65% of them had the stacked-ring architecture (VLPΔC40:r) and 35% had left-handed helical symmetry and encapsidated RNA (VLPΔC40:h+RNA) (Fig. 1e; Supplementary Fig. 3). The structures of the CPΔC40:r and CPΔC40:h+RNA subunits and the helical parameters of their VLPs were comparable to those of their wild type counterparts (Supplementary Fig. 4a; Table 1). This indicates that the luminal C-IDR is not essential either for filament formation or for RNA encapsidation. The absence of C-IDR facilitates the accessibility of the RNA-binding site, resulting in a significantly increased proportion of VLPΔC40:h+RNA filaments. VLPΔC40:r and VLPΔC40:h+RNA filaments are long, flexible, hollow nanotubes with an inner diameter of 4.1 nm and 3.7 nm, respectively (Fig. 1e; Supplementary Fig. 4b).

To prevent encapsidation of ssRNA, we further truncated the C-terminus, excluding 60 C-terminal residues (CPΔC60) containing both the C-IDR and the α8-helix, which is placed opposite to the RNA-binding loop and forms part of the RNA-binding cleft (Fig. 1d). This indeed led to formation of RNA-free filaments. Interestingly, the filaments had monomorphic helical architecture, VLPΔC60:h, with unique helical parameters (Supplementary Figs. 3, 4c; Table 1). Because we found that 19 C-terminal residues of this construct, including those of the α7-helix (Fig. 1d), were not defined by cryo-EM density, we prepared another deletion mutant, CPΔC79, excluding these residues (Fig. 1f; Supplementary Fig. 3). The VLPΔC79:h filaments had a similar structure to VLPΔC60:h, but a better-defined C-terminal part (Fig. 1g; Supplementary Fig. 4d; Table 1). The disrupted RNA-binding cleft in CPΔC60/CPΔC79 thus lost the ability to bind RNA, and consequently the RNA-binding loop S125-G130 adopted the conformation found in the RNA-free filaments VLPh and in VLPr (Fig. 1g; Supplementary Fig. 4d). On the other hand, the fold of the N-IDR in CPΔC60/CPΔC79 resembled that found in RNA-encapsidating filaments, consistent with the CP-CP distances along the VLPΔC79:h filament being closer to VLPh+RNA than to VLPh (Fig. 1h). Thus, disruption of the RNA-binding cleft by large truncation of the C-terminal part of CP prevents RNA binding and also leads to the formation of monomorphic VLPs. VLPΔC79:h filaments are compact flexible hollow nanotubes with an inner channel diameter of 5.4 nm (Fig. 1f), with thermal stability higher than that of wild type VLPs and comparable to that of PVY virions (Fig. 1i).

Simultaneous truncation of both CP IDRs leads to formation of stable octameric rings

The CP N-IDR plays an important role in filament assembly by participating in inter- and intra-ring interactions6. Moreover, we have shown here that the structural plasticity of this region enables variability in the packing arrangement of CP subunits in filaments, and thus polymorphism (Fig. 1d; Supplementary Fig. 2c). We have previously shown that truncation of the N-IDR at G40 results in an insoluble protein, whereas CPΔN49 and CPΔN49C40, the latter with both IDRs truncated, self-assemble into single octameric rings and their short stacks6. To facilitate sample preparation for further structural analysis, we attached the His6-tag to the C-terminus of CPΔN49C40 (trCP) (Fig. 2a). Cryo-EM revealed that the affinity-purified sample consisted predominantly of trCP double octameric rings assembled in a head-to-tail (H2T) orientation (Fig. 2a; Supplementary Fig. 5). In addition, we observed some shorter filaments (<5%) with RNA-free stacked-ring or helical architecture, with helical parameters similar to those of VLPΔC79:h (Fig. 2a; Supplementary Fig. 6a).

Fig. 2: CP with truncated IDRs preferentially forms (double) octameric rings.
a Top: schematic representation of the trCP construct. The dashed box beyond S227 marks the presence of the additional linker with the TEV protease cleavage site and the His6-tag. Middle: representative cryo-EM micrograph of the trCP sample. Black rectangles: H2T-double rings; blue rectangles: RNA-free helical filaments; green rectangles: RNA-free stacked-ring filaments; orange rectangles: orthogonal stacking of two H2T double rings. Bottom: left: corresponding 2D class averages. Right: 3D reconstruction of the H2T-double rings, the overall resolution is indicated below in Å. b Organization of CP units in the wild type VLPr (left) and trCP H2T-double ring (right), with helical parameters shown below. The dashed rectangle highlights the contact between two rings in H2T. c Top: cross-section of the 3D reconstruction of the trCP H2T double ring with the central untraceable density in blue. Bottom: comparison of SEC (HiLoad Superdex 200 16/600) profiles of trCP before (black) and after removal of the His6-tag (gray, trCPnoHis) with the corresponding cryo-EM 2D class averages and SDS-PAGE gel. The source data for this panel are provided as Supplementary Data file. d Electrostatic surface of trCP H2T double ring with predominant negative (N-side, left) or positive (P-side, right) charge (APBS86, −/+ 5 kB T ec−1). e Model of the trCP H2T double ring in the cryo-EM density map showing the non-conserved charged residues (sticks) facing the interface as marked by the dashed gray rectangle in (b).

In wild type VLPr filaments, the N-IDR is responsible for the axial connection of the octameric rings, with no obvious interactions between the core regions of the CPs (Supplementary Fig. 2c)6. The truncation of N-IDR reduces the axial separation of the rings by 3.9 Å and shifts the twist angle in H2T double rings compared with VLPr filaments (Fig. 2b; Table 1). The blob of density in the center of the two rings (Fig. 2c), which was already observed in the 2D class averages, was assigned to a cluster of His6-tags, because it was absent in the 2D class averages after the removal of the His6-tag, which also led to the dissociation of double rings into single rings (Fig. 2c).
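The effect of a change in axial ring separation on a stacked-ring arrangement can be illustrated with a minimal geometric sketch. It places one reference point per CP subunit for stacked octameric rings described by a rise (axial separation) and a twist between adjacent rings. The rise, twist and radius values below are placeholders chosen for illustration only and are not the values of Table 1; only the 3.9 Å reduction and the eight subunits per ring come from the text. The function name is hypothetical.

```python
import math

def stacked_ring_positions(n_rings, rise, twist_deg, radius, n_sub=8):
    """Return one (x, y, z) reference point per subunit for stacked C8 rings.

    Ring i is rotated by i * twist_deg about the filament axis (z)
    and translated by i * rise along it.
    """
    coords = []
    for i in range(n_rings):
        for j in range(n_sub):
            phi = math.radians(j * 360.0 / n_sub + i * twist_deg)
            coords.append((radius * math.cos(phi),
                           radius * math.sin(phi),
                           i * rise))
    return coords

# Placeholder parameters in Å and degrees (assumptions, NOT from Table 1).
RISE_FILAMENT = 40.0
TWIST = 10.0
RADIUS = 50.0
RISE_REDUCTION = 3.9  # reduction in axial ring separation reported in the text

filament = stacked_ring_positions(2, RISE_FILAMENT, TWIST, RADIUS)
double_ring = stacked_ring_positions(2, RISE_FILAMENT - RISE_REDUCTION, TWIST, RADIUS)

# Axial gap between the first subunit of ring 0 and of ring 1.
gap_filament = filament[8][2] - filament[0][2]
gap_double_ring = double_ring[8][2] - double_ring[0][2]
print(f"ring separation shrinks by {gap_filament - gap_double_ring:.1f} Å")
```

The sketch keeps the twist identical in both arrangements for simplicity, whereas the text reports that the twist angle also shifts in the H2T double rings.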

The octameric ring exhibits pronounced charge anisotropy, with positive (P-side) and negative (N-side) charge predominating on the opposite surfaces (Fig. 2d), which explains the ionic strength-dependent size distribution of the self-assembled particles (Supplementary Fig. 6b).

Single amino acid substitutions at the N-side restore filament formation

To investigate how disturbance of electrostatics affects the interactions between the H2T double rings, we substituted individual nonconserved amino acids in the core region pointing to the interface between the two trCP rings (Fig. 2e; Supplementary Fig. 7).

The substitutions on the N-side of the ring, trCPL99C, trCPK153E and trCPE150C, showed a markedly increased tendency to form RNA-free assemblies larger than double rings (Fig. 3a-d; Supplementary Figs. 8, 9). The formation of double rings was negligible in the case of trCPL99C and trCPK153E, and instead we observed the formation of exclusively RNA-free filaments with helical (predominant form) or stacked ring architecture (Fig. 3b, c; Supplementary Fig. 8). In the case of trCPE150C, the assortment of particles was more heterogeneous, ranging from double rings to filaments. trCPE150C filaments accounted for only around 45% of the observed particles, with helical and stacked ring architectures represented to a similar extent (Supplementary Fig. 9). Interestingly, among various types of particles in the trCPE150C sample, we detected a significant proportion of two novel architectures (Fig. 3d). One of them had a central cube-shaped body composed of six orthogonally arranged octameric rings, which grew outwards by stacking copies of the rings to form cross-shaped junctions (Fig. 3d middle). In the second architectural type, the octameric rings joined to form a central spherical body on whose surface additional rings stacked in at least one direction (Fig. 3d right; Supplementary Fig. 9c). Overall, single amino acid substitutions of selected nonconserved residues on the N-side increased the stickiness of the surface and restored the formation of filamentous assemblies.

Fig. 3: N-side mutations resume formation of filaments and lead to novel architectures of filament junctions.
a Comparison of SEC analysis (HiLoad Superdex 200 16/600) between the trCP sample and its N-side mutants trCPL99C, trCPK153E, and trCPE150C. The orange and gray shading indicate distinct fractions eluting earlier than H2T double rings formed by the trCP, thereby indicating formation of larger particles. The micrographs in b–d show samples before SEC analysis. b, c Cryo-EM micrograph and 2D class averages of trCPL99C (b) or trCPK153E (c) filaments. Blue rectangles: helical RNA-free filaments; green rectangles: RNA-free stacked-ring filaments. Percentage of each type of filamentous particle is indicated. d Left: Cryo-EM micrograph of trCPE150C sample with 2D class averages of filaments below. White and black dashed rectangles: filament junctions growing from the central cubes or spheres, respectively; blue rectangles: RNA-free helical filaments; green rectangles: RNA-free stacked-ring filaments. Middle and right: 2D class averages and 3D reconstructions of trCPE150C filament junctions growing from the central cubic (cross-shaped junctions, middle) or spherical (spherical junctions, right) arrangement of octameric rings, respectively, with corresponding overall resolutions in Å. “N” denotes the N-side of the rings.

Single amino acid substitutions at the P-side lead to the formation of flipped double rings, cubic and spherical particles

Single amino acid substitutions at the P-side led to the formation of architecturally more homogeneous particles (Fig. 4a; Supplementary Fig. 10). trCPK176E, trCPG193D and trCPG193C assembled exclusively into double octameric rings, with cryo-EM reconstruction of trCPK176E revealing a head-to-head (H2H) arrangement of the two rings (Fig. 4b, c; Supplementary Figs. 10a, b, 11). Again, the two rings are held together by His6-tags (Supplementary Fig. 11c–f), but their central axis is slightly tilted compared with the trCP H2T double rings (Fig. 4c). However, no further stacking of H2H double rings or formation of filaments was detected, possibly due to the fact that both N-side surfaces in the double ring are exposed to the exterior.

Fig. 4: P-side mutations lead to novel octameric-ring assemblies, such as H2H double rings, cubes and spheres.
a Comparison of SEC (Superdex200 10/300 GL) analysis of trCP and its P-side mutants trCPK176E, trCPG193C, trCPG193D, trCPK176C, trCPK176S and trCPK177E. Gray and blue shading indicate SEC fractions that elute similarly to the H2T rings and at an earlier time point, respectively. b 2D class averages and c 3D reconstructions (colored by rings) are shown from top to bottom for trCP, trCPK176E, trCPK176C and trCPK177E with their overall resolutions (Å) and particle diameter (nm) indicated below each reconstruction. “N” and “P” denote the N- and P-sides of the rings. For clarity, only the 3D reconstruction of the spherical trCPK177E assembly is shown; the 3D reconstruction of the cubes can be found in Supplementary Fig. 13.

SEC analysis of the P-side mutants trCPK176C, trCPK176S, and trCPK177E indicated the formation of larger particles than double rings (Fig. 4a; Supplementary Figs. 12, 13). Cryo-EM 2D class averages showed that most of these particles had a cubic shape (Fig. 4b; Supplementary Fig. 10c). 3D reconstruction of trCPK176C with an overall resolution of 3.0 Å revealed the cubes consisted of six orthogonally arranged rings (Fig. 4c; Supplementary Fig. 12), with no additional stacking of rings as found in trCPE150C (Fig. 3d middle). In the case of trCPK177E, around 30% of the particles had a spherical shape, consisting of 9 rings (Fig. 4c; Supplementary Fig. 13), similar to the spherical core of the particles formed by trCPE150C, but again without further stacking of rings on the exposed N-side.

Overall, selected amino acid substitutions of the nonconserved residues on the P-side facilitated the association of the octameric rings, with the P-sides involved in the interactions and the N-sides exposed to the exterior. This led to the formation of smaller particles, such as H2H double rings, and cubic or spherical assemblies of rings.

The cubic particles are stabilized by hydrophobic interactions and contain CP-derived cargo

To better understand what drives the orthogonal assembly of the trCP-derivatives, we analyzed the interactions in the locally refined 3D reconstruction of the trCPK176C cubes of 3.2 Å resolution (Supplementary Fig. 12; Table 1). This revealed that the hydrophobic interactions between the P-sides of the ring pairs on the C2 symmetry axis are crucial for stable assembly (Fig. 5a; Supplementary Fig. 14a). Two subunits of each interacting ring contribute to stabilization, one through the residues of the α5-helix from the core and the other through N-IDR residues, including the α1-helix (Fig. 5a). No disulfide bond was observed between the rings, because the C176 residues in the adjacent rings are too far apart. M54 in N-IDR is crucial for maintaining the interactions, as replacement by Cys in trCPM54C+K176C resulted in the formation of H2H double rings instead of cubes (Supplementary Fig. 14b, c).

Fig. 5: The orthogonal assembly of octameric rings into cubes is driven by electrostatics and stabilized by hydrophobic interactions.
a Left: cubic assembly of trCPK176C along the C2 symmetry axis (black oval). Four distinctly colored trCPK176C subunits from adjacent rings are shown in colored ribbons (j, j+1 in one ring; k, k+1 in the adjacent ring). Gray surface: cryo-EM density. Right: magnification of the contact between two rings formed. Hydrophobic residues are shown in sticks. b Cryo-EM 2D class averages and 3D reconstructions of trCPK176C cubes before (left) and after (right) His6-tag removal, with the density map corresponding to His6-tag clustering in trCPK176C cubes shown in blue. c Left: asymmetric cryo-EM reconstruction of trCPK176C cube with additional central density in purple. The density maps of the front and back rings have been removed for clarity. Right: mass photometry spectra of the entire trCPK176C assembly show a peak centered around 1.3 MDa – consistent with additional cargo of about 245 kDa. d Deconvolution of denaturing mass spectrometry data for trCPK176C and trCPK176S reveals monomer masses of 22.3 kDa. Mutation of cysteine to serine eliminates the population of dimers. e Time evolution of clustering of triples of interacting rings for trCPK176C-noHis and trCPnoHis during MD simulation as a function of shape denoted by a triple scalar product p of ring orientations. Orthogonal packing corresponds to p = 1, while planar packing corresponds to p = 0. The source data are provided in the Supplementary Data file.

The center of each octameric ring contained a blob of density, which disappeared after removal of the His6-tags (trCPK176C-noHis) without affecting the cubic architecture (Fig. 5b; Supplementary Fig. 15a). Another blob of density was observed in the center of all cubic assemblies (Fig. 5c; Supplementary Fig. 15b), indicating the presence of putative cargo. Native mass spectrometry analysis of trCPK176C (Supplementary Fig. 15c) revealed charge series around 10,000 m/z, consistent with a double ring, and unresolvable peaks around 18,500 m/z, which we tentatively assign to a fully formed cubic particle. To circumvent the challenge posed by this heterogeneity, we obtained mass photometry data for trCPK176C, trCPK176C-noHis and trCPK176S (Supplementary Fig. 15d–f). We measured masses of ~1.3 MDa for each, a mass higher than expected based on 48 copies of the protomers and consistent with a central cargo of approximately 250–350 kDa (Fig. 5c; Supplementary Fig. 15d, f). When the cubes were disassembled under denaturing conditions, no significant impurities were identified in the denatured spectra besides the mass of the monomer (Fig. 5d; Supplementary Fig. 15e, g). A small population of covalently associated dimers was present only in trCPK176C and trCPK176C-noHis. However, these CP dimers did not originate from the octameric rings assembling the cubes, as no disulfide bonds were observed within or between the octameric rings (Fig. 5a). Although we cannot assess at this point how important the central protein mass is for self-assembly, our results clearly indicate that no molecular species other than the subunits of the trCP mutant are required for the formation of these cubic particles.
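The mass argument above can be followed with a short back-of-the-envelope calculation. The sketch below uses only rounded values quoted in the text (22.3 kDa per monomer, six octameric rings per cube, about 1.3 MDa measured); the function names are hypothetical, and applying the same monomer mass to every construct is a simplifying assumption.

```python
# Mass bookkeeping for a cubic assembly, using rounded values from the text.
MONOMER_KDA = 22.3       # monomer mass from denaturing mass spectrometry
SUBUNITS_PER_RING = 8    # octameric ring
RINGS_PER_CUBE = 6       # six orthogonally arranged rings
MEASURED_KDA = 1300.0    # ~1.3 MDa from mass photometry (rounded)

def shell_mass_kda(rings, per_ring=SUBUNITS_PER_RING, monomer=MONOMER_KDA):
    """Mass of the protein shell alone: rings x subunits per ring x monomer mass."""
    return rings * per_ring * monomer

def implied_cargo_kda(measured, rings):
    """Mass not accounted for by the shell subunits."""
    return measured - shell_mass_kda(rings)

protomers = RINGS_PER_CUBE * SUBUNITS_PER_RING
shell = shell_mass_kda(RINGS_PER_CUBE)
cargo = implied_cargo_kda(MEASURED_KDA, RINGS_PER_CUBE)

print(f"protomers per cube: {protomers}")
print(f"shell mass: {shell:.1f} kDa")
print(f"implied cargo: {cargo:.1f} kDa")
```

With these rounded inputs the unexplained mass comes out at a few hundred kDa, the same order of magnitude as the cargo estimate reported above; the exact figure depends on the unrounded measured masses, which are not reproduced here.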

Owing to the highly symmetrical distribution and exposure of the C-termini on the surface of the octameric rings in the cubes, we replaced the C-terminal His6-tag on trCPK176C with the SpyTag30 (trCPK176C-SpyTag). Cubic particles similar to those with C-terminal His6-tags were formed (Supplementary Fig. 16).

Given the rather unexpected self-assembly of the trCP mutants into cubes, we investigated whether the preferential orthogonal assembly of the K176C mutant compared with the wild type trCP could be predicted by coarse-grained molecular dynamics simulations (Fig. 5e). Starting from randomly distributed octameric trCPnoHis or trCPK176C-noHis rings in aqueous solution, the trCPK176C-noHis rings were indeed more prone to form orthogonal ring assemblies (triplets) (Fig. 5e; Supplementary Fig. 17) than the nonmutant trCPnoHis.
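The shape descriptor used for ring triplets in Fig. 5e, a triple scalar product p of ring orientations, can be sketched as follows. This is an illustrative reading and not the simulation code itself: each ring is assumed to be represented by a vector along its symmetry axis, the vectors are normalized, and the absolute value is taken so that p does not depend on the order or sign of the normals. The function name and these conventions are assumptions.

```python
import math

def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def triple_scalar_product(n1, n2, n3):
    """p = |n1 . (n2 x n3)| for the normalized ring-axis vectors.

    p = 1 for mutually orthogonal rings; p = 0 when the three axes
    are coplanar (including parallel, stacked rings).
    """
    u1, u2, u3 = _unit(n1), _unit(n2), _unit(n3)
    return abs(_dot(u1, _cross(u2, u3)))

# Three mutually orthogonal rings, as on adjacent faces of a cube.
p_orthogonal = triple_scalar_product((1, 0, 0), (0, 1, 0), (0, 0, 1))
# Three rings stacked along the same axis.
p_stacked = triple_scalar_product((0, 0, 1), (0, 0, 1), (0, 0, 1))
print(p_orthogonal, p_stacked)
```

Tracking this value for every interacting triplet over simulation time would give a distribution between the planar and orthogonal limits.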

In summary, the cubic assemblies of selected P-side trCP mutants are composed exclusively of CP-derived units. The 48 surface-exposed C-termini can be modified to carry (removable) affinity tags such as the His6-tag or the SpyTag.

Self-assembly can be controlled by fusion of heterologous proteins with CP

The supramolecular assemblies described above were purified directly from bacterial cell lysates. Next, we developed a system to prevent the self-assembly process in the expression system and instead trigger it in a controlled environment in vitro. To prevent the formation of filaments with C-IDRs packed in the lumen of the filament (Fig. 1c), we fused the 43-kDa maltose-binding protein (MBP) to the C-terminus of CP (Fig. 6a). Indeed, the CP-MBP fusion did not form filaments, and we were able to isolate the monomeric CP-MBP units (Supplementary Fig. 18a). The purified monomeric fraction of CP-MBP was then exposed in vitro to the tobacco etch virus (TEV) protease, which released MBP from CP, resulting in the formation of RNA-free filaments (ivVLPWT) (Supplementary Fig. 18b, c). We then applied this procedure to the CPΔC40 fusion with MBP (CPΔC40-MBP). In contrast to the VLPΔC40 formed in bacteria (Fig. 1e), the filaments produced in vitro were architecturally nearly homogenous, with 97% of the RNA-free stacked-ring architecture (ivVLPΔC40:r) (Fig. 6b; Supplementary Fig. 18d, e). Furthermore, this concept was successfully used for the in vitro triggered assembly of nanocubes, ivtrCPK176C (Fig. 6c; Supplementary Fig. 18f–h). Also in this case, cryo-EM 3D reconstruction revealed a cargo in the center of the cubes (Supplementary Fig. 18h).

Fig. 6: In vitro triggered self-assembly of engineered nanoparticles.
a Schematic representation of the fusion protein with MBP (CP*-MBP) with surface representation of both fusion components below (MBP PDB ID: 3HPI). ‘*’ marks different CP constructs. Created with BioRender (Biorender.com). Micrographs, 2D class averages and 3D reconstructions of in vitro assembled hollow nanotubes with predominant stacked-ring architecture in ivVLPΔC40:r (b) or cubic particles of ivtrCPK176C-noHis (c). The overall resolution of the 3D reconstructions (Å) and inner particle diameters (nm) are given below. The blue rectangle in b marks the presence of a small population (~3%) of particles whose helical parameters resemble the RNA-free helical form.

In summary, spontaneous self-assembly of CP and its derivatives in the bacterial expression system can be prevented by fusion of a heterologous protein at their C-termini. In vitro triggered self-assembly by proteolytic release of the fused protein leads to the formation of highly ordered RNA-free nanoparticles.

VLPs can be further stabilized by introducing disulfide bonds between CPs

It has already been shown for filamentous protein or peptide self-assemblies31,32,33 that the introduction of Cys residues at the interfaces axially connecting the subunits increases the stability of such particles. To investigate this possibility in the case of flexible PVY VLP filaments, we introduced disulfide bonds between adjacent CP subunits based on the VLPr structural model (Fig. 7a). The double Cys mutants of the full-length CP, T43C+D136C, L99C+K176C, E150C+G193C and S39C+E72C, successfully formed VLPs (Supplementary Fig. 19a), with SDS-PAGE analysis indicating disulfide bond formation (Fig. 7b). With the exception of VLPT43C+D136C, these filaments had longer median lengths (Fig. 7c) and melting temperatures elevated by 5–10 °C (Supplementary Fig. 19b, c) compared with the wild type VLPs. Moreover, VLPL99C+K176C, VLPE150C+G193C, and VLPS39C+E72C filaments survived incubation at 60 °C for 10 min under oxidizing conditions but not under reducing conditions, whereas VLP and VLPT43C+D136C disintegrated in both cases (Fig. 7d). VLPL99C+K176C, VLPE150C+G193C, and VLPS39C+E72C filaments were structurally polymorphic (Supplementary Fig. 20) and exhibited similar architecture to wild type VLPs, except that the VLPh+RNA form was essentially negligible. We could confirm the formation of disulfide bonds between adjacent rings only in the asymmetric reconstruction of their stacked ring forms (Fig. 7e). This suggests that not all adjacent Cys are paired, revealing the quasi-equivalence of subunits in the flexible filaments34. However, the uneven distribution of disulfide bonds along the filament could also be, at least to some extent, the result of the extreme sensitivity of disulfide bonds to electron radiation damage35. Nevertheless, such interlocking brought adjacent rings in VLPL99C+K176C:r and VLPE150C+G193C:r 3.2 Å and 2.0 Å closer, respectively, than in VLPr (Fig. 7f). This was not observed in VLPS39C+E72C:r due to stapling of the CPs by structurally plastic N-IDRs (Fig. 7a). No interconnecting cryo-EM density was observed in the VLPh filaments, likely due to helical averaging along the filament. Overall, VLPs can be further thermally stabilized by introducing disulfide bonds between selected residue positions at axial CP-CP interfaces.

Fig. 7: Stabilization of VLPs by introducing disulfide bonds between CP units in filaments.

a Schematic representation of VLPs (stacked-ring architecture). Positions of amino acid residues mutated to Cys are indicated in colored rectangles. Pairs of residues simultaneously mutated to Cys and possibly forming disulfide bonds are color-coded. b SDS-PAGE gels of double Cys mutants (colored as in a) under reducing (+DTT) or oxidizing (−DTT) conditions. The bands corresponding to the mutant CP (monomer, oligomers) are indicated with arrows. ‘*’ and ‘**’ indicate impurities. c Violin plot showing the length distribution of the filaments, with the corresponding median lengths shown below. ‘n’: the number of measured filaments. Wild type VLP (gray), VLPT43C+D136C (red), VLPL99C+K176C (dark green), VLPE150C+G193C (light green), VLPS39C+E72C (yellow). d Negative staining TEM (nsTEM) micrographs of wild type and double Cys-mutated VLPs (color codes as in a) after 10 min incubation at 60 °C under oxidizing (−DTT) and reducing (+DTT) conditions. The scale bar in the nsTEM micrographs represents 100 nm. e Cryo-EM density at positions expected for disulfide bonds, observed in asymmetric cryo-EM density maps of VLPL99C+K176C, VLPE150C+G193C, and VLPS39C+E72C stacked-ring filaments (color codes as in a) with a corresponding mutant model of CPr fitted into the density. f Comparison of cryo-EM reconstructions of stacked-ring filaments (VLPr) of wild type VLP (gray), VLPL99C+K176C (dark green), VLPE150C+G193C (light green), and VLPS39C+E72C (yellow). The overall resolution and the distances between adjacent rings are shown. The source data for panels b and c are provided as a Supplementary Data file.

Monomorphic RNA-encapsidating VLPs can be generated by a single amino acid substitution at the N-IDR/CP-core interface of adjacent CP units

Unlike the other CP double cysteine mutants, VLPT43C+D136C had a more uniform filament length distribution and was unstable at 60 °C (Fig. 7c, d). Cryo-EM revealed exclusively RNA-packing filaments with left-handed helical symmetry and an overall resolution of 2.4 Å (Fig. 8a; Supplementary Fig. 21; Table 1), which to our knowledge is the highest resolution reported for potyviral VLPs. The structure of CPT43C+D136C and the thermal stability profile of the respective filaments strongly resembled those of PVY virus (Supplementary Fig. 22a, b; Supplementary Table 1).

Fig. 8: Analysis of ssRNA packaged in VLPs formed by CPT43C+D136C.

a Left: cryo-EM micrograph of VLPT43C+D136C. Only VLPs encapsidating ssRNA were detected (red rectangles). Right: 3D reconstruction of VLPT43C+D136C showing a CP subunit in red. b Superposition of CPn core regions of VLPr, VLPh, and VLPh+RNA with N-IDRs from CPm-2 (VLPr, green), CPn-9 (VLPh, blue) and CPn-10 (VLPh+RNA, red) (Supplementary Fig. 2c). RNA in CPh+RNA is shown as an orange cartoon. The conserved residues R46 and D136/E139 are shown in opaque or transparent sticks, respectively, using the same color code as for the N-IDRs. c VLPT43C+D136C length distribution of 468 selected particles from the nsTEM micrographs, with values above the peaks indicating the mean length ± SD. The expected VLP length was calculated with helical parameters for VLPT43C+D136C:h+RNA with 5 nt per CP unit. d Pie chart showing the mean percentage of RNA sequencing reads (per base) mapped either to CP mRNA, E. coli rRNA or other E. coli RNAs for the RNA extracted from VLPT43C+D136C. e Histogram showing the five most abundant CDS after mapping to 3’ ends (Methods) in the RNA extracted from VLPT43C+D136C. Shown is the mean value of transcripts per million (TPM, expressed in percent) from two biological replicates. f Top: schematic of the pRSFDuet-1 vector with introduced CPT43C+D136C and p97. “T7P” and “T7T” designate the T7 promoter and terminator, respectively. Bottom: SDS-PAGE of total cell samples before (BI) and after (AI) induction of CP-p97 co-expression, and purified VLPs. g Normalized coverage plots of RNA sequencing of CP (gray) and p97 (red) coding sequences of total cell RNA (top) and RNA extracted from VLPs (bottom). Shown are smoothed (black) and raw (dark gray and dark red for CP and p97, respectively) means of coverage (n = 2); standard deviation is indicated in light gray and orange. The vertical line designates the position with a significant decrease in normalized coverage in both samples. h Top: histogram showing mean values of transcripts per million (TPM, in percent) of mapped reads for CP (gray), p97 (red), or E. coli coding sequences (white) for total cell RNA and RNA from VLPs as determined by RNA sequencing. Bottom: histogram showing the five most abundant CDS after mapping to 3’ ends (Methods) in RNA from VLPs. Each histogram shows the mean values of two biological replicates. The source data for panels c and f are provided as a Supplementary Data file.

The cryo-EM density for VLPT43C+D136C was defined starting at residue V44 (Table 1), indicating the absence of the disulfide bond between C43 and C136 and thus the redundancy of one of the introduced cysteines. Indeed, negative staining TEM (nsTEM) of cell lysates revealed that VLPD136C resembled VLPT43C+D136C, whereas the purified VLPT43C showed stacked-ring filaments with a length similar to that of wild type VLPs (Supplementary Fig. 22c). Further cryo-EM analysis of purified VLPD136C confirmed monomorphic RNA-packing filaments (Supplementary Fig. 22d).

D136 is located in the β-hairpin of the CP core region. Together with E139 from the same β-hairpin and R46 from the N-IDR of the adjacent CP, it forms a triangle of conserved charged residues (Fig. 8b; Supplementary Fig. 7). Replacement of R46 or E139 by Ala resulted in the exclusive (VLPR46A) or predominant (VLPE139A) formation of RNA-encapsidating filaments (Fig. 8b; Supplementary Fig. 22e, f). In RNA-free VLPs composed of wild type CP, each CP subunit is linked to four adjacent subunits, with N-IDRs acting as clutches (Supplementary Fig. 2c). Disruption of these interactions by mutations in the R46/D136/E139 triangle favors the VLPh+RNA type of assembly, in which the loss of interaction between the β-hairpin and the N-IDR is compensated for by the extensive interaction network between 13 CP subunits and by the CP-RNA interactions present in ‘h+RNA’ filaments. Therefore, it was not surprising that the CP construct CPΔC60:T43C+D136C, which combines the inability to bind RNA with weakened N-IDR binding, was not soluble (Supplementary Fig. 22g). These results demonstrate that we can produce monomorphic RNA-encapsidating VLPs with a narrow length distribution by simple modifications of the CP-CP interface at the N-IDR-core contact.

CP encapsidates ssRNA with limited specificity

Within the narrow length distribution range of VLPT43C+D136C filaments, we detected four distinct maxima. The first was at 61 nm, which is close to the theoretical length of filaments (65 nm) encapsidating the 807 nt long CPT43C+D136C coding sequence (CDS) (Fig. 8c). The others were at 134 nm, 199 nm, and 267 nm, approximately multiples of the first. This could be due to longitudinal fusion of the filaments, as is commonly observed in potyviruses such as PVY (Supplementary Fig. 23) or potato virus A36. Previous studies already suggested that recombinant potyviral CP encapsidates its own mRNA27,37. To verify this, we analyzed RNA extracted from VLPT43C+D136C filaments. Using RNA-free VLPΔC60 as a control, we showed that 98% of the RNA recovered from the purified VLPT43C+D136C sample was extracted from the filaments (Supplementary Fig. 24a). Reverse transcription quantitative PCR (RT-qPCR) (Supplementary Fig. 24b) showed that CPT43C+D136C mRNA was present at much higher levels than the idnT background gene, which is reported to be stably expressed in E. coli upon heterologous protein overexpression38. To obtain a quantitative overview of all RNA transcripts encapsidated in VLPs, we employed nanopore direct RNA sequencing. This showed that ~70% of the RNA packaged in VLPT43C+D136C belonged to CPT43C+D136C mRNA and 30% was assigned to bacterial RNAs (Fig. 8d; Supplementary Fig. 24c). Among all coding sequences (CDS), CPT43C+D136C was strongly predominant (~75%), with roughly even coverage of the entire sequence (Supplementary Fig. 24d, e). Some bacterial genes, such as hns, were also detected to a significant extent (11.6%) (Fig. 8e; Supplementary Fig. 24d).
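The relationship between the observed length maxima and the theoretical length can be checked with a short calculation. The sketch below uses only the figures quoted above (807 nt CDS, 5 nt per CP unit, 65 nm theoretical length, and the four peak positions); the axial rise per CP is back-calculated from these numbers for illustration and is not a reported helical parameter, and the function name is hypothetical.

```python
# Peak positions (nm) and construct figures quoted in the text.
peaks_nm = [61, 134, 199, 267]
cds_length_nt = 807
nt_per_cp = 5
theoretical_length_nm = 65.0

# Axial rise per CP implied by the quoted theoretical length
# (back-calculated here; not a helical parameter reported in the study).
n_cp = cds_length_nt / nt_per_cp
implied_rise_nm = theoretical_length_nm / n_cp


def nearest_multiple(length_nm, unit_nm):
    """Return the closest integer number of unit lengths and the relative deviation."""
    k = max(1, round(length_nm / unit_nm))
    deviation = (length_nm - k * unit_nm) / (k * unit_nm)
    return k, deviation


multiples = [nearest_multiple(p, theoretical_length_nm) for p in peaks_nm]
for peak, (k, dev) in zip(peaks_nm, multiples):
    print(f"{peak} nm ~ {k} x {theoretical_length_nm:.0f} nm ({dev:+.1%})")
```

Under these assumptions, each maximum lies within a few percent of an integer multiple of the theoretical single-transcript length, in line with the interpretation of end-to-end filament fusion.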

These experiments suggest that the specificity of RNA encapsidation by CP is limited. Next, we investigated whether we could encapsidate an mRNA of interest into the filaments formed by CPT43C+D136C. As a heterologous gene of interest, we chose the gene encoding p97, a human protein forming ~600 kDa hexamers39 that differ in architecture from VLPs. We first attempted to encapsidate p97 mRNA by adding in vitro transcribed mRNA to the CP-MBP system described above. However, after the release of MBP, no RNA was encapsidated, likely due to secondary or tertiary structural elements in the RNA produced in vitro that prevent CP from self-assembling around it. To address this issue, we used a bacterial co-expression system so that the nascent CP mRNA and the p97 mRNA were produced in temporal and spatial proximity (Fig. 8f). In this system, two heterologous mRNAs are transcribed, one for CPT43C+D136C and p97 (CP+p97) and the other for p97 only (p97). CPT43C+D136C and p97 proteins were successfully produced (Fig. 8f), and the VLP purification protocol allowed successful separation of filaments from p97 hexamers (Supplementary Fig. 25a, b).

RNA transcripts were identified and quantified by nanopore direct RNA sequencing of RNA extracted either from cells (total cell RNA) or from purified VLPs (RNA from VLPs). This initially showed enrichment of CP, p97 and some bacterial CDS in VLPs compared with total cell RNA (Supplementary Fig. 25c). However, a detailed analysis of sequencing coverage along the CP and p97 CDS revealed important differences between the two samples. Namely, whereas reads mapping to CP CDS were very abundant in both samples (Fig. 8g), coverage of p97 was significantly lower in RNA from VLPs. We also noted a marked decrease in p97 coverage after position 1500 in the total cell RNA sample (Fig. 8g, vertical dotted line). A sharp decrease was also observed at this position in the RNA from VLPs, with virtually no coverage in the following region, indicating the absence of the full-length p97 sequence in purified VLPs (Fig. 8g; Supplementary Note 1).

Because of the uneven coverage along both transcripts (Fig. 8g), we performed CDS quantification with the coverage near the 3’ end as a sensor for the level of the full-length sequence (Methods). This confirmed a very low abundance of full-length p97 transcripts in VLPs (1.8%) despite relatively high levels of p97 mRNA (19.8%) in total cell RNA (Fig. 8h; Supplementary Fig. 25d). Thus, coupling of the synthesis of p97 mRNA with the production of CP did not result in efficient encapsidation of p97 mRNA in VLPs in bacteria. Interestingly, bacterial hns mRNA was highly represented in VLPs (12.2%), significantly higher compared with its presence in the total cell RNA (3.7%). However, such enrichment in VLPs was not observed for CP or p97 (Supplementary Fig. 25d). Overall, the specificity of the recombinant PVY CP for the encapsidated RNA is not limited to CP mRNA.
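To make the comparison between the two RNA pools explicit, the quoted percentages can be expressed as fold enrichment in VLPs relative to total cell RNA. The following is a minimal sketch using only the hns and p97 values given above; the dictionary layout and function name are illustrative assumptions.

```python
import math

# Percent of 3'-end-based TPM quoted in the text for the co-expression experiment.
tpm_percent = {
    "hns": {"total_cell_rna": 3.7, "rna_from_vlps": 12.2},
    "p97": {"total_cell_rna": 19.8, "rna_from_vlps": 1.8},
}


def fold_enrichment(vlp_percent, total_percent):
    """Ratio of a transcript's share in VLP RNA to its share in total cell RNA."""
    return vlp_percent / total_percent


enrichment = {
    gene: fold_enrichment(v["rna_from_vlps"], v["total_cell_rna"])
    for gene, v in tpm_percent.items()
}
log2_enrichment = {gene: math.log2(x) for gene, x in enrichment.items()}

for gene in enrichment:
    print(f"{gene}: {enrichment[gene]:.2f}-fold (log2 = {log2_enrichment[gene]:+.2f})")
```

With these numbers, hns is enriched roughly threefold in VLPs, whereas full-length p97 is depleted by about an order of magnitude.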

Discussion

Powerful methodological approaches to high-resolution analysis have helped to discover that symmetric supramolecular assemblies of many viruses, storage or transport cages14, cytoskeleton40,41,42, flagella43,44, amyloid fibers45 and others can exist in structurally polymorphic states, with each type of self-assembly usually associated with a specific biological function. Structural polymorphism can also be applied to many recombinant VLPs, protein cages, and (artificial) peptide assemblies, providing a large repertoire of molecular platforms for vaccines, drug delivery systems, nanoreactors, biomaterials or nanomachines14,19,21,46,47,48. In particular, CPs from plant viruses represent a great resource for such nanoparticles, as they are biodegradable and usually nonpathogenic to mammals49. Among them, the most studied are ssRNA viruses such as rod-shaped TMV, and icosahedral cowpea chlorotic mottle virus (CCMV), whose CPs represent highly tunable molecular platforms for the production of nucleoprotein assemblies with remarkable architectures and material properties20,21,49.

In this study, we show how the intrinsic structural plasticity of CP from the flexuous filamentous virus PVY enables the formation of a wide assortment of highly-ordered nanoparticles, whose structural and chemical properties can be tailored by simple modifications. Unlike rigid rod-shaped TMV nanoparticles, which generally require an RNA template for stable formation21, most of the PVY CP types of nanoparticles shown here self-assemble without a template.

Our results can be summarized in seven points. First, recombinant PVY CP can simultaneously form three architecturally distinct types of VLPs (Fig. 1b, c). These filaments are mostly RNA-free, of either stacked-ring or helical architecture, with only a small fraction of the RNA-encapsidating filaments resembling the native virion. We have shown that the major source of structural plasticity and consequently polymorphism is provided by both IDRs and the conserved RNA-binding loop S125-G130. The low proportion of RNA-encapsidating VLPs suggests that, under the given conditions, RNA-free filaments assemble more efficiently than RNA-encapsidating ones. The RNA-free filaments with stacked-ring architecture are predominant and thus represent the most stable form of CP self-assembly. Interestingly, only a slight change in N-IDR conformation leads to the formation of another type of RNA-free filament with left-handed helical symmetry. The three types of polymorphic VLPs formed by the wild type PVY CP could potentially mimic different CP assemblies of structurally labile helical virions during different phases of the viral life cycle, such as virion assembly or disassembly and viral cell-to-cell or long-distance transport11,50; however, future in-depth studies of virus-associated structures in planta are required to confirm this. Second, we showed that most of the RNA packaged in recombinant VLPs was CP mRNA (Fig. 8c–e), which may reflect the large amount of CP mRNA produced by overexpression in bacteria. However, notable amounts of the RNA packaged in VLPs were of bacterial origin, with some CDSs even more enriched in VLPs than CP or the eukaryotic gene p97, compared with their levels in total cell RNA (Supplementary Fig. 25d). The capability of the potyviral CP to encapsidate heterologous viral RNA under certain conditions in vivo has been reported before51,52. While recombinant CP shows limited specificity, it is expected that in plants, to avoid wasting viral resources, the interplay between viral and/or host factors dictates packaging of the viral ssRNA into stable virions11,53. More detailed studies are needed to understand whether the limited specificity of recombinant CP is due to the specific nucleotide sequence, RNA length, proximity of freshly overexpressed CP to heterologous RNA molecules, or a combination of these factors.

Third, we show that RNA can be encapsidated in VLPs even in the absence of C-IDR. Potyviral C-IDR has been shown to be critical for viral replication and for the regulated shift from translation to replication6,54,55. Here we show that C-IDR does not play an essential structural role in filament formation or RNA encapsidation; however, it does affect the fine structural details of the filament architecture (Fig. 1e–i). Fourth, in addition to the ability of wild type PVY CP to simultaneously form filaments of different architectures, this protein, and thus its self-assembly, is also highly tunable. Simple modifications in CP lead to a lower degree of polymorphism and even to the formation of monomorphic filaments, or filaments with novel architectures (Figs. 1e–i, 7, and 8a, b). Structure-based design can be used to produce purely RNA-encapsidating VLPs with a relatively narrow length distribution or exclusively RNA-free filaments with broad length distributions (Fig. 7c). In both cases, the lumen of the filament can either be filled with C-IDRs or hollow in their absence. Fifth, we show that we can achieve a striking change in quaternary structure, i.e. structural metamorphosis, by simple genetic modifications of CP. By deletions and/or single-site mutations, we can reduce or even prevent filament formation and instead produce single or double octameric rings of CP as well as highly ordered cubic or spherical self-assemblies of these rings, which can be further modified to form cross-shaped assemblies (Figs. 3–5). Sixth, we show that the outer surfaces of CP-derived nanoparticles, especially double rings, cubes, or spheres, can be equipped with surface-exposed affinity tags such as His6-tag56 or Spy-tag30, thereby providing symmetric platforms for further functionalization (Figs. 2c and 5b, Supplementary Fig. 16). Finally, we have developed a system in which CP derivatives are fused at their C-termini with a heterologous protein to obtain nanoparticles of enhanced purity, which are assembled under defined and controlled in vitro conditions (Fig. 6). Such fusion proteins with IDRs not engaged in self-assemblies could be used to study molecular interactions between individual CPs and other viral or plant host molecules, such as HCPro57, Argonaute58, or RNA59.

In summary, the intrinsic structural plasticity of PVY CP allows a remarkable structural diversity of its supramolecular assemblies. The high-resolution data obtained in this study and the possibility of structure-based design of nanoparticles with novel architectures and tailored properties make PVY CP an excellent candidate for nanobiotechnological applications, such as vaccine and biosensor development, cargo storage and delivery, medical imaging, or energy and nanostructured materials49,60. Bacteria represent a preferred expression system as they allow efficient and cost-effective production of nanoparticles. Although the detailed information on the structural diversity of PVY CP shown here is based on nanoparticles produced in bacteria, it may facilitate future studies on the role of PVY CP in its natural environment.

Methods

Molecular cloning of CP variants

The wild type PVY CP from a complementary DNA (cDNA) of the PVY-NTN strain (GenBank accession no. KM396648), and its double deletion mutant without (CPΔN49C40) or with a C-terminal His6-tag preceded by the TEV protease cleavage site (termed truncated CP, trCP), were previously cloned in vectors pT7-7 (CP) and pET28a (CPΔN49C40, trCP), respectively6. The C-IDR deletion constructs were cloned using a classical restriction enzyme-based approach and inserted in the pET28a vector. To obtain constructs with introduced mutations, site-directed mutagenesis was performed using the inverse PCR method61,62 with one or two oligonucleotides (nucleotide sequences available upon request).

For CP-MBP constructs, the sequence encoding maltose-binding protein (MBP) with a C-terminal N-rich linker and “factor Xa” cleavage site was obtained from the pMAL-c2X vector backbone. This sequence was inserted between the TEV protease cleavage site and the His6-tag at the C-terminus of the previously cloned His6-tagged CP construct6. Cloning of CPΔC40-MBP and trCPK176C-MBP constructs was done via the Gibson cloning method63,64 (NEB).

For the co-expression experiment, CPT43C+D136C and human p97 (kindly provided by Dr. Marta Popović, Ruđer Bošković Institute, Croatia) were cloned in the pRSFDuet-1 dual expression system vector (pRSFDuet1-Cdc45) using PCR and Gibson assembly63,64. RNA-packing CPT43C+D136C was cloned into the first multiple cloning site (MCS) and p97 into the second MCS, while the connecting region was identical to the commercial pRSFDuet-1 backbone. All sequences were verified by nucleotide sequencing (Eurofins Genomics or GENEWIZ).

Expression and purification of CP variants

E. coli BL21(DE3) cells, transformed with plasmids containing CP constructs, were grown to an OD600 of 0.8–1.2 in 2× YT medium (16 g l−1 tryptone, 10 g l−1 yeast extract, 5 g l−1 NaCl) supplemented with 5 mM MgCl2 and 2 mM CaCl2. Gene expression was induced with 0.1 mM Isopropyl β-D-1-thiogalactopyranoside (IPTG) and the cells were grown overnight at 20 °C.

Non-His6-tagged variants forming VLPs were purified as described previously6 with minor modifications, and all purification steps were done at 4 °C. In brief, the harvested cells were lysed by sonication on ice in phosphate-buffered saline (PBS) (1.8 mM KH2PO4, 10.1 mM Na2HPO4, 140 mM NaCl, 2.7 mM KCl, pH 7.4) and centrifuged at 20,000 × g for 40 min. The lysate was incubated for 30 min in a mixture of 4% PEG 8000 and 500 mM NaCl. Following centrifugation for 30 min at 14,000 × g, the pellet with VLPs was resuspended in PBS by gentle overnight shaking. Remaining solid material was removed by 30 min centrifugation at 35,000 × g. The soluble fraction enriched in VLPs was loaded on a 20–60% sucrose density gradient and ultracentrifuged at 117,000 × g for 6 h in a Beckman 50 Ti rotor. All fractions of the gradient were collected and analyzed with SDS-PAGE to identify fractions containing CP. Selected fractions were pooled, dialyzed for 24 h against PBS, concentrated using Amicon Ultra centrifugal filters with a 100-kDa molecular weight cut-off to a final concentration of 1–3 mg ml−1 and supplemented with glycerol to a final concentration of 5% (v/v) before storage at −80 °C.

To achieve higher purity of the VLP samples used for RNA extraction, an additional purification step of ammonium sulfate precipitation65,66 was implemented before the standard VLP purification procedure described above. After cell lysis and centrifugation, ammonium sulfate was added to the soluble fraction to a concentration of 15% (w/v). Following stirring for 30 min, the precipitated proteins were pelleted by 15 min centrifugation at 13,400 × g. The process was repeated in a stepwise manner with 5% (w/v) increases in ammonium sulfate concentration up to a final 30% (w/v). The pellets obtained at the different concentrations of ammonium sulfate were resuspended in PBS and subjected to SDS-PAGE analysis. Fractions enriched in either p97 or CP were desalted on PD-10 columns and dialyzed in dialysis tubing (12–14 kDa cut-off), with the CP-enriched fraction further purified as described above for filamentous VLPs. All steps of the purification procedure were done at 4 °C. Final samples were concentrated to 1–2 mg ml−1 and stored at −80 °C.

His6-tagged proteins (trCP, CP-MBP) were isolated from cells by sonication on ice in PBS with 10 mM imidazole, followed by 40-min centrifugation at 50,000 × g and Ni-NTA chromatography. The non-specifically bound proteins were washed from the column and the His6-tagged proteins eluted with PBS containing 300 mM imidazole. The eluted fractions were dialyzed against PBS overnight, concentrated using Amicon Ultra (30-kDa or 100-kDa cut-off), and loaded on the size exclusion column Superdex 200 10/300 GL (24 ml) or Superdex 200 16/60 PG (120 ml) (GE Healthcare) with PBS as the running buffer. Fractions containing the desired trCP variant, as judged by SDS-PAGE, were pooled and concentrated using Amicon Ultra centrifugal filters or Pierce™ Protein Concentrators, both with 100-kDa molecular weight cut-off, to a concentration of 1–7 mg ml−1 for various assemblies. For CP-MBP, the size exclusion chromatography (SEC) step was performed on HiLoad Superdex 200 16/600 under conditions identical to those used for separation of the column manufacturer’s size standards (GE Healthcare). In the case of trCP, trCPK176E and trCPK176C, His6-tags were removed using the TEV protease in a 1:10–1:20 (TEV:trCP) molar ratio overnight at 20 °C, followed by a second Ni-NTA chromatography or SEC at room temperature. Fractions containing the proteins with cleaved tags were concentrated to 4 mg ml−1 and stored at −80 °C. Sample purity and protein folding were checked with SDS-PAGE and circular dichroism spectroscopy, respectively.

In vitro self-assembly of VLP filaments and cubic particles

In all types of CP*-MBP fusions, CP* self-assembly was initiated by the addition of TEV protease to the purified CP*-MBP in a molar ratio of 1:10–1:20 (TEV:CP*-MBP), and the mixture was left overnight at 4 °C. For CP-MBP, the sample after TEV protease cleavage was loaded onto a Ni-NTA column to separate the cleaved His6-tagged MBP and the non-cleaved CP-MBP fusion from the freshly self-assembled VLPs. For CPΔC40-MBP, assembled VLPs were purified using the standard VLP isolation procedure described above. For trCPK176C-MBP, additional purification was done by SEC using a Superdex 200 16/600 (120 ml) column. The purified samples were concentrated with Amicon Ultra centrifugal filters (100-kDa cut-off), and the presence of filaments before and after TEV protease cleavage was monitored by negative staining transmission electron microscopy (nsTEM) for CP-MBP or cryo-electron microscopy (cryo-EM) for CPΔC40-MBP and trCPK176C-MBP.

Thermal stability assay

The thermal stability of the proteins was determined by differential scanning fluorimetry (DSF) at a protein concentration of approximately 0.1 mg ml−1 in the presence of 2× SYPRO Orange (Thermo Fisher Scientific)67. Samples were subjected to temperatures from 25 °C to 95 °C at a gradient of 1 °C min−1. Temperature melting profiles were acquired with the LightCycler 480 system (Roche). Samples were measured in triplicate in two independent measurements. Melting temperatures Tm were determined as the minimum values of the first derivative of the measured data curves in OriginPro2023 (OriginLab). All results are expressed as means ± standard deviation (SD), and they were compared by one-way ANOVA (analysis of variance) followed by Tukey’s multiple comparison test. A value of p < 0.001 was considered statistically significant. All source data with detailed statistical analysis are provided in the Supplementary Data file.
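The derivative-based determination of Tm can be illustrated with a minimal NumPy sketch. It is not the OriginPro workflow used here: it assumes the common DSF convention in which the negative first derivative, −dF/dT, is evaluated and its minimum marks the melting transition, and it runs on a synthetic sigmoidal curve rather than measured data. The function name and the curve parameters are hypothetical.

```python
import numpy as np


def melting_temperature(temps_c, fluorescence):
    """Estimate Tm as the temperature at the minimum of -dF/dT.

    Assumes the sign convention in which the negative first derivative
    of the fluorescence signal is evaluated (an assumption, see lead-in).
    """
    temps_c = np.asarray(temps_c, dtype=float)
    fluorescence = np.asarray(fluorescence, dtype=float)
    neg_dfdt = -np.gradient(fluorescence, temps_c)
    return float(temps_c[np.argmin(neg_dfdt)])


# Synthetic unfolding curve (not measured data): sigmoid centred at 55 degC.
temps = np.arange(25.0, 95.0 + 0.5, 0.5)
true_tm = 55.0
signal = 1.0 / (1.0 + np.exp((true_tm - temps) / 2.0))

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} degC")
```

For noisy experimental traces, smoothing before differentiation would usually be needed; that step is omitted here for brevity.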

Native-PAGE

Characterization of the protein assemblies in native conditions was performed on 4–16% Native-PAGE Bis-Tris gels (Thermo Fisher Scientific). Samples were mixed with 4× Native PAGE sample buffer (Thermo Fisher Scientific) and run in the Dark Blue Cathode Buffer (Thermo Fisher Scientific) for 60 min at 150 V and for 40–60 min at 250 V according to the manufacturer’s instructions. The gels were fixed with 40% methanol (v/v) and 10% acetic acid (v/v) and destained with 8% acetic acid (v/v).

Negative staining transmission electron microscopy (nsTEM)

For visualization, the final concentration of the CP construct was approximately 1.5–3 µM. Copper mesh grids (SPI Supplies) were Formvar-coated, stabilized with carbon and glow-discharged (EM ACE200, Leica Microsystems). The VLP sample (5–20 μl) was applied to a grid, left to soak for 5 min, blotted, washed and contrasted with 1% (w/v) uranyl acetate (aqueous solution). Grids were imaged at 80 kV with a CM 100 transmission electron microscope (Philips) equipped with an Orius SC 200 camera (Gatan) and Digital Micrograph software 2.1.1, or with a TALOS L120 (Thermo Fisher Scientific) operating at 100 kV and equipped with a Ceta 16M camera and Velox v3.0 (Thermo Fisher Scientific).

Filament length distribution analysis

Filament lengths were measured from nsTEM micrographs using the Fiji (ImageJ 1.53c) software suite68 after manually tracing multiple points along at least 200 flexuous filaments. The violin plots of filament length distribution were produced using OriginPro2023 (OriginLab), with median values and interquartile ranges (25th to 75th percentile) indicated on the plots by a white circle and a black rectangular box, respectively. Values ‘n’ above each violin correspond to the number of measured filaments. Histograms of filament length distribution were analyzed using a Gaussian fit in Origin2018 (OriginLab) and plotted in MATLAB R2021b (MathWorks), with values above the peaks provided as mean ± SD. All filament length measurements are provided in the Supplementary Data file.
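The length of a manually traced flexuous filament is the sum of the straight segments between consecutive traced points, scaled by the pixel size. The following sketch illustrates this calculation outside Fiji; the coordinates and the pixel size are hypothetical values chosen for illustration, not measurements from this study.

```python
import math
from statistics import median


def traced_length_nm(points_px, nm_per_px):
    """Sum the straight segments between consecutive traced points (in nm)."""
    length_px = sum(math.dist(a, b) for a, b in zip(points_px, points_px[1:]))
    return length_px * nm_per_px


# Hypothetical traces (pixel coordinates) and pixel size; not data from the study.
nm_per_px = 0.5
traces = [
    [(0, 0), (3, 4), (6, 8)],     # 10 px in total
    [(0, 0), (0, 6), (8, 6)],     # 14 px in total
    [(1, 1), (1, 21)],            # 20 px in total
]

lengths_nm = [traced_length_nm(t, nm_per_px) for t in traces]
print("lengths (nm):", lengths_nm)
print("median (nm):", median(lengths_nm))
```

More traced points per filament give a closer approximation of the contour length of strongly curved filaments.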

Extraction of the total cell or VLP-encapsidated RNA

Extraction of total RNA from cells after induced overnight expression was done using the RNeasy Kit with an optimized protocol for extraction from E. coli adapted from the RNAprotect® Bacteria Reagent Handbook (Qiagen)69. Specifically, cell lysis was performed enzymatically using lysozyme from chicken egg white (Sigma) and proteinase K (NEB), followed by the standard RNeasy protocol with on-column DNase I treatment (Roche) and final elution in RNase-free water.

RNA extraction from the purified VLP samples was performed based on the previously published protocol for extraction from Potyvirus particles70. In brief, the sample was incubated in the presence of 1% SDS (w/v) at 55 °C for 5 min, followed by phenol-chloroform extraction. The extracted RNA was then precipitated by the addition of 0.5 volumes (relative to the initial sample) of 7.5 M ammonium acetate and 2.5 volumes of cold absolute ethanol at −20 °C for 1 h, followed by 25 min centrifugation at 12,000 × g. After washing the pellet with 70% ethanol (v/v) and air-drying for 15–30 min at room temperature, the precipitated RNA was resuspended in DEPC-treated water and treated according to the Turbo DNase rigorous protocol (Thermo Fisher) to remove any potential DNA contaminants, followed by isolation with the RNA Clean & Concentrator-5 kit (Zymo Research) and storage at −80 °C.

RNA quantification with reverse transcription quantitative polymerase chain reaction

RNA was reverse transcribed using random hexamer oligos (IDT) and SuperScript IV reverse transcriptase (Thermo Fisher) following the manufacturer’s protocol. After RT, cDNA was diluted 10× and used in the PCR. The final qPCR reaction was performed using Fast SYBR Green master mix (Thermo Fisher) with 6 μM primer mix for either CPT43C+D136C or idnT as a control gene, found to be stably expressed in E. coli upon induction of protein overexpression38. The reaction was measured using the LightCycler 480 system (Roche) with the following conditions: 92 °C for 3 min followed by 40 cycles of 3 s at 92 °C and 30 s at 60 °C. The measurements were made in two biological replicates (three technical replicates each). Ct values were obtained using automatic threshold detection by the software (Roche); the mean Ct value and standard deviation were calculated in Origin2018 (OriginLab).
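The summary statistics for the Ct values, and the way a Ct difference translates into relative abundance, can be sketched as follows. The Ct values below are hypothetical placeholders (two biological replicates with three technical replicates each), and the conversion of ΔCt into a fold difference assumes approximately 100% amplification efficiency for both primer pairs, which is an assumption of this sketch and not a statement about the assay used here.

```python
from statistics import mean, stdev

# Hypothetical Ct values (2 biological x 3 technical replicates per target);
# these are placeholders, not data from the study.
ct_values = {
    "CP": [12.1, 12.3, 12.2, 12.4, 12.0, 12.2],
    "idnT": [24.9, 25.1, 25.0, 25.2, 24.8, 25.0],
}

# Mean Ct and sample standard deviation per target.
summary = {gene: (mean(v), stdev(v)) for gene, v in ct_values.items()}

# A lower Ct means more template; delta Ct is taken relative to the control gene.
delta_ct = summary["idnT"][0] - summary["CP"][0]

# Assumes a doubling of product per cycle (about 100% efficiency) for both targets.
fold_difference = 2 ** delta_ct

for gene, (m, sd) in summary.items():
    print(f"{gene}: Ct = {m:.2f} ± {sd:.2f}")
print(f"delta Ct = {delta_ct:.2f}, approx. {fold_difference:.0f}-fold more CP template")
```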

Polyadenylation, direct RNA sequencing and bioinformatic analysis

E. coli poly(A) polymerase (NEB) was used for the poly(A)-tailing reaction. Purified RNA was polyadenylated for 1 min at 37 °C in the following reaction mix: 10 µl total RNA, 2 µl 10× E. coli poly(A) polymerase buffer, 2 µl ATP, 5 µl nuclease-free water and 1 µl E. coli poly(A) polymerase (NEB). The reaction was stopped by the addition of 5 µl of 50 mM EDTA. Polyadenylated RNA was cleaned using 2.5× sample volume of AMPure XP beads (Beckman Coulter) and eluted in 10 µl nuclease-free water.

Direct RNA sequencing was performed using the Direct RNA Sequencing protocol (SQK-RNA002) for MinION adapted to sequencing using a Flongle flow cell (Oxford Nanopore Technologies). For adapter ligation, 1 µl of T4 DNA ligase (Thermo Fisher) was used and SuperScript IV (Invitrogen) was used for reverse transcription. For RNA adapter ligation 4 µl NEBNext Quick Ligation Reaction buffer (NEB), 2 µl RNA Adapter (RMX), 1.5 µl T4 DNA ligase were added and the total reaction volume was brought to 20 µl. Finally, RNA was cleaned using 1× sample volume of AMPure XP beads and eluted in 9 µl Elution Buffer (EB). The eluate was then loaded on a Flongle R9.4.1 flow cell.

Raw read files were base-called using guppy version 6.0.0 in high-accuracy mode (rna_r9.4.1_70bps_hac.cfg) with filtering set to a minimum Q-score of 7. Base-called reads were mapped either to E. coli genomic coding sequences (CDS) or to its genome. In the first case, base-called reads were mapped to E. coli BL21(DE3) genomic coding sequences (genome NCBI Reference Sequence NZ_CP081489.1) to which custom coding sequences of CP and p97 from the expression vector pRSF-Duet1 were added manually. Mapping was performed using minimap271 with “-ax map-ont -k14” parameters. Mapped reads were filtered using samtools v1.672, and only reads with a MAPQ score of 60 were retained. Reads mapped to E. coli CDS were counted using NanoCount73 with default parameters, where filtering for reads that map within 50 nt of the 3’-end of the reference was enabled (3’ filtering). Estimated count values per coding sequence were used to calculate adjusted transcripts per million (TPM) values for only those transcripts that were present in both biological replicates. Values for transcripts not present in both biological replicates were discarded. A two-sided Mann-Whitney-Wilcoxon test was performed on adjusted TPM values between both replicates of each measured RNA sample (CPT43C+D136C expression: p. val. = 3.9 × 10−7 comparing ‘RNA from VLPs’ replicates; CPT43C+D136C-p97 co-expression: p. val. = 0.027 comparing ‘total cell RNA’ replicates, p. val. = 0.204 comparing ‘RNA from VLPs’ replicates). Adjusted TPM values were averaged between biological replicates and used in downstream analyses. Per base coverage was computed using bedtools v2.30.0. Per base coverage was further normalized by dividing by the sum of coverage of all mapped bases and multiplying by a million bases. Coverage plots were plotted using the seaborn Python library. Smoothed per base coverage means were calculated as the mean value of 40 consecutive bases.
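The count and coverage normalizations described above can be illustrated with a short sketch. It is not the authors' script. The function names are hypothetical, the adjusted TPM is assumed to be a simple rescaling of estimated counts over the shared transcripts (no length normalization), and the 40-base smoothing is assumed to be a sliding window.

```python
def adjusted_tpm(counts_a, counts_b):
    """Rescale counts to TPM over transcripts present in both replicates,
    then average the two replicates (assumed definition)."""
    shared = sorted(set(counts_a) & set(counts_b))

    def rescale(counts):
        total = sum(counts[t] for t in shared)
        return {t: counts[t] / total * 1e6 for t in shared}

    tpm_a, tpm_b = rescale(counts_a), rescale(counts_b)
    return {t: (tpm_a[t] + tpm_b[t]) / 2 for t in shared}

def normalized_coverage(depth):
    """Divide per-base depth by the summed depth and scale to a million."""
    total = sum(depth)
    return [d / total * 1e6 for d in depth]

def smooth(values, window=40):
    """Mean of each run of `window` consecutive bases (sliding window)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical estimated counts for two biological replicates.
rep1 = {"CP": 300.0, "idnT": 100.0, "lacZ": 50.0}
rep2 = {"CP": 600.0, "idnT": 200.0}
tpm = adjusted_tpm(rep1, rep2)          # lacZ is discarded
cov = normalized_coverage([1, 1, 2])
print(tpm, cov, smooth(cov, window=2))
```

A transcript seen in only one replicate (here the placeholder `lacZ`) drops out before rescaling, so the adjusted values of each replicate still sum to one million.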

Mapping of the reads to the E. coli BL21(DE3) genome (NCBI Reference NZ_CP081489.1), to which CP and p97 coding sequences were added as additional chromosomes, was performed using minimap2 with “-ax map-ont -k14” parameters. Mapped reads were filtered using samtools v1.6 for a MAPQ score of 60. The amount of rRNA reads in each of the samples was calculated by intersecting the filtered mapped sequences with a genomic GTF file using bedtools intersect. Relative amounts of different RNA species as bp % values were calculated by adding gene-specific base pairs (bps) and comparing them to the sum of all mapped bps.
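The bp % calculation amounts to a simple ratio, sketched below. The RNA classes and base-pair totals are hypothetical placeholders, and the function name is introduced only for illustration.

```python
def bp_percent(mapped_bp_by_class):
    """Express the mapped base pairs of each RNA class as a percentage
    of all mapped base pairs."""
    total = sum(mapped_bp_by_class.values())
    return {name: 100.0 * bp / total
            for name, bp in mapped_bp_by_class.items()}

# Hypothetical summed base pairs per RNA species in one sample.
example_bp = {"rRNA": 6000, "CP mRNA": 3000, "other": 1000}
shares = bp_percent(example_bp)
print(shares)
```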

Cryo-EM grid preparation and data acquisition

For grid preparation, 3 μl of the sample, at a concentration of around 1 mg ml−1 for the filamentous particles or 3–4 mg ml−1 for non-filamentous assemblies, was applied to glow-discharged Quantifoil 200-mesh R2/2 holey carbon grids (Quantifoil), followed by vitrification in a Vitrobot Mark IV (Thermo Fisher Scientific). With the exception of VLPΔC40 and trCPK176C, the samples were imaged on a Glacios transmission electron microscope operated at 200 kV and equipped with a Falcon 3 direct electron detector (Thermo Fisher Scientific). Data sets were acquired at a nominal magnification of 150,000, corresponding to a calibrated pixel size of 0.950 Å, with a defocus range between −0.8 and −2.1 μm and a total dose of around 40 e Å−2.

For VLPΔC40 and trCPK176C, cryo-EM data were collected on a Titan Krios transmission electron microscope (Thermo Fisher Scientific) operated at 300 kV at CEITEC, Brno, Czech Republic. The VLPΔC40 data set was acquired in linear mode with a Falcon2 (Thermo Fisher Scientific) direct electron detector at a nominal magnification of 75,000, corresponding to a calibrated pixel size of 1.063 Å, and a defocus range between −1.3 and −0.4 µm, with 40 frames collected within 1.02 s exposure giving a total dose of 84 e Å−2. The trCPK176C data set was acquired on a K2 Summit direct electron detector (Gatan) operating in counting mode at a nominal magnification of 165,000, corresponding to a pixel size of 0.822 Å, and a defocus range between −0.3 and −3.6 μm. 32-frame movies were collected during 4 s exposure time with a total dose of 32 e Å−2.
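Two quantities follow directly from the acquisition parameters above: the Nyquist limit (twice the pixel size), which bounds the attainable resolution, and the dose per frame. The sketch below is illustrative only; the function names are not from the source, and the per-frame dose assumes that the dose is spread evenly over the frames.

```python
def nyquist_limit(pixel_size_A):
    """Finest resolvable spacing in Å for a given pixel size in Å."""
    return 2.0 * pixel_size_A

def dose_per_frame(total_dose, n_frames):
    """Dose per frame in e/Å^2, assuming even fractionation."""
    return total_dose / n_frames

# Pixel sizes (Å) quoted in the text for the three acquisition setups.
pixel_sizes = {"Glacios": 0.950, "VLPΔC40": 1.063, "trCPK176C": 0.822}
limits = {name: nyquist_limit(px) for name, px in pixel_sizes.items()}

# trCPK176C: 32-frame movies, 4 s exposure, total dose 32 e/Å^2.
frame_dose = dose_per_frame(32.0, 32)
dose_rate = 32.0 / 4.0  # e/Å^2 per second

print(limits, frame_dose, dose_rate)
```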

Cryo-EM image processing

The detailed workflow for each dataset-specific reconstruction is presented in Supplementary Figs. 1, 3, 5, 8, 9, 11–13, 20 and 21. In general, more than 500 movies were collected for each sample and used for cryo-EM data processing, performed in cryoSPARC v3.3 or 4.174,75,76 except for VLPΔC40 and VLPT43C+D136C, where RELION-3.177 was used.

For disulfide bond-stapled VLPs, cryo-EM reconstructions (Supplementary Fig. 20) were performed using C1 symmetry with additional selection of subclasses based on observed extensive conformational variability using 3D Variability analysis75 and 3D classification. Final sharpened non-symmetric cryo-EM maps were checked for connecting density.

The resolutions of the final cryo-EM maps, in some cases locally sharpened with DeepEMhancer v0.1378, were determined based on the gold-standard FSC criterion of 0.14379. Local resolutions were calculated using BlockRes80, and cryo-EM densities were visualized in UCSF Chimera 1.1681 and ChimeraX 1.5. Details, EMPIAR and EMDB codes are provided in Table 1 and Supplementary Tables 1 and 2.
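For readers unfamiliar with the FSC criterion, the resolution is read off where the Fourier shell correlation curve first drops below 0.143. The following sketch shows one way to do this with linear interpolation between shells. The processing packages named above report this value themselves, so the function and the FSC values here are purely illustrative assumptions.

```python
def fsc_resolution(spatial_freq, fsc, threshold=0.143):
    """Return the resolution (Å) at the first crossing below `threshold`.

    spatial_freq: shell frequencies in 1/Å, ascending.
    fsc: FSC value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two neighbouring shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = spatial_freq[i - 1] + frac * (
                spatial_freq[i] - spatial_freq[i - 1])
            return 1.0 / f_cross
    return None

# Hypothetical FSC curve.
freqs = [0.05, 0.10, 0.20, 0.30, 0.40]
curve = [1.00, 0.95, 0.60, 0.243, 0.043]
resolution = fsc_resolution(freqs, curve)
print(f"resolution at FSC = 0.143: {resolution:.2f} Å")
```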

Model building

PDB ID codes of initial models used for model building are provided in Table 1. In each case, the initial model was fitted into the reconstructed cryo-EM map using UCSF Chimera 1.16 with one central CP unit and all the neighbors in direct contact subjected to several iterative cycles of manual refinement using WinCoot 0.9.8.182 and real-space refinement with secondary structure and geometry restraints in Phenix 1.20.1 package83. For helical filaments with RNA, the segment of 5 uracils from the PVY virion (PDB ID: 6HXX) was fitted into the empty density of one CP and subjected to the same iterative cycles of refinement as for the protein components. Molprobity84 was used for validation of individual models after each cycle.

For trCPK176C, atomic models were built in the cryo-EM map after local refinement (EMD-17063) with two distinct protomer structures, one in C2 symmetric contact and the adjacent one. Model from the locally-refined cryo-EM map (PDB ID 8OPK) was rigid body-docked into the globally-refined cryo-EM map (EMD-17062) to obtain the atomic model of the entire cubic particle (PDB ID 8OPJ). Final 3D models were visualized in UCSF Chimera 1.16 and ChimeraX 1.585. The surface electrostatic potential of the trCP was calculated by APBS 3.4.186. Detailed statistics of model building and refinement are presented in Table 1.

Mass spectrometry and photometry

To denature the assembly and determine an accurate monomer mass, constructs trCPK176C, trCPK176C-noHis and trCPK176S in PBS were diluted into a solution of 50% acetonitrile and 2% formic acid, to a final concentration of 5 µM of the cubic 48mer. For the native spectrum of trCPK176C, the sample was buffer-exchanged into 200 mM ammonium acetate (pH 6.9) using Bio-Spin 6 columns (Bio Rad) and sprayed at the same concentration. Nanoelectrospray mass spectrometry data were acquired using a QExactive UHMR mass spectrometer (ThermoFisher) with gold-plated 1.2 mm OD capillaries prepared in-house, as previously described87. Resultant spectra were deconvolved and analyzed using UniDec88.

To acquire mass photometry data89, the constructs were diluted with PBS to 50 nM and measured on a Refeyn TwoMP mass photometer and analyzed using DiscoverMP v2.5.0 (Refeyn Ltd).

Molecular dynamics simulations (MD)

We took the atomic model of one ring from CPΔC40:r and truncated it at the N-terminus (ΔN49) to simulate trCPnoHis. For the trCPK176C-noHis starting model, an additional K176C mutation was introduced. We constructed two coarse-grained (CG) systems by arranging eight randomly oriented rings (either trCPnoHis or trCPK176C-noHis) with a minimum initial pairwise spacing of 16 nm between rings (i.e., 1.5 times the ring diameter). Rings were immersed in a 50 × 50 × 50 nm3 cubic box of water, to which neutralizing counterions were subsequently added. Each CG model of the ring was generated from the corresponding atomic model using the Martinize2 protocol90. An elastic network91 was applied to maintain the overall internal structure of an individual ring. All CG-MD simulations were performed with GROMACS 2019.692 and the Martini 3.0 force field93. The systems were energy minimized with the steepest descent algorithm (50,000 steps), followed by a brief NPT (constant number of particles, pressure and temperature) equilibration cycle to relax the initial configurations (200,000 steps of 5 fs). Afterward, the systems were simulated for 10 μs in an NPT ensemble with periodic boundary conditions (500,000,000 steps of 20 fs). The temperature was maintained at 300 K and the pressure at 1 bar using the V-rescale thermostat94 and Berendsen barostat95. The cut-off value for the Coulomb and van der Waals interactions was set to 1.1 nm, and the relative dielectric constant was set to 15. The rings diffuse freely in solution until they collide and occasionally form a contact. The formation of ring clusters was analyzed by first identifying, for each trajectory frame, all ring triplets in which all three pairwise contacts are present.
As an order parameter revealing the form of the clusters, we introduced the absolute value of the scalar triple product p = |(n1,n2,n3)| = |n1 · (n2 × n3)|, where ni is the unit vector along the normal of the i-th ring. The value p = 0 corresponds to plane-distributed ring triplets, while p = 1 corresponds to the mutually orthogonal orientation of rings forming the corner of a cube. The coordinates of the initial (equilibrated) structures and the final structures (after 10 μs) can be obtained upon request in GROMACS format.
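The triplet search and the order parameter p can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study. The function names are hypothetical, and the list of pairwise ring contacts is taken as given, because the contact criterion is not specified in the text.

```python
from itertools import combinations
import math

def unit(v):
    """Normalize a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def order_parameter(n1, n2, n3):
    """p = |n1 . (n2 x n3)| computed from the three ring normals."""
    n1, n2, n3 = unit(n1), unit(n2), unit(n3)
    return abs(dot(n1, cross(n2, n3)))

def contact_triplets(n_rings, contacts):
    """Ring triplets in which all three pairwise contacts are present."""
    pairs = {frozenset(p) for p in contacts}
    return [t for t in combinations(range(n_rings), 3)
            if all(frozenset(q) in pairs for q in combinations(t, 2))]

# Hypothetical frame: 4 rings, contacts given as index pairs.
triplets = contact_triplets(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
p_corner = order_parameter((2, 0, 0), (0, 3, 0), (0, 0, 0.5))
p_flat = order_parameter((0, 0, 1), (0, 0, 1), (0, 0, 1))
print(triplets, p_corner, p_flat)
```

Mutually orthogonal normals give p = 1 (a cube corner), whereas rings lying in one plane have parallel normals and give p = 0.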

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.