Abstract
Numerous molecular machines are required to drive the central dogma of molecular biology. However, the means by which these numerous proteins emerged in the early evolutionary stage of life remains enigmatic. Many of them possess small β-barrel folds with different topologies, represented by double-psi β-barrels (DPBBs) conserved in DNA and RNA polymerases, and similar but topologically distinct six-stranded β-barrel RIFT or five-stranded β-barrel folds such as OB and SH3 in ribosomal proteins. Here, we discover that the previously reconstructed ancient DPBB sequence could also adopt a β-barrel fold named Double-Zeta β-barrel (DZBB), as a metamorphic protein. The DZBB fold is not found in any modern protein, although its structure shares similarities with RIFT and OB. Indeed, DZBB could be transformed into them through simple engineering experiments. Furthermore, the OB designs could be further converted into SH3 by circular-permutation as previously predicted. These results indicate that these β-barrels diversified quickly from a common ancestor at the beginning of the central dogma evolution.
Introduction
The central dogma of molecular biology is governed by numerous molecular machines, including DNA polymerases, RNA polymerases, and ribosomes. Despite the detailed understanding of their regulation mechanisms, the evolutionary origins of such complex molecular machines remain obscure.
Some pivotal proteins of the central dogma may have originated from the well-conserved small β-barrels within their core regions1. For example, the core domains of DNA polymerase D (PolD) from euryarchaea and all cellular RNA polymerases are composed of two homologous β-barrels with six strands, “double-psi β-barrels (DPBBs)”2,3,4 (Supplementary Fig. 1A). The ribosomal protein L3 (rL3) and several translation factors have a similar but topologically distinct six-stranded β-barrel, “RIFT”5,6 (Supplementary Fig. 1B). The structures of DPBB and RIFT have two-fold pseudo symmetry, indicating they originated as shorter homo-dimeric peptides5,7. Five-stranded β-barrel folds such as “OB” and “SH3” are often found in other ribosomal proteins and translation factors8,9,10 (Supplementary Fig. 1C and D). Given that these β-barrel domains are highly conserved across all extant organisms and play critical roles in replication, transcription, and translation, it is hypothesized that they were among the earliest components of the primordial central dogma machinery1,2,11.
These four β-barrels (DPBB, RIFT, OB, and SH3) are classified into different folds in the SCOP, CATH, and ECOD protein databases12,13,14,15, as they have distinct topologies. Even so, partial structure and sequence similarities between these folds have been detected16. Over the last two decades, meticulous comparative analyses of sequence motifs and partial structures have independently suggested that the DPBB-RIFT, RIFT-OB, and OB-SH3 pairs diverged from a common ancestral protein5,6,9,17 (Supplementary Fig. 1). Despite these efforts, no experimental evidence has been provided to demonstrate that such drastic fold transitions could occur via a feasible pathway, probably because of the huge sequence/structure diversity between modern proteins with the different folds, especially between the pseudo-dimeric ones (DPBB and RIFT) and the monomeric ones (OB and SH3). Therefore, an experimental reconstruction of the ancient evolutionary process between these β-barrels has been awaited to reveal the profound protein fold evolution before the establishment of the central dogma.
Here, we study fold transitions between distinct β-barrels by protein engineering and structural biology techniques. We previously reconstructed the evolutionary pathway of the DPBB fold, initiated through the homo-dimerization of a half-sized peptide with about 40 amino acids, followed by gene duplication and fusion7. Furthermore, by reducing the amino acid repertoire of the peptide, we have created the homo-dimeric DPBB fold comprising only seven amino acid types (Ala, Gly, Asp, Glu, Val, Lys, and Arg; design-1) (Fig. 1A, B), which could have been synthesized by immature translation systems in early life7,18. In this study, by using this simplified DPBB peptide as the starting template, we experimentally reconstructed the evolutionary pathways between the various ancient β-barrel folds in the central-dogma machinery through an unexpected missing link.
Results
Conversion of homo-dimeric β-barrels through a missing-link fold
The most simplified DPBB protein we designed previously, design-1 (mk2h_ΔMILPYS), did not fold in the typical buffer conditions (50 mM phosphate, 150 mM NaCl), but crystallized under two different conditions, containing malonate or malic acid ions, and adopted the DPBB fold in the crystals (Fig. 1A, B)7. Here, we report its third type of crystal, formed under different conditions (100 mM Tris, pH 8.5, 20% PEG-400, 200 mM lithium sulfate). Interestingly, we could not solve its structure by molecular replacement using the DPBB fold as a model, implying that design-1 had adopted a different conformation in the third type of crystal.
At first, we expected that it adopted a RIFT-like structure because DPBB and RIFT supposedly evolved from a common ancestral homo-dimeric peptide7. Indeed, they commonly have (i) six-stranded β-barrel structures, (ii) an internal pseudo-two-fold symmetry, and (iii) a sequence motif “GD-box”, although the 1st loop configuration and the β2 direction are different (Supplementary Fig. 2). Following this assumption, we tried to stabilize the unsolved conformation of design-1 by introducing amino acid residues conserved in Phs018, a RIFT protein. Phs018 has high two-fold symmetry and likely retains the properties of the ancient RIFT-fold proteins5. Phs018 has remarkable sequence similarity to design-1 (26–35% identity; Supplementary Fig. 2), and we replaced five residues in design-1 with the ones at the corresponding positions of Phs018 (Fig. 1A). AlphaFold2 (AF2)19 predicted that the resultant mutant, design-2 (Ph1), would fold into a RIFT-like structure, albeit with a low lDDT region in the 1st β-turn (Supplementary Fig. 3). Design-2 was expressed, purified, and analyzed physicochemically. Circular dichroism (CD) and size exclusion chromatography (SEC) experiments indicated that design-2 was folded (not random coil) and had moderate stability (Tm = 58 °C) (Supplementary Fig. 4). Design-2 also formed crystals under similar buffer conditions to the third type of design-1 crystal, including lithium sulfate.
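Percent identity values such as the one quoted above depend on the alignment used and on which columns are counted. The following is a minimal sketch of one common definition (identical residues divided by aligned, ungapped columns). The function name and the two aligned strings are hypothetical placeholders for illustration; they are not the design-1 or Phs018 sequences, and this is not necessarily how the reported values were obtained.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Placeholder alignment (hypothetical, not taken from the paper).
toy_a = "GKVA-RVDEG"
toy_b = "GRVAEKV-EG"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions (for example, dividing by the length of the shorter sequence) give different numbers for the same alignment, so the definition should be stated alongside any reported value.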
By molecular replacement with the predicted design-2 model (RIFT fold) and subsequent manual remodeling, the crystal structures of design-1 (third type of crystal) and design-2 were determined (Supplementary note 1). Surprisingly, they both adopted a unique β-barrel fold that topologically differs from DPBB and RIFT (Fig. 1C, D). Compared to the homo-dimeric DPBB fold, the directions of the β2- and β2’-strands were inverted, resulting in all anti-parallel strand patterns like those in RIFT. However, the 1st loop connecting β1–β2 was rolled up, unlike the simple β-turn in RIFT (Fig. 1C, D, and Supplementary Fig. 5A). As this loop configuration resembles the letter “Zeta” in the topological scheme, we named this β-barrel fold the Double-Zeta β-barrel (DZBB). Therefore, design-1 can fold into two different structures with an identical sequence. The five point-mutations derived from Phs018 (RIFT) in design-2 stabilized the DZBB fold.
To further convert DZBB into RIFT, the sequence forming the Z-loop in design-2 was replaced with two residues (DG, GG, or GD) facilitating β-turn formation20 (design-3, -4, and -5) (Supplementary Fig. 6 and Supplementary Table 1). CD experiments demonstrated that design-3 and design-5 were partially folded (Supplementary Fig. 7), and design-3 formed well-diffracting crystals. In contrast, design-4 was unfolded but still formed crystals in the presence of sulfate ions. The crystallographic analysis revealed that design-3 and design-4 adopted the homo-dimeric RIFT-fold (Fig. 1E and Supplementary Fig. 5B). Thus, the short In/Del at the 1st loop position is the determinant for the fold transition between DZBB and RIFT (Fig. 1A). The successful experimental conversion from DPBB to RIFT through the DZBB fold, by just a few mutations, indicates that DZBB is a missing link between the ancient homo-dimeric β-barrels in transcriptional and translational proteins (Supplementary Fig. 1).
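The loop-replacement step described above can be sketched as a simple string operation on the protein sequence. This is an illustration only: the parent sequence, the loop boundaries, and the function name are hypothetical placeholders, and the eight-residue loop length is borrowed from the Z-loop length stated later in the text. The actual design sequences are given in Supplementary Table 1.

```python
def replace_loop(sequence: str, start: int, end: int, turn: str) -> str:
    """Replace residues start..end (1-based, inclusive) with a short turn motif."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("loop boundaries out of range")
    return sequence[: start - 1] + turn + sequence[end:]


# Placeholder parent: the run of 'L' marks a hypothetical eight-residue loop.
parent = "AAAAKKKKLLLLLLLLEEEE"
variants = {turn: replace_loop(parent, 9, 16, turn) for turn in ("DG", "GG", "GD")}
for turn, seq in variants.items():
    print(turn, seq, len(seq))
```

Each variant is six residues shorter than the placeholder parent, which mirrors the idea of a short deletion at the 1st loop position.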
Dual-folding of “design-1” induced by small ligands
Metamorphic proteins are a rare class of proteins that reversibly convert between distinct folded conformations in native conditions21,22,23. Design-1 exhibited the metamorphic property, adopting two different folds, DPBB and DZBB (Fig. 1B, C). Design-1 was originally obtained by substituting three tyrosine residues in the stable parent DPBB protein, design-0 (mk2h_ΔMILPS), composed of eight amino acid types7. Thus, these three mutations destabilized the DPBB fold and then allowed it to fold into the DZBB fold, resulting in the dual-folding property. Furthermore, the addition of five point mutations in design-2 stabilized the DZBB fold and abolished the capability to adopt the DPBB fold. Such metamorphic states like those of design-1 could have existed to bridge between different folds smoothly during drastic fold transitions22,24.
Interconversion between the two folds also resulted in distinct domain-swapping states (DPBB: β3β1’β2β2’β1β3’; DZBB: β3β1β2’β2β1’β3’). While domain-swapping with multiple oligomeric states has been observed in some natural and designed proteins25,26,27, the rearrangement of the β-strand orientation and the drastic changes of intra- and inter-chain interactions in a homo-dimeric structure like design-1 are unusual, extending our knowledge of domain-swapping proteins.
Different ligand molecules probably induced the metamorphism in design-1. In the crystal structure of the DPBB-fold, two malonate ions bind to a positively charged pocket around the α1 helix, and are coordinated by Lys24, Arg27, and Lys33 (Fig. 2A). Alternatively, in the crystal structure of design-1 with the DZBB fold, two additional residues, Arg21, and Arg18’ from the other chain, coordinate the two sulfates (Fig. 2B and Supplementary Fig. 8). The sulfate ions may attract these two additional residues in the folding process, and then stabilize the inverted β2-strand in the DZBB-fold.
To test whether these small molecules can facilitate the folding of design-1 in solution, we analyzed its conformational change by monitoring the binding of the fluorescence probe 8-anilino-1-naphthalenesulfonic acid (ANS). ANS typically binds to hydrophobic patches of folding intermediates and molten globules, which changes its fluorescence spectrum. A low concentration of malonate (50–500 mM) did not alter the ANS fluorescence spectra (Fig. 2C). However, the fluorescence signal increased at over 1,000 mM malonate, and its peak was blue-shifted (Fig. 2C), implying that high concentrations of malonate induce at least partial folding of design-1. We also found that high concentrations of sulfate ion (≥1500 mM) increased ANS fluorescence (Fig. 2D). Its spectral pattern was slightly different from that with malonate, perhaps due to the difference between the DPBB and DZBB folds. Changes in the CD spectrum patterns depending on the sulfate ions were also observed (Supplementary Fig. 9A). These experiments demonstrated that the small ions mediate the folding of design-1, and their types may lead to two different structures (Fig. 2E).
Interestingly, ANS experiments showed that similar ions, phosphate, malic acid, and citrate, also promoted conformational changes (Supplementary note 2 and Supplementary Fig. 10). As these anions probably existed in the primordial cells and on the early Earth28,29,30,31, they might have served as chemical chaperones to enhance the folding of ancient proteins and could have compensated for the low stabilities of evolutionary intermediates during the folding transition (Supplementary note 3 and Supplementary Figs. 7–10).
The transformation from DZBB to the monomeric OB-fold
In the homo-dimeric DPBB and RIFT structures, the β-strands from a monomer are mostly interlaced with the β-strands from the other chain, and thus these peptides can only fold as homo-dimers (Fig. 1B, E). In contrast, in the structure of DZBB, the secondary structure elements from a single subunit are mostly clustered together as in a monomeric protein, except for the swapped β2 strands (Fig. 1C, D). Surprisingly, a structural similarity search using the DALI software32 detected high correspondence between the monomeric part of DZBB and OB-fold proteins. In the superimposition of the DZBB and OB proteins, the β1, β2, and β3 strands of DZBB are well aligned with the β1, β3, and β4 strands of the OB-fold protein, respectively (Fig. 3A). In addition, their sequences are partially similar (Supplementary Fig. 11A). The only significant differences between their structures are the presence and absence of a few secondary structural elements (Fig. 3B). The DZBB monomer lacks two β-strands in the OB-fold (β2 and β5), and while the β2 strand is conserved within OB-fold proteins, the β5 strand is poorly conserved or even absent in some OB proteins; e.g., ribosomal protein L2. Helix α1 of the OB-fold (corresponding to α1 of DZBB) is also missing in some OB proteins (e.g., rL2, S17, and S28). Thus, the original OB-fold has been considered to be a four-stranded β-barrel, of which only β2 is absent in DZBB10.
To demonstrate the hypothesized interconversion between the DZBB-fold and the OB-fold, we created their chimeric proteins by combining the design-1 and OB proteins. The sequences of the OB domains in the rL2 proteins from thermophilic archaea, Thermococcus kodakarensis and Methanopyrus kandleri, resemble that of design-1, while helix α1 is absent in the rL2 proteins. In particular, the OB-domain of rL2 from M. kandleri exhibited a high similarity with design-1 (identity 25%, Supplementary Fig. 11A). Given that design-1 was originally constructed from the DPBB protein from M. kandleri7, the genome of this archaeon might still preserve the evolutionary information of ancient proteins. The sequence region surrounding β2-β3 of the OB-domain in rL2 was incorporated into the corresponding position of design-1, based on the superimposed structures (Fig. 3A). Through this process, we constructed six unique variations by modifying the positions and extents of insertion (design-6–11; Supplementary Fig. 11B and Supplementary Table 1). Biochemical and crystallographic analyses revealed that two of the six chimera proteins, design-6 (tkoL2_v1) and design-9 (mkaL2_v1) (Fig. 3C), adopt four-stranded OB-folds (Fig. 3D, Supplementary Fig. 5C and 12). The DZBB and OB fragments comprise approximately 40 and 60% of these chimeric proteins, respectively. Therefore, the monomeric OB-fold could be reconstructed by simply combining the DZBB and OB-fold protein segments, without optimizing the structural interfaces between both parts.
To examine what determines the DZBB–OB transition, additional intermediates were engineered by sequential mutagenesis and experimental validation steps (design-12–17; Supplementary note 4, Supplementary Fig. 5D, 13A, and Supplementary Table 1). The 2nd generation mutants, design-13 (tkoL2_v1.2) and design-16 (mkaL2_v1.2), have ~60% sequence identities with design-2 (DZBB-fold), and retained moderate thermostability (Fig. 3C and Supplementary Fig. 14). In these designs, only a short segment was from rL2 (segment 1: from the middle of β1 to the start of β3), and the other parts are from design-2 (Fig. 3C and Supplementary Fig. 13A). Helix α1 (segment 2) was also omitted. The crystallographic analysis revealed that design-13 still adopts the four-stranded OB fold (Fig. 3E), indicating that segment 1, but not segment 2, could serve as the determinant for the fold transition between the DZBB and OB structures. To test this assumption, we conducted the reverse engineering by replacing segment 1 of design-13 and design-16 with the 8-amino acid sequence forming the Z-loop of design-2 (the 3rd generation mutants: design-18 (tkoL2_v1.2_Z) and design-19 (mkaL2_v1.2_Z)) (Fig. 3C, Supplementary Fig. 13B, Supplementary Table 1). SEC and CD experiments demonstrated that both mutants folded and had moderate thermal stabilities (Supplementary Fig. 15). The crystal structures of design-18 and design-19 revealed that they adopt the DZBB fold even without the α1 region (segment 2) (Fig. 3F and Supplementary Fig. 5E). These results demonstrated that segment 1 is sufficient to achieve the fold change from DZBB to OB.
Furthermore, we verified that the 13 a.a. sequence forming the flexible β-turn in segment 1 in the OB-fold designs could be shortened to 7 a.a. (Supplementary note 4, Supplementary Figs. 16–18, and Supplementary Table 1). Taken together, a very short In/Del and a few point mutations are the determinants in the transition between the DZBB and OB folds. This facile interchangeability between DZBB and OB folds suggests that such a drastic fold transition likely occurred in the early evolutionary stage of life.
Transformation from OB to SH3
Given the high similarity between the OB and SH3 folds, they are presumed to have evolved from a common ancestral protein (Supplementary Fig. 1)9,10. Loren Williams’ group suggested that the four-stranded core fold of OB could have transformed to SH3 by a simple circular permutation (or vice versa)10. Following this evolutionary hypothesis, we tried to convert the reconstructed OB proteins to the SH3 fold. The fourth strand of design-6 and design-9 was trimmed and connected to the N-terminal end by two residues “DG,” an ideal sequence to form a β-turn20 (design-23: tkoL2_v1_SH3; design-24: mkaL2_v1_SH3)(Supplementary Table 1). While design-23 had a random-coil structure, design-24 exhibited a CD spectrum for a folded protein and remained almost intact even at 90 °C (Supplementary Fig. 19). We also determined its crystal structure and confirmed that design-24 adopted the four-stranded SH3 fold (Fig. 3G). This experimental conversion from OB to SH3 strongly supports the previous hypothesis that these folds could have emerged by a simple permutation of their four-stranded core fold10.
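At the sequence level, a circular permutation of the kind described above moves a C-terminal segment to the N-terminus and joins the old termini with a short linker. The following is a minimal sketch under stated assumptions: the input sequence and the cut position are hypothetical placeholders (the real constructs, design-23 and design-24, are listed in Supplementary Table 1), and any trimming of the fourth strand is not modeled.

```python
def circular_permute(sequence: str, cut: int, linker: str = "DG") -> str:
    """Move the residues after position `cut` (1-based) to the N-terminus.

    The moved C-terminal segment is joined to the original N-terminus
    through `linker`, so the new order is: tail + linker + head.
    """
    if not (1 <= cut < len(sequence)):
        raise ValueError("cut must fall inside the sequence")
    head, tail = sequence[:cut], sequence[cut:]
    return tail + linker + head


# Placeholder four-segment sequence; 'D' stands in for the fourth strand.
toy_ob = "AAAABBBBCCCCDDDD"
toy_sh3 = circular_permute(toy_ob, cut=12)
print(toy_sh3)
```

Apart from the two linker residues, the permuted sequence has the same composition as the original; only the order of the segments along the chain changes.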
DNA binding abilities of the reconstructed β-barrels
As most β-barrels in the central dogma machinery function by interacting with nucleic acids, we investigated the DNA or RNA binding capabilities of the reconstructed β-barrels by an electrophoresis mobility shift assay (EMSA) (Fig. 4 and Supplementary Fig. 20). When mixing the stable DPBB protein (design-0) and the 20 bp double-stranded DNA (dsDNA), some portion of dsDNA was slowed and stacked in the well as previously reported7. Most DNA molecules did not migrate from the well when mixed with the proteins with the stable DZBB fold (design-2, -18, and -19) and RIFT fold (design-3 and -4), indicating that these proteins formed large aggregates with dsDNA (Fig. 4A). In particular, design-2, -18, -19, -3, and -4 interacted with dsDNA even in high salt conditions (500 mM NaCl) (Supplementary Fig. 20B). We also performed the EMSA experiment with varied concentrations of design-2 and design-3 (Supplementary Fig. 20C and D). The mobilities of the dsDNA fragments were gradually slowed down as the protein concentration increased. Design-2 seems to have a higher affinity for dsDNA than design-3, as dsDNA shifted even at lower concentrations of design-2. Furthermore, no significant sequence specificity was observed when we tested another 20 bp dsDNA (Supplementary Fig. 20E). Design-2, -18, -19, -3, and -4 might interact with the phosphate groups in the DNA backbone in a similar way to the sulfate ions in the crystal structures of DZBB (Fig. 2B). The proteins with the DPBB, DZBB, and RIFT folds also interacted with ssDNA and ssRNA (Supplementary Fig. 20F and G). These findings suggest that, like their modern descendants, the ancient DPBB, DZBB, and RIFT proteins also interacted with nucleic acid polymers.
In contrast, the reconstructed OB and SH3 proteins did not interact significantly with any oligonucleotides (Fig. 4B and Supplementary Fig. 20H–J). Only high concentrations of design-13 and -16 retarded the migration of dsDNA slightly (Supplementary Fig. 20K). The acquisition of the weak DNA binding affinity of these two proteins may have resulted from additional lysine residues at the end of β3 and the 3rd loop, compared to other OB-fold chimeras (Fig. 3C). While the typical modern OB-fold proteins interact with an oligonucleotide at the surface of β1–3 (β2–4 in the SH3-fold proteins)33, the corresponding region of ribosomal protein L2 used in this report (β2–3) does not directly interact with rRNA in the ribosome, likely resulting in the weak affinity of the reconstructed OB and SH3 proteins.
Discussion
Fold conversion over evolutionary history is usually difficult to verify due to the large diversity in sequences and structures between proteins with different folds. Still, some fold transition events have been detected by meticulous statistical methods34,35. Recent protein engineering endeavors also demonstrated that a relatively small number of point mutations can foster fold conversion, implying close relationships between distinct folds36,37. However, it might sometimes be misleading to decipher ancient protein evolution by referring only to the protein folds found today. In our study, a missing-link protein fold, DZBB, which is not found in modern proteins, offered a simple explanation for the evolutionary relationship between diverse β-barrel folds.
We demonstrated that the short and simple peptide, design-1, adopts not only the homo-dimeric DPBB fold but also another homo-dimeric β-barrel fold, DZBB, like a metamorphic protein (Figs. 1, 2). It should be emphasized that the DZBB structure could not be predicted by AlphaFold2. It is still challenging for state-of-the-art AI to predict metamorphic structures and protein folds outside of the program training set38,39. The comparison of the predicted model and crystal structures is discussed in detail in Supplementary note 1.
From the DZBB fold, the evolutionary pathway among distinct β-barrel folds, RIFT and OB, could be reconstructed by simple and feasible mutation steps. A single deletion in the DZBB sequence converted it into the RIFT fold (Fig. 1). In contrast, the single insertion of a short sequence forming a β-strand and a few point mutations converted it to an OB fold, accompanied by the oligomeric state change from dimer to monomer (Fig. 3). Furthermore, the reconstructed OB fold could also be converted to the four-stranded SH3 fold by a simple circular permutation (Fig. 3G). Thus, these ancient β-barrel folds (DPBB, RIFT, OB, and SH3) could be readily interconverted by a few feasible mutations through the missing-link fold, DZBB (Fig. 5), indicating their significantly close evolutionary relationships (Supplementary note 5).
This also implies that the diverse β-barrel folds can be produced from a limited sequence space. In an evolutionary time scale, the variety of ancient β-barrel folds might have diverged in a very short period, like the rapid diversification of animal species in the Cambrian. This rapid diversification of the various β-barrel folds probably preceded and primed the subsequent development of the elaborate molecular machines underlying the central dogma40,41,42,43,44 (Fig. 5). Moreover, the reconstructed β-barrels (except for the mutant with the SH3 fold) retained the DNA and RNA binding affinities (Fig. 4 and Supplementary Fig. 20). Thus, during the diversification process of these folds, the fundamental nucleic-acid-binding property might have been inherited by the daughter folds, which then became specialized to the specific substrates and enzymatic reactions by stepwise mutations in each lineage.
Then, why was the DZBB fold lost in modern life? Interestingly, all modern DPBB and RIFT barrels exist as single-chain, pseudo-symmetric proteins (except for the related eight-stranded barrel6). Their ancient genes would have duplicated and fused tandemly to form monomers. The structures of homo-dimeric DPBB and RIFT barrels show that the N-terminal end of one monomer is close to the C-terminal end of the other chain (Fig. 1B, E). This spatial arrangement could have readily allowed the polypeptide to fuse to the other chain without any dynamic fold rearrangement. Being single polypeptides, DPBB and RIFT could also have acquired more elaborate functions, performed by asymmetrically optimized residues and additional domains specifically linked to their N- and C-termini. In contrast, the N-terminal end of one chain and the C-terminal end of the other in the homo-dimeric DZBB are distant (Fig. 1C, D), making it impossible to integrate two short polypeptides into a monomer through a simple gene fusion event. Because of this limited evolvability, the DZBB fold may have been outcompeted during the early evolution of life. The evolutionary pathways depicted here with the lost fold provide the groundwork for more detailed and broader studies of early protein evolution and the origin of the central dogma.
Methods
Construction of expression vector
The synthetic DNAs encoding the protein mutants used in this study, except for design-3, -4, and -5 (Ph1_DG, Ph1_GG, and Ph1_GD), were purchased from Thermo Fisher Scientific and Integrated DNA Technologies and were amplified by PCR with the cloning_upstream and cloning_downstream primers (Eurofins Genomics; Supplementary Table 2). The genes encoding design-3, -4, and -5 were constructed by the Splicing by Overlap Extension method with the mutant primers (Eurofins Genomics; Supplementary Table 2)45. The linear pET47b DNA was also amplified with pET47b_up and pET47b_down primers (Eurofins Genomics; Supplementary Table 2). Each PCR product was cloned into the pET47b vector for fusion with an N-terminal His-tag, by using an In-Fusion HD cloning kit (Clontech). The Escherichia coli DH5α strain was then transformed with the produced vectors. The resultant transformants were cultured on LB plates supplemented with 20 µg/mL kanamycin (37 °C, overnight), and then used to inoculate LB liquid medium and grown at 37 °C overnight. Each plasmid was extracted from the cells using a QIAprep Spin Miniprep Kit (QIAGEN). The sequences of the inserted genes were confirmed by Sanger sequencing.
Protein expression and purification
E. coli BL21 Gold (DE3) cells (Agilent Technologies, CA) were transformed with the vectors harboring the genes of the respective protein mutants. The resulting transformants were cultured in 20 mL of LB medium supplemented with 20 µg/mL kanamycin (37 °C, overnight) and then inoculated into 2 L of LB medium (20 µg/mL kanamycin). After culturing the cells at 37 °C for two hours, 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to induce expression of the desired protein. The culture was continued under the same conditions for 4 hours. The cells were then harvested and stored at −20 °C.
The bacteria were disrupted by sonication in 60 mL of 50 mM potassium phosphate buffer, pH 6.5, and 150 mM NaCl. The lysate was centrifuged (4 °C, 11,000 × g, 20 min). The supernatant was filtered (0.45 µm pore-size) and then purified by HisTrap HP nickel affinity chromatography (GE Healthcare, IL). The N-terminal His6-tags were cleaved with HRV 3C protease (Funakoshi, Japan) at 4 °C for 1–2 days. The treated samples were again loaded onto the HisTrap column, and the flow-through fraction was recovered. The protein solutions were then loaded onto a HiLoad 16/600 Superdex 75 (GE Healthcare, IL) size exclusion chromatography column, equilibrated with 50 mM potassium phosphate buffer, pH 6.0, 150 mM NaCl. The purity of each protein sample was verified by SDS-PAGE. Because all of the designed proteins in this study lack tyrosine and tryptophan residues, the protein concentrations were determined by a BCA assay (Thermo Fisher Scientific), in which design-0 (mk2h_ΔMILPS) containing three tyrosine residues was used as the standard protein. The concentration of design-0 was determined by its absorbance at 280 nm.
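The absorbance-based step for the design-0 standard follows the Beer-Lambert law, c = A280 / (ε × l). The sketch below illustrates the arithmetic only. It assumes a per-tyrosine molar extinction coefficient of 1490 M⁻¹ cm⁻¹ at 280 nm (a widely used literature value, not one stated in this paper), no tryptophan or cystine contribution, and a 1 cm path length; the absorbance reading and the function name are hypothetical.

```python
TYR_EPS_280 = 1490.0  # M^-1 cm^-1 per tyrosine; assumed literature value


def molar_concentration(a280: float, n_tyr: int, path_cm: float = 1.0,
                        eps_tyr: float = TYR_EPS_280) -> float:
    """Return the molar concentration from A280 via the Beer-Lambert law.

    Only tyrosine is assumed to absorb at 280 nm (no Trp, no cystine).
    """
    if n_tyr <= 0:
        raise ValueError("at least one tyrosine is needed to use A280")
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    epsilon = n_tyr * eps_tyr  # molar extinction coefficient of the protein
    return a280 / (epsilon * path_cm)


# Hypothetical reading for a protein with three tyrosines (as in design-0).
conc_molar = molar_concentration(a280=0.447, n_tyr=3)
print(f"{conc_molar * 1e6:.1f} µM")
```

Because the extinction coefficient of a three-tyrosine protein is small, the measurable absorbance range corresponds to relatively high protein concentrations, which is one reason a colorimetric assay calibrated against such a standard is convenient for the tyrosine-free designs.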
Biophysical characterization
The protein samples were prepared in 50 mM potassium phosphate buffer, pH 6.0, and 150 mM NaCl. Each protein’s circular dichroism (CD) spectra were recorded from 200 to 250 nm at 20 °C, using a 0.1 cm path-length cell and a JASCO J820 circular dichroism spectrometer (JASCO, Japan). The protein concentrations were adjusted to approximately 20 µM. However, because the CD spectra of some mutants showed high tension (HT) voltages at the shorter wavelengths, these samples were diluted to 5–13 µM to reduce the HT voltages. The proteins with β-structures exhibit a variety of spectral patterns due to their structural diversity (e.g., parallel, left-twisted anti-parallel, and right-twisted anti-parallel arrangements)46. Still, they are well distinguishable from α-proteins and random coils.
To measure each protein’s thermal stability, the ellipticity changes at 208 or 222 nm were monitored as the temperature was increased from 20 to 90 °C at a rate of 1.0 °C/min. The CD spectra were obtained at 20, 30, 40, 50, 60, 70, 80, and 90 °C. After cooling from 90 °C, the spectra at 20 °C were recorded again to verify the refolding ability.
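A melting temperature can be read from such a thermal melt in several ways, and the fitting procedure is not specified here. The following is a minimal sketch of one simple option, assuming a two-state transition with flat baselines taken from the first and last data points, and locating the midpoint by linear interpolation. The temperatures and ellipticity values are invented for illustration and do not correspond to any protein in this study.

```python
def estimate_tm(temps, signal):
    """Estimate the transition midpoint (Tm) of a two-state melt curve.

    The first and last signal values are used as folded and unfolded
    baselines; Tm is where the normalized unfolded fraction crosses 0.5.
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need matching temperature and signal lists")
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("no change in signal; cannot define a transition")
    frac = [(s - folded) / (unfolded - folded) for s in signal]
    for i in range(1, len(frac)):
        lo, hi = frac[i - 1], frac[i]
        if lo < 0.5 <= hi:
            t0, t1 = temps[i - 1], temps[i]
            # Linear interpolation between the two bracketing points.
            return t0 + (0.5 - lo) * (t1 - t0) / (hi - lo)
    raise ValueError("unfolded fraction never crosses 0.5")


# Invented ellipticity values (mdeg) at the temperatures used for the spectra.
toy_temps = [20, 30, 40, 50, 60, 70, 80, 90]
toy_signal = [-10.0, -10.0, -9.5, -8.0, -5.0, -3.0, -2.0, -2.0]
print(f"Tm ≈ {estimate_tm(toy_temps, toy_signal):.1f} °C")
```

With a continuous 1.0 °C/min ramp the data are much denser than in this toy example, and a nonlinear fit with sloping baselines would usually be preferred; the interpolation above is only meant to make the definition of the midpoint concrete.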
We performed size exclusion chromatography to examine the protein foldability. Each purified protein (100 µL, 20 µM) was loaded onto a Superdex 75 Increase 10/300 (GE Healthcare, IL) size exclusion column, equilibrated with 50 mM potassium phosphate buffer, pH 6.0, 150 mM NaCl, and run on an AKTA FPLC (Amersham Biosciences) at a flow rate of 0.75 mL/min.
ANS fluorescence measurement
Design-1 (mk2h_ΔMILPYS) was diluted to 1 µM in solutions containing various concentrations of malonate or ammonium sulfate (50, 100, 500, 1000, 1500, and 2000 mM). The pH values of the samples containing malonate or ammonium sulfate were adjusted to 7.0 and 6.0, respectively. Solutions of 2,000 mM potassium/sodium phosphate (pH 6.0), citrate (pH 7.0), acetate (pH 7.0), formate (pH 7.0), or glycine (pH 7.0) were also examined. The fluorescence probe 8-anilino-1-naphthalenesulfonic acid (ANS) was added (50 µM) and then the solution was placed in the dark for 30 min at room temperature. Using an FP-8500DS fluorescence spectrometer (JASCO, Japan), the fluorescence spectra ranging from 400 to 650 nm were recorded with excitation at 380 nm.
Crystallography
Before crystallization screening, we dialyzed all purified protein solutions against 20 mM Bis-tris HCl, pH 6.0, and 150 mM NaCl. The samples were then concentrated to 3–82 mg/mL. To screen the crystallization conditions for each protein, 96-well sitting-drop vapor-diffusion plates were used with the Wizard 1&2 and Wizard 3&4 (Molecular Dimensions, United Kingdom) and Index HT (Hampton Research, CA) crystallization solutions. For crystallization, 0.2 µL of each protein was mixed in a 1:1 ratio with reservoir solutions and incubated at 20 °C. The conditions in which each protein formed crystals are listed in Supplementary Table 4. The cryo-protectant solutions were prepared with the reservoir solutions supplemented with 10–20% glycerol (Supplementary Table 4).
The X-ray diffraction data were collected at the Photon Factory47,48 (Tsukuba, Japan), SPring-849,50,51,52 (Harima, Japan), or Swiss Light Source (Villigen, Switzerland). The beamlines are listed in Supplementary Table 4. The XDS program was used for the initial processing of diffraction data53. All crystal structures were solved by the molecular replacement method and refined with the program PHENIX54,55. The initial structure models for each mutant were determined by the MR phasing method, using phenix.phaser-MR. The model structures were updated manually using Coot and iteratively refined with Phenix.refine56. Statistics for diffraction data collection and refinement are summarized in Supplementary Table 5. 2Fo−Fc electron density maps of each crystal structure are shown in Supplementary Fig. 21.
Electrophoretic mobility shift assay (EMSA)
For the EMSA, 5 µM of protein was mixed with 10 nM of FAM-labeled oligonucleotides (20 mM Tris-HCl, pH 8.0, 50 mM NaCl). The sequences of the DNA and RNA are shown in Supplementary Fig. 20A. The protein and oligonucleotide mixtures were incubated in the dark at room temperature for 10 min. After the loading dye (30% glycerol and bromophenol blue) was added, the samples were fractionated on a 2% agarose gel (0.5× TBE buffer). After electrophoresis, the DNA or RNA bands in the gel were imaged with an Amersham Typhoon scanner (GE Healthcare).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The atomic coordinate files are available in PDB. The accession codes are: 8JVN and 8JVO (design-1), 8JVP (design-2), 8JVQ (design-3), 8JVR (design-4), 8JVT (design-6), 8JVS (design-9), 8JVU (design-13), 8JVV (design-15), 8JVW (design-18), 8JVX (design-19), 8JVY (design-20), 8JVZ (design-24). The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Source data are provided with this paper.
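For readers who want to retrieve the deposited coordinates programmatically, the accession codes above can be arranged as a simple lookup. This is a minimal sketch: the design-to-code mapping is copied from the list above, while the download URL pattern (the RCSB file server, mmCIF format) and the function names are assumptions for illustration and are not part of this study.

```python
import urllib.request

# Design name -> PDB accession codes, as listed in the Data availability statement.
PDB_ENTRIES = {
    "design-1": ["8JVN", "8JVO"],
    "design-2": ["8JVP"],
    "design-3": ["8JVQ"],
    "design-4": ["8JVR"],
    "design-6": ["8JVT"],
    "design-9": ["8JVS"],
    "design-13": ["8JVU"],
    "design-15": ["8JVV"],
    "design-18": ["8JVW"],
    "design-19": ["8JVX"],
    "design-20": ["8JVY"],
    "design-24": ["8JVZ"],
}

# Assumed URL pattern for mmCIF coordinate files on the RCSB file server.
RCSB_URL_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.cif"


def coordinate_url(pdb_id):
    """Build the (assumed) download URL for one accession code."""
    return RCSB_URL_TEMPLATE.format(pdb_id=pdb_id.upper())


def all_accession_codes():
    """Flatten the mapping into a list of accession codes, in listing order."""
    return [code for codes in PDB_ENTRIES.values() for code in codes]


def download(pdb_id, destination):
    """Fetch one coordinate file to a local path (requires network access)."""
    urllib.request.urlretrieve(coordinate_url(pdb_id), destination)


if __name__ == "__main__":
    for design, codes in PDB_ENTRIES.items():
        for code in codes:
            print(design, code, coordinate_url(code))
```

Calling `download("8JVN", "8JVN.cif")` would save the first design-1 entry locally, provided the assumed URL pattern is valid for that entry.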
References
Burton, Z. F., Opron, K., Wei, G. & Geiger, J. H. A model for genesis of transcription systems. Transcription 7, 1–13 (2016).
Lane, W. J. & Darst, S. A. Molecular evolution of multisubunit RNA polymerases: structural analysis. J. Mol. Biol. 395, 686–704 (2010).
Sauguet, L., Raia, P., Henneke, G. & Delarue, M. Shared active site architecture between archaeal PolD and multi-subunit RNA polymerases revealed by X-ray crystallography. Nat. Commun. 7, 12227 (2016).
Fouqueau, T., Blombach, F. & Werner, F. Evolutionary Origins of Two-Barrel RNA Polymerases and Site-Specific Transcription Initiation. Annu. Rev. Microbiol. 71, 331–348 (2017).
Coles, M. et al. Common evolutionary origin of swapped-hairpin and double-psi beta barrels. Structure 14, 1489–1498 (2006).
Alva, V., Koretke, K. K., Coles, M. & Lupas, A. N. Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr. Opin. Struct. Biol. 18, 358–365 (2008).
Yagi, S. et al. Seven Amino Acid Types Suffice to Create the Core Fold of RNA Polymerase. J. Am. Chem. Soc. 143, 15998–16006 (2021).
Nakagawa, A. et al. The three-dimensional structure of the RNA-binding domain of ribosomal protein L2; a protein at the peptidyl transferase center of the ribosome. EMBO J. 18, 1459–1467 (1999).
Agrawal, V. & Kishan, R. K. Functional evolution of two subtly different (similar) folds. BMC Struct. Biol. 1, 5 (2001).
Alvarez-Carreño, C., Penev, P. I., Petrov, A. S. & Williams, L. D. Fold Evolution before LUCA: Common Ancestry of SH3 Domains and OB Domains. Mol. Biol. Evol. 38, 5134–5143 (2021).
Bowman, J. C., Petrov, A. S., Frenkel-Pinter, M., Penev, P. I. & Williams, L. D. Root of the Tree: The Significance, Evolution, and Origins of the Ribosome. Chem. Rev. 120, 4848–4878 (2020).
Fox, N. K., Brenner, S. E. & Chandonia, J.-M. SCOPe: Structural Classification of Proteins–extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 42, D304–D309 (2014).
Andreeva, A., Howorth, D., Chothia, C., Kulesha, E. & Murzin, A. G. SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res. 42, D310–D314 (2014).
Sillitoe, I. et al. CATH: expanding the horizons of structure-based functional annotations for genome sequences. Nucleic Acids Res. 47, D280–D284 (2019).
Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol. 10, e1003926 (2014).
Kolodny, R. Searching protein space for ancient sub-domain segments. Curr. Opin. Struct. Biol. 68, 105–112 (2021).
Alvarez-Carreño, C., Gupta, R. J., Petrov, A. S. & Williams, L. D. Creative destruction: New protein folds from old. Proc. Natl Acad. Sci. USA 119, e2207897119 (2022).
Tagami, S. Why we are made of proteins and nucleic acids: Structural biology views on extraterrestrial life. Biophys. Physicobiol. 20, e200026 (2023).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Lin, Y.-R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA 112, E5478–E5485 (2015).
Murzin, A. G. Biochemistry. Metamorphic proteins. Science 320, 1725–1726 (2008).
Dishman, A. F. & Volkman, B. F. Unfolding the Mysteries of Protein Metamorphosis. ACS Chem. Biol. 13, 1438–1446 (2018).
Kim, A. K. & Porter, L. L. Functional and Regulatory Roles of Fold-Switching Proteins. Structure 29, 6–14 (2021).
Yadid, I., Kirshenbaum, N., Sharon, M., Dym, O. & Tawfik, D. S. Metamorphic proteins mediate evolutionary transitions of structure. Proc. Natl Acad. Sci. USA 107, 7287–7292 (2010).
Koharudin, L. M. I., Liu, L. & Gronenborn, A. M. Different 3D domain-swapped oligomeric cyanovirin-N structures suggest trapped folding intermediates. Proc. Natl Acad. Sci. USA 110, 7702–7707 (2013).
Bennett, M. J., Schlunegger, M. P. & Eisenberg, D. 3D domain swapping: a mechanism for oligomer assembly. Protein Sci. 4, 2455–2468 (1995).
López-Pelegrín, M. et al. Multiple stable conformations account for reversible concentration-dependent oligomerization and autoinhibition of a metamorphic metallopeptidase. Angew. Chem. Int. Ed. Engl. 53, 10624–10630 (2014).
Muchowska, K. B., Chevallot-Beroux, E. & Moran, J. Recreating ancient metabolic pathways before enzymes. Bioorg. Med. Chem. 27, 2292–2297 (2019).
Springsteen, G., Yerabolu, J. R., Nelson, J., Rhea, C. J. & Krishnamurthy, R. Linked cycles of oxidative decarboxylation of glyoxylate as protometabolic analogs of the citric acid cycle. Nat. Commun. 9, 91 (2018).
Liu, Z. et al. Prebiotic photoredox synthesis from carbon dioxide and sulfite. Nat. Chem. 13, 1126–1132 (2021).
Todd, Z. R. Sources of Nitrogen-, Sulfur-, and Phosphorus-Containing Feedstocks for Prebiotic Chemistry in the Planetary Environment. Life 12, 1268 (2022).
Holm, L. Dali server: structural unification of protein families. Nucleic Acids Res. 50, W210–W215 (2022).
Draper, D. E. & Reynaldo, L. P. RNA binding strategies of ribosomal proteins. Nucleic Acids Res. 27, 381–388 (1999).
Farías-Rico, J. A., Schmidt, S. & Höcker, B. Evolutionary relationship of two ancient protein superfolds. Nat. Chem. Biol. 10, 710–715 (2014).
Chakravarty, D., Sreenivasan, S., Swint-Kruse, L. & Porter, L. L. Identification of a covert evolutionary pathway between two protein folds. Nat. Commun. 14, 3177 (2023).
Ruan, B. et al. Design and characterization of a protein fold switching network. Nat. Commun. 14, 431 (2023).
Solomon, T. L. et al. Reversible switching between two common protein folds in a designed system using only temperature. Proc. Natl Acad. Sci. USA 120, e2215418120 (2023).
Chakravarty, D. & Porter, L. L. AlphaFold2 fails to predict protein fold switching. Protein Sci. 31, e4353 (2022).
Chakravarty, D., Schafer, J. W., Chen, E. A., Thole, J. R. & Porter, L. L. AlphaFold2 has more to learn about protein energy landscapes. bioRxiv https://doi.org/10.1101/2023.12.12.571380 (2023).
Theobald, D. L., Mitton-Fry, R. M. & Wuttke, D. S. Nucleic acid recognition by OB-fold proteins. Annu. Rev. Biophys. Biomol. Struct. 32, 115–133 (2003).
Törö, I. et al. RNA binding in an Sm core domain: X-ray structure and functional analysis of an archaeal Sm protein complex. EMBO J. 20, 2293–2303 (2001).
Maksimova, E., Kravchenko, O., Korepanov, A. & Stolboushkina, E. Protein Assistants of Small Ribosomal Subunit Biogenesis in Bacteria. Microorganisms 10, 747 (2022).
Shin, D. S., Pratt, A. J. & Tainer, J. A. Archaeal genome guardians give insights into eukaryotic DNA replication and damage response proteins. Archaea 2014, 206735 (2014).
Reeve, J. N. Archaeal chromatin and transcription. Mol. Microbiol. 48, 587–598 (2003).
Horton, R. M. et al. [17] Gene splicing by overlap extension. in Methods in Enzymology 217, 270–279 (Elsevier, San Diego, CA, 1993).
Micsonai, A. et al. Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy. Proc. Natl Acad. Sci. USA 112, E3095–E3103 (2015).
Yamada, Y., Matsugaki, N., Chavas, L. M. G., Hiraki, M. & Wakatsuki, S. Data Management System at the Photon Factory Macromolecular Crystallography Beamline. J. Phys. Conf. Ser. 425, 012017 (2013).
Hiraki, M., Yamada, Y., Chavas, L. M. G., Wakatsuki, S. & Matsugaki, N. Improvement of an automated protein crystal exchange system PAM for high-throughput data collection. J. Synchrotron Radiat. 20, 890–893 (2013).
Okazaki, N. et al. Mail-in data collection at SPring-8 protein crystallography beamlines. J. Synchrotron Radiat. 15, 288–291 (2008).
Ito, S., Ueno, G. & Yamamoto, M. DeepCentering: fully automated crystal centering using deep learning for macromolecular crystallography. J. Synchrotron Radiat. 26, 1361–1366 (2019).
Hirata, K. et al. ZOO: an automatic data-collection system for high-throughput structure analysis in protein microcrystallography. Acta Crystallogr. D. Struct. Biol. 75, 138–150 (2019).
Nakamura, Y. et al. Computer-controlled liquid-nitrogen drizzling device for removing frost from cryopreserved crystals. Acta Crystallogr. Sect. F. Struct. Biol. Cryst. Commun. 76, 616–622 (2020).
Kabsch, W. XDS. Acta Crystallogr. D. Biol. Crystallogr. 66, 125–132 (2010).
Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D. Biol. Crystallogr. 66, 213–221 (2010).
Terwilliger, T. C. et al. Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard. Acta Crystallogr. D. Biol. Crystallogr. 65, 582–601 (2009).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D. Biol. Crystallogr. 60, 2126–2132 (2004).
Acknowledgements
This work is based on experiments performed at KEK (project numbers: 2020G056 and 2022G005), SPring-8, and SLS. The authors are grateful to the beamline staff scientists at KEK, SPring-8, and SLS. We thank Hideaki Niwa, Toshiaki Hosaka, and Kentaro Ihara for assistance with the X-ray diffraction experiments. We also thank Hongding Liu for assistance with the DNA cloning experiment. S.Y. and S.T. were supported by JSPS (18H01328, 20K15854, and 22H01346). S.T. was also supported by the Astrobiology Center Program of the National Institutes of Natural Sciences (AB0503).
Author information
Contributions
S.Y. and S.T. conceived and designed the experiments. S.Y. performed all designs and experiments. S.Y. and S.T. performed the crystallographic analysis. All authors discussed the results and jointly wrote and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Lauren Porter and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yagi, S., Tagami, S. An ancestral fold reveals the evolutionary link between RNA polymerase and ribosomal proteins. Nat Commun 15, 5938 (2024). https://doi.org/10.1038/s41467-024-50013-9