Unraveling RNA dynamical behavior of TPP riboswitches: a comparison between Escherichia coli and Arabidopsis thaliana

Riboswitches are RNA sensors that affect post-transcriptional processes through their ability to bind to small molecules. Thiamine pyrophosphate (TPP) riboswitch class is the most widespread riboswitch occurring in all three domains of life. Even though it controls different genes involved in the synthesis or transport of thiamine and its phosphorylated derivatives in bacteria, archaea, fungi, and plants, the TPP aptamer has a conserved structure. In this study, we aimed at understanding differences in the structural dynamics of TPP riboswitches from Escherichia coli and Arabidopsis thaliana, based on their crystallographic structures (TPPswec and TPPswat, respectively) and dynamics in aqueous solution, both in apo and holo states. A combination of Molecular Dynamics Simulations and Network Analysis empowered to find out slight differences in the dynamical behavior of TPP riboswitches, although relevant for their dynamics in bacteria and plants species. Our results suggest that distinct interactions in the microenvironment surrounding nucleotide U36 of TPPswec (and U35 in TPPswat) are related to different responses to TPP. The network analysis showed that minor structural differences in the aptamer enable enhanced intramolecular communication in the presence of TPP in TPPswec, but not in TPPswat. TPP riboswitches of plants present subtler and slower regulation mechanisms than bacteria do.

www.nature.com/scientificreports www.nature.com/scientificreports/ conserved between species, although minor differences are observed mainly concerning the content of base pairings, being TPPsw ec two pairings longer than TPPsw at .
Both prokaryotic and eukaryotic TPP aptamers share a notable common 3D structure and organization. The superposition between TPPsw ec and TPPsw at crystal structures resulted in heavy atoms root-mean-square deviation (RMSD) of approximately 0.63 Å (Fig. 2C,D). The insertion and relative position of TPP is also similar presenting a 0.69 Å RMSD. The aptamer consists of a switch helix (P1) and two sensor arms (P2/P3 and P4/P5), forming a long-range tertiary rearrangement capable of stabilizing the interaction between the L5 loop and the P3 stem ( Fig. 2A,B). In both crystal structures, the P2/P3 arm helps accommodate the TPP pyrimidine ring while residues from the P4/P5 arm interact with the TPP pyrophosphate group (Fig. 2E,F).
Global and local stability of the aptamer domain. The structural stability of riboswitches in aqueous solution, in both apo and holo configurations, was evaluated by comparing the average RMSD (Table 1 and Supplementary Fig. S1) and root-mean-square fluctuation (RMSF) (Fig. 3) values calculated over the trajectory production stage, taking as references the structures obtained after equilibration.
During Molecular Dynamics (MD) simulations, both apo and holo forms presented deviations around 3 Å. No noticeable differences have been observed between the RMSD values obtained for both TPPsw ec and TPPsw at upon TPP complexation. Indeed, both average RMSD values per region and the overall pattern of fluctuations  (57) and A60(58) from loop L5 connect with residues G27 (25) and C28 (26) from the P3 minor groove. (E,F) The aminopyrimidine ring of TPP formed hydrogen bonds with G30(28) and the 2-OH′ of G9 (11). Direct contacts to nonbridging oxygens of β-phosphate of TPP were also established via N4 of C67(65) and N1 of G68(66). All other pyrophosphate-RNA contacts were mediated through two Mg 2+ ions (colored in grey).
were similar for all systems. These values are consistent with RMSD values seen in other studies reporting simulations of small RNA aptamers [21][22][23][24] which ranged between 3-4 Å approximately.
The inspection at some particular regions of the RNA aptamer evidenced slight differences in their dynamical behavior. For example, the P1 region presented higher RMSD values than that of all other substructures in both species. Also, this region was more flexible in the holo forms. It is worth mentioning that the nucleotides composing the P1 helix are located in the terminal regions of the aptamer, showing broader movements during MD simulations. Unlike the P1 segment, the P3 helix displayed higher RMSD values in the apo forms. In addition, the comparison of the fluctuations between apo and holo riboswitches evidenced that TPP binding resulted in the stabilization of the P3 helix but contributed at the same time to the disruption of the P1 helix, according to the RMSF analysis (Fig. 3). Interestingly, the nucleotide U36, located in the P2 helix of TPPsw ec , displayed more significant fluctuations in the holo form than in the apo form (holo: 5.4 Å; apo: 2.3 Å) (Fig. 3). No significant changes were observed in the P2 substructure in apo and holo form of TPPsw at .
Monitoring P3-L5 interaction. Despite being remotely located from the TPP binding site, the interaction between P3-L5 substructures is essential for metabolite binding 25,26 . The conformation of the P3-L5 region is kept in formation via two non-Watson-Crick (WC) base pairs. Nucleotides A59(57) and A60(58) from loop L5 connect with residues G27(25) and C28(26) from the P3 minor groove ( Fig. 2A,B) 27 . Numbers in parenthesis indicate the nucleotide position in the TPPsw at systems.
By monitoring the formation of non-WC hydrogen bonds between P3 and L5 we observed that, notably, the G27(25)·A59(57) interaction in apo TPPsw ec was the less stable with 68% occupancy during the simulation (Table 2). Upon TPP binding, the occupancy increased to 74% while no corresponding trend was observed upon comparison of the apo and holo TPPsw at systems (presenting ~81% occupancy in both cases). Also, the C28(26)·A60(58) was more stable and presented occupancies higher than 84% in all systems. In addition, the base pair G27·A59 of TPPsw ec appears to be influenced by the presence of the ligand, which contributed to its stabilization.
TPP-RNA interaction. To verify the stability of TPP-RNA complexes during MD simulations, we calculated the H-bonds occupancy and the average distances between pairs of atoms involved in key interactions (Table 3).   www.nature.com/scientificreports www.nature.com/scientificreports/ We confirmed the existence of similar interactions in all systems. The aminopyrimidine ring of TPP formed hydrogen bonds with N2 and N3 of G30(28) and the 2-OH′ of G9 (11). Direct contacts to non-bridging oxygens of TPP β-phosphate were also formed via N4 of C67(65) and N1 of G68(66). All other pyrophosphate-RNA contacts were mediated through the two Mg 2+ ions 26,27 (Fig. 4).
Analysis of the distributions of the total number of RNA-TPP hydrogen bonds revealed that most TPPsw ec formed more interactions than TPPsw at ( Supplementary Fig. S2). The number of sampled structures presenting four or more hydrogen bonds was consistently higher in the TPPsw ec system than in TPPsw at . The occupancy of the two hydrogen bonds formed between G30(28) and TPP was similar for both systems. Curiously, the occupancy of G9 (11) for TPPsw at was less than a half (26%) than for TPPsw ec (56%).
Each TPPsw ec nucleotide involved in pyrophosphate recognition formed more than one hydrogen bond because they interacted with other oxygen atoms from TPP, being C67·TPP-O1 and G68·TPP-O5 examples of these interactions, with occupancies of 41% and 30%, respectively (Table 3).

Principal component analysis suggests different dominant motions in apo and holo states.
To identify statistically relevant motions of TPP riboswitches in solution, we performed principal component analysis (PCA) on the snapshots obtained from the MD trajectories. Overall, the first three components (named PC1, PC2, and PC3) captured the dominant motions, presenting the highest contributions to total fluctuations. The first three PCs accounted for 54.9% and 52.6% of the overall variance in apo TPPsw ec and TPPsw at , respectively. In the holo states, the contribution of the first three PCs was slightly higher: 61.7% and 58.7% for TPPsw ec and TPPsw at , respectively.
We compared the projections of the trajectories onto the subspace spanned by the first three principal components. RMSFs and structural projections along the three PCs showed that the four systems displayed a uniform and overlapping PC subspace (Fig. 5).
Inspection of the atomic fluctuations along PC1 revealed substantial higher flexibility of the P3 helix in the apo systems, confirming its stabilization by ligand binding. However, this region presented higher fluctuations along PC2 in the holo state in the TPPsw ec system, but not in TPPsw at . Therefore, the stabilization of the most statistical relevant motions of P3 helix inducted by ligand binding was undoubtedly more pronounced in TPPswat, as lower fluctuations were noticed along both principal components. Furthermore, TPP binding resulted in the augmented flexibility of the P1 helix. In the holo TPPsw ec , the ligand promoted an increase in the flexibility of nucleotides 34-37 of the P2 helix that was unseen in TPPsw at .   www.nature.com/scientificreports www.nature.com/scientificreports/ Correlation network analysis reveals distinct responses to ligand binding in tppsw ec and tppsw at . We analyzed the correlations between pairs of nucleotides to investigate how TPP binding affects the dynamic couplings in TPPsw ec and TPPsw a . We calculated the dynamic cross-correlation matrices (DCCM) for each simulated system, as described in the methods section. A similar cross-correlation pattern was observed for both systems in their apo states. However, both the extent of regions displaying anticorrelations and their magnitudes were greater in TPPsw ec , mainly in the P3-P4-P5-L5 region (Fig. 6).
It was observed that TPP binding stabilized the P3-L5 interaction in TPPsw ec , as evidenced by weaker anticorrelations, which are likely to be associated with the separation of these regions in the apo state. This result is in line with PCA revealing increased stability of P3 motions in the holo state (Fig. 5). Interestingly, no similar trend was observed in TPPsw at . In TPPsw at system, TPP binding resulted in increased anticorrelations at the P1-P2 helices, indicating a possible destabilization of interactions that ultimately resulted in a larger separation between them. In contrast, the dynamic coupling pattern in this region was not altered in TPPsw ec .
Next, we performed a correlation network analysis by constructing weighted graphs in which each residue was represented by a single node, and the weight of the connection between pairs of nodes was proportional to their respective previously calculated correlation coefficients. To quantify the relative importance of each residue in the network, we computed the betweenness centrality per nucleotide for each simulated system (Fig. 7). This metric is used to identify critical communication nodes over the network. Residues presenting high betweenness values are considered "bottlenecks" of information as they are found in the shortest communication paths 28 .
We calculated the square inner product (SIP) to compare the overall similarity of the betweenness centrality profiles calculated for the apo and holo states. According to this analysis, high SIP values are associated with weak modulation of intramolecular communication as a consequence of the TPP binding. We obtained a higher SIP for TPPsw at (0.89) than for TPPsw ec (0.65), thus reinforcing more noticeable TPP related effects in TPPsw ec systems. In both holo systems, we noted increased centrality values at the P4-P5 helices. Interestingly, the nucleotides G27, C28, and A59 were critical for P3-L5 interaction and displayed higher betweenness values in holo TPPsw ec , indicating that ligand binding favors an efficient communication through these nucleotides (Fig. 7). This feature was not observed in TPPsw at , in which C26 and A58 centrality values were higher in the apo state.
The distribution of node degeneracies obtained for TPPsw ec was narrower at the P3 region in the holo state, indicating that TPP binding restricted the presence of a few residues in the shortest paths. Whereas a larger number of residues are accessed in the apo state, the shortest paths were mostly formed by nucleotides 59-61 in the holo state. In contrast, the distributions obtained for both TPPsw at states were very similar (Fig. 8A). www.nature.com/scientificreports www.nature.com/scientificreports/ We calculated the number of nodes per path to further evaluate and characterize the influence of TPP binding on P3-L5 interactions (Fig. 8B). This analysis was based on the hypothesis that communication involving fewer nodes along the pathway is likely to be more efficient. Indeed, in holo TPPsw ec , the P3-L5 communication required fewer nodes (Fig. 8B). Interestingly, while TPP binding did not modify the global distributions of betweenness centralities obtained for both species (Fig. 8C left boxes), opposing trends were perceived concerning the average betweenness centrality calculated for the residues participating in shortest paths (Fig. 8C right  boxes). However, TPPsw ec TPP binding resulted in increased centralities for the residues involved in shortest paths, leading to a slight decrease in average betweenness in TPPsw at .
To support this analysis, we computed the eigenvector centralities for the overall network and the shortest paths (Fig. 8C lower boxes). Again, TPP binding resulted in higher centrality in the shortest paths only for TPPsw ec . The selection of specific P3 residues imposed by ligand binding in TPPsw ec resulted in a stronger communication along pathways accessing a selection of neighboring residues with high eigenvector centrality. In agreement with our previous analysis (Figs 6-8), a corresponding effect was not observed in TPPsw at , which strongly suggests the weaker influence of TPP for an effective communication. www.nature.com/scientificreports www.nature.com/scientificreports/  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Plant and bacterial TPP aptamers share similar core structures and bind to the same ligand. However, minor structural and dynamical differences between them could be found, specially concerning the behavior of P3 helix. Bacteria and archaea commonly have a P3a stem 18 , which is not observed in eukaryotic riboswitches. The eukaryotic P3 stem is significantly variable in length, sequence, and base pairings 19 . Particularly in plants, the length of distal P3 extension varies among TPP aptamer representatives of the same species, as observed in Physcomitrella patens 8 . The P3 distal portion is not required for ligand binding of L5-P3 interaction 5,29 , but might act as an anchor for the aptamer as it was already pointed out by Anthony et al. 30 . Also, the authors claim that the correct folding could help in the competition with other RNA structures with different regulation mechanisms.
However, despite that the P3 stem is significantly variable in length in plants, the TPP aptamer is structurally stable. This stability might lead to slower arm movement than the observed in the helix arm of E. coli TPP riboswitch 31 . Cross-correlation analysis corroborated this hypothesis because stronger negative correlations were noticed in the apo TPPsw ec involving substructures P3-P4-P5-L5 (Fig. 6). Our findings also suggest that communication pathways between P3-L5 may be different in E. coli and A. thaliana. The communication between P3-L5 in TPPsw ec can be very efficient in the holo state, while in TPPsw at the corresponding effect was weakened, thus suggesting a slower response to TPP binding in plants than in bacteria (Fig. 8). www.nature.com/scientificreports www.nature.com/scientificreports/ The A. thaliana crystallographic structure used as starting point for our simulations contains a shortened P3 stem formed by 14 nucleotides. On the other hand, the corresponding structure in E. coli is composed of 18 nucleotides. The P3 helix of TPPsw at , although smaller than the one in TPPsw ec , showed no significant modifications in the presence of the ligand, indicating that the size of P3 can be oblivious to plants and its influence about slow folding can be negligible.
Guedich et al. wondered whether the slow TPPsw at folding would be related to a single nucleotide. The authors concluded that U35, located on the P2 helix, is crucial for shaping a TPP-binding competent riboswitch 32 . In our analysis, the equivalent pyrimidine nucleotide in TPPsw ec is U36 (Fig. 1B). The magnitude of the fluctuations at this position was 2-fold higher in the holo state than in the apo TPPsw ec state (Fig. 3). In contrast, in the TPPsw at system, similar fluctuations were perceived regardless of a ligand binding. Furthermore, PCA data also supported these outcomes by showing that the segment 34-37 of the holo TPPsw ec displayed the most significant motion amplitude along PC1 (Fig. 5).
Grounded on these findings, we hypothesize that different interactions found in the microenvironment surrounding nucleotide U36 of TPPsw ec (and U35 in TPPsw at ) are related to different TPP responses. In TPPsw ec , this nucleotide is neighbored at 3′ by a non-canonical A37-G9 base pair. A similar context is observed for U35 in TPPsw at , which is delimited by a non-canonical G34-G11 base pair but on 5′ instead. Nucleotides G9 and G11 of TPPsw ec and TPPsw at , respectively, form hydrogen bonds with the aminopyrimidine ring of TPP. Interestingly, our simulations have shown that hydrogen bond occupancy between G9(11) and N2 of TPP was less than a half for TPPsw at (25.67%) than for TPPsw ec (55.58%). This suggests that slight differences in the environment may directly interfere the TPP-aptamer interaction stability.
Finally, TPP riboswitches of Arabidopsis thaliana present subtler and slower regulation mechanisms than Escherichia coli [30][31][32] . Here, we have shown through molecular dynamics simulations and networking analysis that minor structural differences in the aptamer enable enhanced intramolecular communication in the presence of TPP in TPPsw ec , but not in TPPsw at . Weaker responses to changes in the TPP concentration may be related to the autotrophic mode of nutrition, which demands the endogenous synthesis of thiamine. Unlike in plants, bacteria can grow under thiamine-rich conditions allowing them to satisfy their full demand for compounds like thiamine exogenously 8 . Taken together, our results provide new insights into RNA behavior of TPP riboswitch, which may have adapted in a different way to the different metabolic demands of each group of organisms to accomplish distinct TPP binding modulation.

Materials and Methods
Analysis of crystallographic structures. For this study, TPP riboswitch 3D-structures of Escherichia coli (TPPsw ec ) and Arabidopsis thaliana (TPPsw at ) obtained by X-ray crystallography were selected from the Protein Data Bank (PDB) 33 . Both structures are bound to TPP and present high-resolution crystal, of 2.05 Å (PDB ID: 2GDI 27 ) and 2.25 Å (PDB ID: 3D2G 29 ), for TPPsw ec and TPPsw at , respectively. Corresponding sequence and secondary structures information were also taken from the PDB files and analyzed with 3DNA software suite 34 . TPPsw ec and TPPsw at sequences were then aligned using the SARA-Coffee mode of T-Coffee program 35 , and figures of sequence alignment were rendered using ALINE 36 . The secondary structure calculated on the basis of the PDB files of the crystallographic structures was compared through the SimTree server 20 . SimTree returns a similarity score and a mapping between the similar regions of the two structures in dot-bracket notation. Graphical representation of 2D and 3D structures were generated using VARNA 37 and UCSF Chimera, respectively. Molecular dynamics simulations. Molecular dynamics (MD) simulations were carried out using the GROMACS version 5.1.2 package 38 , and RNA interactions were represented using the amber 14sb 39 force field with parmbsc0 + chiOL3 40 . Bonded and Lennard-Jones molecular parameters for TPP have been obtained using the Generalized Amber force field (GAFF) 41 and AM1-BCC 42 tools while atomic partial charges were added using ANTECHAMBER 43 . ACPYPE 44 program was employed to create a GROMACS compatible topology file. Electrostatic interactions were treated using the particle mesh Ewald (PME) algorithm with a cut-off of 10 Å.
MD trajectories were monitored to investigate possible differences in the dynamical behavior between apo and holo TPPsw ec and TPPsw at . In the apo systems, TPP was removed from the X-ray crystal structure and replaced with solvent water. Two initially positioned magnesium ions in the crystal structure were kept in both the apo and holo systems and also contributed to neutralize the systems. These ions are essential for ligand binding and confer stability to the riboswitch as well 45 . Each system was simulated under periodic boundary conditions in a triclinic box whose dimensions were automatically defined considering a distance of 1 nm from the outermost RNA atoms in all cartesian directions. The simulation box was filled with TIP3P water molecules 46 .
Simulations were performed in three stages: (i) Energy minimization, (ii) thermalization and equilibration, and (iii) trajectory production.
Energy minimization procedure was performed through 5000 steps and a gradient tolerance <1.0 kJ mol −1 nm −1 of the steepest descent and conjugate-gradient algorithms. These steps were carried out with heavy atom restraints by applying a harmonic potential with a force constant of 1000 kJ mol −1 nm −2 for the steepest descent algorithm. Applications of the conjugate-gradient algorithm do not allow the application of restraints.
In the second phase, starting atomic velocities were assigned to all atoms of the system using a Maxwell-Boltzmann distribution, corresponding to an initial temperature of 20 K. Then, the systems were gradually heated up to 300 K over 500 picoseconds (ps) utilizing the Langevin thermostat. During this stage, all heavy atoms were harmonically restrained by applying a constant force of 1000 kJ mol −1 nm −2 .
Systems were subsequently equilibrated during twenty successive 100 ps long equilibration simulations where position restraints progressively approached zero. After this period, the systems were simulated with no restraints all at 300 K for 1 µs (trajectory production). All simulations were performed in the NPT ensemble. The V-rescale thermostat and Berendsen barostat were used for temperature (300 K) and pressure control (1 atm), respectively. www.nature.com/scientificreports www.nature.com/scientificreports/ trajectory analysis. As in the trajectory analysis, we were interested only in the structural aspects of the systems, regardless the temporal correlation. Two independent MD simulations were concatenated, and trajectory analyses were conducted for crystal structures systems.
To investigate structural changes of the TPP aptamers, root-mean-square deviation (RMSD) values were calculated separately for the whole RNA and its substructures after fitting to their respective parts, taking the initial structure of the production dynamics as a reference. Hydrogen bond formation was defined using a geometric criterion with VMD software. It was considered a hit when the distance between two polar heavy atoms, with at least one hydrogen atom attached, was less than 3.5 Å using a D-Ĥ-A angle cutoff of 30°. Motif Identifier for Nucleic acids Trajectory (MINT) 47 program was used to evaluate the number of Watson-Crick (WC)-edge and non-WC-edge hydrogen bonds (and their sum) per nucleotide throughout the simulations.
principal components analysis. The study of large-scale domain motions is essential for characterizing the conformational dynamics of macromolecules. Functional motions are usually described by a few numbers of degrees of freedom that can be calculated using principal component analysis (PCA) 48,49 . PCA analysis was carried out for all systems using Bio3D 50 library as implemented in R 51 . Rotational and translational motions were removed before calculation of the covariance matrix by least-squares superposition to the corresponding average structures. The 3 N × 3 N covariance matrices of C5′ atomic positions (Cartesian coordinates) were then calculated for each state. The conformations explored during the MD simulations were applied using hierarchical clustering in R (hclust) with the complete linkage method based on the PC1-PC2 subspace, where PC1 and PC2 denote the projections onto the two first eigenvectors.
Correlation network analysis. The cross-correlation and network analyses were carried out using the Bio3D and the igraph R packages 52 . Initially, the dynamic cross-correlation matrices (DCCM) were calculated separately for each simulation using as inputs the corresponding MD trajectory superimposed onto the initial structure. Then, each group of two matrices per riboswitch state was utilized to obtain a consensus matrix. A proximity/contact map filter was applied in the construction of the correlation network for residues that remained within 4.5 Å from one another for at least 75% of simulation time. Briefly, graphs were obtained considering C5′ atoms as nodes and the connection between nodes i and j were weighted using the absolute values of cross-correlations (C (i,j) ) coefficients (Equation 1): We also calculated the relative importance of each node for communication using centrality measures. According to the definition of the betweenness centrality (Equation 2), the relevance of a given node is defined by its presence in shortest communication paths connecting nodes over the entire network 53 .
B i j k N According to the above equation, the betweenness centrality of a node depends on the total number of the shortest paths between nodes i and j that pass-through n (σ | i j n ( , ) and σ i j ( , ), which is the total number of shortest paths between nodes i and j (regardless of whether they cross or not through n).
Another measure of centrality can be given by the eigenvector (Equation 3) that accounts for the global relevance of each residue based on the connections with neighboring nodes. In other words, nodes with high eigenvector centrality are those connected to other central residues 54 .
Ei v T is obtained as a solution to the eigenvalue problem α = − Ac c Ei Ei 1 , where A is the adjacency matrix for the network graph G. More mathematical details can be found at Kolaczyk 54 .
The Square inner product (SIP) (Equation 4) was used to compare the overall similarity of the centrality profiles calculated for the systems. It varies between 0 and 1 and is defined as where w A and w B are N-length vectors containing the fluctuation value for each atom in proteins A and B, respectively 55 . The Yen's algorithm 56 was used to calculate the shortest pathways connecting two nodes in the network. Path lengths are defined as the sum of the edge weights connecting a pair of nodes in a given pathway. The first 1000 shortest paths were collected and employed to calculate the node degeneracy value, which represents the percentage of pathways from the overall ensemble in which a given node is present.