Abstract
Refining modelled structures to approach experimental accuracy is one of the most challenging problems in molecular biology. Despite many years’ efforts, the progress in protein or RNA structure refinement has been slow because the global minimum given by the energy scores is not at the experimentally determined “native” structure. Here, we propose a fully knowledgebased energy function that captures the full orientation dependence of base–base, base–oxygen and oxygen–oxygen interactions with the RNA backbone modelled by rotameric states and internal energies. A total of 4000 quantummechanical calculations were performed to reweight base–base statistical potentials for minimizing possible effects of indirect interactions. The resulting BRiQ knowledgebased potential, equipped with a nucleobasecentric sampling algorithm, provides a robust improvement in refining nearnative RNA models generated by a wide variety of modelling techniques.
Similar content being viewed by others
Introduction
Threedimensional structures of RNAs offer the best clues for their functions and yet only a few thousand structures have been determined^{1}, compared to 24 million of noncoding RNA sequences collected in RNAcentral (as of October 25 2020)^{2}. This huge gap continues to expand exponentially because sequencing entire transcriptomes is only a fraction of the cost and time required for structure determination of a single RNA by the techniques such as Xray crystallography, nucleic magnetic resonance, and cryoelectron microscopy.
One possible solution to this growing problem is to predict RNA structures from their sequences by computational templatebased homology modeling and de novo folding^{3,4}. Homology modeling such as RNABuilder^{5} and ModeRNA^{6} attempts to predict the structure of a target RNA by mapping the query sequence onto the template structure(s) of its homologs. However, most RNAs do not have their respective homologous templates. Thus, method development was mainly devoted to predicting structures by de novo folding from sequences through fragment or motif assembling. Examples are FARNA^{7}, MCfold/MCSym^{8}, RosettaSWM^{9}, ifoldRNA^{10,11}, Vfold^{12,13}, 3dRNA^{14,15}, and SimRNA^{16}. Evaluation of these methods by RNApuzzles^{17,18,19} indicates that reasonable predictions are possible only for RNAs with either simple topology or existing homologous templates. Moreover, reasonable nearnative predictions were often ranked poorly by the predictors. Clearly both homology and de novo models would greatly benefit from a structurerefinement technique that can bring the models closer to native structures locally and globally and improve nearnative ranking.
Structure refinement, however, is challenging with few methods available for RNAs. Even for protein structure refinement, despite a long history of method development, only a moderate local improvement can be made by combining molecular mechanics force fields and knowledgebased potentials with predefined restraints to prevent large deviation away from the native basin^{20,21}. For RNAs, most allatom refinements were performed after coarsegrained sampling for removing steric clashes and nonideal bond lengths, bond angles, or torsional angles. For example, the QRNAS refinement^{22} utilizes a modified AMBER force field with enforced basepair planarity, explicit hydrogen bonds, and backbone regularization. FARFAR^{23,24} improves model accuracy through allatom refinement after FARNA coarsegrained sampling. It employed a Rosetta allatom force field that mixes physical and knowledgebased scores with RNAspecific terms for Watson–Crick (WC) base pairing, base stacking, and torsional potentials^{25}.
Here, we build a knowledgecentric refinement energy score. Atomiclevel knowledgebased energy functions, derived from known threedimensional structures, have traditionally focused on distance dependence in protein structures^{26} with similar statistical potentials developed for RNA^{27,28,29}. Unlike these atomic knowledgebased scores and previous allatom RNA refinement energy scores^{22,23}, the statistical potential in this work is tailored specifically to RNA interactions that were dominant by orientationdependent basepairing and stacking interactions^{30} with rotameric backbone^{31}, in contrast to dominant distancedependent hydrophobic interactions^{32} with rotameric sidechains^{33} in protein folding. Base–base interactions were described by fine grids in a sixdimensional orientational space and scaled by quantum mechanical calculations allowing better capture of both local and global interactions. This Backbone Rotameric and Quantummechanicalenergyscaled base–base knowledgebased potential (BRiQ) is integrated with a new nucleobasecentric tree algorithm that samples backbone conformations around predicted or known base pairs. The resulting refinement technique can consistently improve model structures for the majority of nearnative structural models as demonstrated by refining RosettaSWM motif^{34}, RNA puzzles^{17,18,19}, and FARFAR2^{24} models.
Results
BRiQ refinement energy score
The BRiQ refinement energy score is designed to capture stably stacked bases linked by a more flexible ribose and phosphate backbone. In particular, the interactions associated with bases and oxygen atoms in backbones are strongly orientationdependent. To capture this unique structural property of RNA chains, the energy score (E) is separated into six terms: orientationdependent interactions between two bases (\({{\rm{E}}}_{{\rm{bb}}}\)), between a base and a mainchain oxygen atom (\({{\rm{E}}}_{{\rm{bo}}}\)), and between two mainchain oxygen atoms (\({{\rm{E}}}_{{\rm{oo}}}\)), backbone rotameric energy (\({{\rm{E}}}_{{\rm{rot}}}\)), internal energy (\({{\rm{E}}}_{{\rm{internal}}}\)), and atomic clash energy (\({{\rm{E}}}_{{\rm{clash}}}\)). That is,
where we employed five oxygen atom types including OP for OP1 and OP2 and O2’, O3’, O4’, and O5’ in ribose. To capture the orientation dependence, the whole nucleobase (A, C, G, and U) is treated as a single rigid group with a local coordinate system and the relative position between two bases is described by the distance vectors and their orientations (Fig. 1A). The densities in the sixdimensional space were derived from known RNA structures using kernel density estimations, rather than histogram statistics (see Methods). This allows a smoother energy landscape (Fig. 1B) and the nearnative local minima are closer to the corresponding native structure in a fine orientational space. To remove the impact of indirect base–base interactions, statistical energy scores are scaled by quantum mechanical calculations according to the correlation coefficient between representative hydrogenbonded base pairs (Fig. 1B, C). Similarly, kernel density estimations were also employed to obtain base–oxygen (Fig. 1D) and oxygen–oxygen interactions whereas statistical rotameric states were obtained for ribose backbones (Fig. 1E). Empirical internal bond angle, bond length, and atomcrash energies were also developed to ensure appropriate bond geometry and packing density (Methods).
NuTree sampling algorithm
The above nucleobasecentric energy score is coupled with a nucleobasecentric tree (NuTree) algorithm for conformational sampling, in which each node is a base and each edge represents the relative position between two bases (Fig. 1F). Two connected bases in the folding tree could be sequential neighbors or hydrogenbonded base pairs. Instead of sampling in backbonedihedral angle space, conformations are sampled according to the predefined orientation space of each edge type because the orientations around more stable bases are the determinants of backbone orientations. For the same reason, coordinates of sidechains were built prior to those of the backbones. Only local moves were allowed for each node whereas both local and global moves were allowed for each edge, making both local and global structural improvements possible. In the process of Monte Carlo sampling, we only need to calculate the energy change upon a random structure move, without knowing the first or second derivative of the energy function. Therefore, this sampling algorithm could be used for optimizing an RNA structure with a nondifferentiable energy function in atomic accuracy.
Refining RosettaSWM conformations for RNA motifs
We first test our method using a benchmark of 48 motifs. This benchmark^{34} was developed to examine the ability of RosettaSWM to recover the loop region given some of the base pairs and environment contacts fixed (See Methods and Supplementary Data 1). We test our refinement method by refining all 88,352 conformations generated by RosettaSWM with the NuTree algorithm and the BRiQ energy score. The NuTree was constructed with the basepairing information extracted from RosettaSWM model, and the BRiQ energy was optimized by Monte Carlo sampling at low temperature. The lowest RMSD values of top 1% predicted models by the Rosetta and BRiQ energies are compared in Fig. 2A. The majority of predicted motif conformations (31/48, 64.5%) have an improved RMSD after BRiQ refinement with a median reduction of 0.2 Å RMSD change among all 48 motifs. The improvement is more impressive on the refinement from nearnative models (RMSD of RosettaSWM model less than 2 Å), 81% (17/21) predicted conformations have an improved RMSD after BRiQ refinement. The RMSDs of the top 1% predicted models for the remaining four motifs (4/21) are essentially unchanged.
Generate motif structures with basepairing information
As the NuTree sampling algorithm is a refinement protocol, nearnative conformations will be difficult to obtain if initial conformations contain false base pairs generated by other methods. Here, we further test whether or not nearnative structures can be generated from the NuTree algorithm if all motif base pairs are known. Starting from a random backbone conformation with base pairs fixed, we run Monte Carlo sampling at different temperatures (Methods) to optimize the BRiQ energy score. The lowest RMSD of top 1% predicted models by the BRiQ score are compared in Fig. 2B. The majority of predicted conformations (33/48, 68.8%) have an improved RMSD with correct and complete basepairing information, compared to partial fixation by RosettaSWM. The median improvement in the best RMSD value within top 1% predicted models is 0.29 Å.
Five representative motifs were chosen for illustration in Fig. 3. They are a standard helix made of GC pairs (Fig. 3A), two tetraloops (GCAAtetra loop, Fig. 3B, UUCGtetra loop, Fig. 3C) commonly used for testing molecular mechanics force fields^{35}, one example of refinement leading to worse structures (j55aP4P6fixed, Fig. 3D) and an example of internal loop (loopE, Fig. 3F). In these figures, the top to bottom panels represent the results from RosettaSWM, refinement of RosettaSWM conformations by BRiQ and the structure prediction by BRiQ with native basepairing information. For the GC helix (Fig. 3A), minimum RMSD sampled by RosettaSWM is 0.58 Å, compared to 0.14 Å by BRiQ. For GCAAtetra loop (Fig. 3B), nearnative local minimal improved from around 1.5 Å RMSD by RosettaSWM to about 1 Å by BRiQ. UUCGtetraloop (Fig. 3C) is an example that <1 Å nearnative structure is the global minimum in BRiQ, whereas the lowest energy conformation in RosettaSWM is around 3.5 Å. For these three motifs, BRiQ refinement of RosettaSWM models leads to an improved backbone fitness to the native backbone. Motif j55aP4P6fixed (Fig. 3D) is an example of worse RMSD values after refining RosettaSWM conformations. We noticed that this native structure has one base protruded into the solvent, which may be the reason for the failure of BRiQ as the solvation effect is only implicitly accounted for in a statistical potential. For loopE, the best predicted conformation by RosettaSWM is about 2 Å RMSD (panel Ei). However, these models have an incorrectly folded nonWC pair that prevented BRiQ to make significant further improvement over RosettaSWM (panel Eii). If native nonWC pairs were employed, we would obtain 0.4 Å RMSD for the best within top 1% (panel Eiii). We can achieve this highresolution structure even without using any nonWC pairs as restraints. We further found that if native conformations are directly refined by BRiQ, the lowest energy conformations of the most motifs (45/48, 94%) remain <2.5 Å away from the native structure. Refining models from the native structures and the RosettaSWM models finds the same minimum for most motifs with similar energy. As a result, most motifs have highquality nearnative conformations as the global minimum.
RNA puzzle refinement
A realworld test of any refinement techniques is to refine previously predicted models. Here we refined all submitted models in 24 RNA puzzle experiments (PZ1PZ25, Supplementary Data 2). For each submitted model, we generated 20 refinement models. Then, the lowest energy model within 20 refinement models is treated as the BRiQ predicted model. The lowest RMSD model from all BRiQ predicted models for a given RNA puzzle is compared to the lowest RMSD value among all submitted models in Fig. 4A. As the figure shown, if initial models contain models with RMSD < 4 Å, refinement always leads to improvement, with the largest RMSD reduction of 1.5 Å. Figure 4B further compares the lowest RMSD value among all submitted models to those lowest RMSD values among all BRiQ refined models. Essentially all RNAs have better models sampled after refinement with the largest RMSD reduction at 2.2 Å.
Another metric for describing structural difference is deformation index (DI) proposed by Parisien et al.^{36}, which emphasized on similarity in the base–base interaction network. Supplementary Data 2 also showed the results based on DI. Indeed, we found that more RNAs showed the improvement in DI after refinement than in RMSD. For example, 20/25 predicted DI values in RNA puzzles were reduced, compared to only 13/25 in predicted RMSD.
MolProbility^{37} is a tool for checking the quality of RNA structures. We calculated MolProbity scores for all RNApuzzle models. Supplementary Fig. S1 shows that the clash scores of 75 or larger are all decreased to less than 50 after BRiQ refinement. The average clash score reduced 40% from 20.87 to 12.58. Except a few outliers, the clash scores are less than 30 after refinement.
In addition to the whole motif, we further analyzed the refinement results of RNA puzzles at the basepair level. Here, we employed DDM to measure the relative orientational difference between predicted and native base pairing structures according to four pseudo atoms employed for representing each base (see Methods). As Supplementary Data 3 shows, the average DDM values from native basepairing structures of Watson–Crick pairs, nonWatson–Crick pairs, and basestacking decreased 30% from 0.545 to 0.384, 17% from 0.687 to 0.570, and 22% from 0.834 to 0.650, respectively. The improvement in base pairing structures after BRiQ refinement is found for essentially all RNA puzzles (except for nonWatson–Crick pairs in PZ02, PZ03, and PZ05), regardless whether or not there is an improvement of the overall RMSD or not. More improvement at the basepair level indicates that the BRiQ refinement occurred at the detailed atomic resolution.
It is interesting to know what happens if BRiQ is applied to refining native structures. We examined the deviation from the native structure at the basepair structural level. Supplementary Fig. S2 shows the change of base pair structures in DDM as a function of Xray structure resolution after refinement of native structures by BRiQ. Overall changes to the native base pairing structures are small. There is a trend that larger changes in base pair conformations were observed for lower resolution structures, suggesting more uncertainty for low resolution structures as expected.
Comparison to FARFAR2
We also test our refinement method starting from the recently developed FARFAR2 predicted models^{24} but only for those 12 RNA puzzles with RMSD of best FARFAR2predicted models less than 6 Å as shown in Fig. 4C. This is because it is not possible to refine those models that are far from native. For each puzzle, we select 2000 lowest energy models according to FARFAR2 (1547 models for PZ20) to run BRiQ refinement, and compare the best RMSD values of top 1% and 5% predicted models by FARFAR2 and BRiQ energy scores, respectively. As shown in Supplementary Data 4 and Fig. 4C, D, almost all (10/12) have reduced RMSD values after refinement with a median RMSD reduction of 0.48 Å for top 1% prediction and 0.45 Å for top 5% prediction. The only two RNA puzzles with increased RMSD values after refinement have very small changes (<0.08 Å for top 1% and <0.19 Å for top 5% predictions). When structural difference measured by DI (Supplementary Data 4), DI after BRIQ refinement were improved or maintained at the similar level for all 12 cases investigated. These results further indicate the consistency of BRiQ refinement.
As illustrative examples, Fig. 5 compares the conformations sampled by FARFAR2 and those further refined by BRiQ for Puzzles 4 and 18, respectively. Both show significant moves toward native structures of RNAs by comparing the dependence of energy scores on RMSD shown in Panels i and ii. Similar to the motif cases, BRiQ refinement yields improved backbone conformations over FARFAR2 models as shown in structural comparisons in Panels iii and iv of the figures.
Discussion
This work presents a protocol for refining RNA model structures produced by other modeling tools. The protocol is nucleobasecentric in its knowledgebased energy score as well as the sampling algorithm. The fully knowledgebased score (BRiQ) captures the orientation dependence of base–base and base–oxygen, and oxygen–oxygen interactions by using a local coordinate system for each nucleobase, whereas the structures of ribose and phosphate backbones were governed by rotameric statistical potentials and empirical internal energies. Moreover, the dominant base–base interaction was scaled by quantum mechanical calculations for removing possible indirect interactions contained in structural statistics. This statistical potential was coupled with the NuTree algorithm that samples the conformations around predetermined base pairs. This refinement protocol improves 81% RosettaSWM models with less than 2 Å RMSD, 100% RNA puzzle models with RMSD < 4 Å, and 83% FARFAR2 models with RMSD < 6 Å. The CPU time cost of refining a single RNA puzzle model is around 40 min for a model of 50 nt, and 2 h for a model of 100 nt on Intel Xeon Gold 6242 CPU 2.80 GHz, and the RAM requirement is about 2.9 G.
The BRiQ statistical energy score is built on the knowledge that backbone conformations are determined by more stable stacked base pairs. This was done by focusing on base–base, base–oxygen, and oxygen–oxygen interactions while treating the backbone as rotameric states around the bases. The robustness of the scoring function is illustrated by the consistent improvement in refining a diverse set of RNA models produced from different methods participated in RNA puzzles including ModeRNA^{6}, FARNA^{7}, MCFold/MCSym^{8}, ifoldRNA^{10,11}, Vfold^{12,13}, 3dRNA^{14,15}, and SimRNA^{16}.
In order to secure as many RNA structures as possible, we included all RNA structures with resolution higher than 3.0 Å to collect base and backbone statistics for our BRiQ knowledgebased score. Although Xray refinement at resolutions between 2.5 and 3.0 Å may contain errors and cryo EM structures may have poor density fitting, using 2.5 Å cut off and a limitation to Xray structures will lead to 1225 structures with only 10.5% bases, a dataset that is too small to capture useful allatom statistics. It is also noted that even for highresolution structures, there are regions which have poor Rfactor values or high clash scores. Structural regions with clashed atoms were automatically excluded from our statistics. Structural regions with poor Rfactors simply reflect the regions that are more dynamic and the conformations of the regions are probable conformations among many. Thus, it is reasonable to incorporate these conformations as a part of statistics as commonly done for statistical potentials of proteins^{26}.
Another issue of the structural database is that the RNA structures included were dominated by rRNA and tRNA with about 89% base pairs from rRNAs. This raises the question whether or not our BRiQ energy function is biased toward rRNA and tRNA so that it would not be generalizable to other RNAs. In fact, RNA puzzles all are made of noncoding RNAs that are not involved in protein synthesis. Despite of this, BRiQ can consistently refine these noncoding RNAs, indicating that BRiQ is not an energy function limited to rRNA or tRNA.
The above demonstrated transferability is achieved because the statistics was made at base and atom levels, not at the structural motif level. Moreover, we are only interested in detailed energy surfaces around the local minima. That is, subtle or large conformational changes due to different ligands and crystallization conditions are useful to increase the resolution of the energy surface. In Supplementary Fig. S3, we plot the probability as a function of the pair orientation distance contributed by the structures of same sequences and by other structures. It is clear that the structures of same sequences can fill the conformational space missed by other structures, generating a more refined energy surface. To understand the potential impact of homologous structures to the refinement results of RNA puzzles, we removed all homologous structures of RNA puzzles with 70% sequence identity (the sequence similarity for the RNA “twilight zone” is 80%)^{38}. We found that the homologous structures from RNA puzzles contributed only to 0.1% of all structural data. The changes to the BRiQ energy score are negligible. Refinement results with the new BRiQ score are essentially the same (except those caused by stochastic nature of Monte Carlo sampling). For example, the difference before and after removing homologs for the refined FARFAR2 structures is only 0.02 Å for rp04 and 0.2 Å for rp18 for the best in top 1%, respectively. We will update the BRiQ energy score when more nonredundant RNA structures become available.
The NuTree algorithm samples the RNA conformations around predetermined base pairs from a given model. These preformed base pairs serve as the nucleus to speed up the folding of the rest of an RNA chain. However, if a base pair was incorrectly modeled, it will be energetically difficult to correct the mistake. This is the main reason why the current refinement protocol works best for those nearnative structures, in which the majority of base pairs were correctly identified. Unlike Rosetta, the NuTree algorithm can fix both nested and nonnested (pseudoknot) base pairs. In other words, this algorithm will become more useful as secondary and tertiary base pairs are increasingly more accurately predicted by employing covariational analysis of RNA homologous sequences^{39,40,41,42} and deep learning techniques^{43,44}.
This work is limited to RNA structural refinement. Another question is whether or not the RBiQ statistical energy function coupling with the NuTree algorithm can serve as an effective method for ab initio structure prediction. Initial studies suggest that a new sampling algorithm is likely required because ab initio folding requires more frequent breaking and forming of base pairs than what are typically involved in the NuTree algorithm. The work in this area is in progress.
Methods
BRiQ energy score
Base–base interaction energy (\({{\rm{E}}}_{{\rm{bb}}}\))
Relative orientation between two bases. The planar shaped bases make the orientation dependence a must for base–base interactions. Here, we consider each base i as a rigid body with a local coordinate system CS_{i} whose origin is located at atom C1’. The xaxis is in the direction of C1’N9 and the zaxis is in the direction of C1’N9×N9C4 for A and G whereas the xaxis is in the direction of C1’N1 and the zaxis is in the direction of C1’N1×N1C2 for U and C. As shown in Fig. 1A, the orientation and distance between bases i and j can be described by (1) the distance r_{ij} between the origins of CS_{i} and CS_{j}, (2) the rotational angle ω_{ij} around r_{ij}, (3) the directional vector r_{ij} in CS_{i}, and (4) the directional vector r_{ji} in CS_{j} (a total of 6 dimensions). The distance r_{ij} has a range of 0–15 Å with a uniform grid space of 0.3 Å. The rotational angle ω_{ij} varied from −180° to 180° with a uniform grid of 8°. We represent the orientation of the directional vector r_{ji} by 2000 uniformly distributed points on a sphere from Monte Carlo simulated annealing of points with repulsive interactions proportional to the inverse of squared distances between the points. Thus, the space between the two bases is separated into 50 × 45 × 2000 × 2000 discrete regions. Once the energy values for all these regions are known, the energy value at a given distance and orientation can be linearly interpolated. The above coordinate system for representing the base–base relative orientation was similar to but is more sophisticated than what was proposed for a coarsegrained twoparticle representation of RNA chain^{45}. The 6dimensional base–base interactions were precalculated and stored in a table and only negative values were loaded into computer memory for quick access.
Baseorientation representation. To define an orientationdependent density, it is necessary to measure the structural similarity first. One common measure is the rootmeansquared distance (RMSD) between two base pairs. However, it is too timeconsuming to calculate RMSD. To speed up the calculation, we assumed that four fixed points (T1, T2, T3, T4) in the local coordinate system of a base (Supplementary Fig. S4a) are sufficient to represent the orientation of each base and the mean squared distance between these points in two bases A and B (DDM) can approximate RMSD with DDM defined by
We have optimized T1, T2, T3, T4 so that DDM has the highest correlation coefficient to RMSD. The final four points in each base coordinate system are: T1 = (2.158, 3.826, 1.427), T2 = (−0.789, −0.329, −1.273), T3 = (4.520, −3.006, 1.586) and T4 = (6.018, 1.903, −1.638). The resulting correlation coefficient is 0.974 (Supplementary Fig. S4b).
Orientation distribution density. We employed a modified radial basis function kernel h(d) to calculate the orientation distribution of one base around another base f(x) by the following equations:
and
where N is the number of base pairs in the database of RNA structures and DDM(x,x_{i}) is the distance between a target base pair and a base pair in the database. If \({\rm{DDM}}\left(x,{x}_{i}\right)\) between two bases is less than 0.15 Å, we consider two bases shared the same orientation. Here and below, we have employed an empirical value (0.1) for defining the spread of the kernel based on statistics from the PDB structures and a few trials. An example is shown in Supplementary Fig. S5.
The orientation distribution densities of base pairs with different sequenceseparation distances (1, 2 or >2) were calculated separately. For nonlocal base pairs (separation >2), the density was reweighted by quantum mechanical energy.
Quantummechanicalenergyweighted orientation distribution density. One issue associated with a statistical energy function is that a high population of specific orientation between two bases may not be due to a strong direct interaction at this specific orientation but due to the orientation preference of the interaction with another base. Here, we minimize the possibility of this type of indirect interactions by a weighting factor w(x_{i}) according to quantum calculations. The assumption is that the strength of the directional interaction is related to the strength of the quantum interaction energy. Here,
where \({\widetilde{x}}_{i}\) is the nearest representative configuration to \({x}_{i}\), \({{\rm{E}}}_{{\rm{QM}}}\left({\widetilde{x}}_{i}\right)\) is the quantum energy. The modified orientation distribution \({f}^{{\prime} }\left(x\right)\) satisfies
Here, quantum mechanical (QM) calculations were performed by using Guassian 09^{46}. We investigated all possible ten pairs of bases (AA, AU, AG, AC, UU, UG, UC, GG, GC, and CC). That is, both canonical (AU, GC, GU) and all other possible noncanonical pairs were included in the statistics. Moreover, for each basepair type, we generated 80 orientation cluster centers by minimizing the root mean square distance between all data points to the nearest cluster center. That is, not only base pairs, but base–base stacking and other base–base polar interactions were included in the statistics. For each representative configuration in a structural cluster, we employed 5 nearest basepairing conformations with the highestresolution PDB structures as the initial configurations for QM calculations. The initial structure from the PDB was further optimized quantum mechanically so as to minimize the effect of potentially inaccurate conformations. The steps for calculating QM base–base interactions are as follows: (1) remove all backbone atoms, (2) replace the C1’ atom with a H atom, (3) optimize the position of the H atom by ab initio Hartree–Fock calculations with the basis set 631 G* and (4) calculate the base–base interaction energy by E(AB) = H(AB) – H(A) – H(B) with the densityfunctional theory method M062X^{47} and the basis set 631+G(d,p). The average value of five QM calculations is considered as the QM energy for the representative configuration. A total of 4000 QM calculations for 10 possible base pairs (80 × 10 × 5) were performed. Figure 1C shows some examples of the calculation results. The scaling parameter (4.32 kcal/mol) in Eq. (5) between QM calculations and statistical energy functions is obtained from the slope of the regression analysis of hydrogenbonded base pairs (Fig. 1C).
Base–base interaction energy. The final energy function for base–base interactions is calculated by the following equations:
and,
where sep is the sequence separation distance, its value could be 1, 2 or 2+. f_{ref}(2+) was employed to scale the lowest energy value of nonlocal base pair to ‒8.0 based on the lowest energy from QM calculations and the scaling between statistical and QM calculations (Fig. 1C). f_{ref}(1) and f_{ref}(2) were employed to scale the lowest energy value of local base pair to ‒4.0. Here, we also set the maximum base–base interaction energy to zero because repulsive interactions from the above sixdimensional statistical analysis are not reliable. Figure 1B shows a schematic example of before and after QM scaling.
Interaction energy between a base and a mainchain oxygen atom \({{\rm{E}}}_{{\rm{bo}}}\left(x\right)\)
The interaction energy between a base and a mainchain oxygen atom is the interaction between a polar group and a polar atom. The orientation of a base is still represented by 4 points obtained previously. The relative orientation and position between an oxygen atom and a base can be described by the distance between the oxygen atom and the four points representing the base. The kernel function \({\rm{h}}\left(d\right)\) and the density \({\rm{f}}\left(x\right)\) for \({{\rm{E}}}_{{\rm{bo}}}\left(x\right)\) are given by
and
where x and x_{i} represent the target base–oxygen and a base–oxygen pair in the database, respectively, and \(d\left(x,{x}_{i}\right)\) are the RMSD between the two base–oxygen pairs. Unlike \({\rm{E}}_{\rm{bb}}\), the repulsive (positive) interaction was not set to zero but reduced by a decaying weighting function \({w}_{bo}\left({d}_{bo}^{\min }\right)=11/({1+e}^{(3.7{d}_{bo}^{\min })/0.08})\), where \({{\rm{d}}}_{{\rm{bo}}}^{{\rm{min }}}\) is the minimum distance between an oxygen atom and any atoms in a base, 3.7 Å is the approximate interaction distance between a mainchain oxygen atom and a base and 0.08 is an empirical parameter to control the rate of decay. We also impose an orientation dependence for the hydrogen bonding between an oxygen atom and the corresponding atoms in a base according to the θ angle between the base atom N or O, the main chain atom O (O2’, OP1, OP2) and the main chain atom C or P bonding to O. This was done by locating the angle range in RNA structures after removing top 3% angles from each side and defining s(θ)=0 for outside the angle range, s(θ)=1 when θ=θ_{M}, the median value of θ, and
where θ_{u} and θ_{L} are the upper and lower bounds for θ, respectively. The final expression for \({{\rm{E}}}_{{\rm{bo}}}\left({\rm{x}}\right)\) is
where \({f}_{{\rm{ref}}}\) is employed to scale the lowest energy value empirically to −3.0, which we set to be about onethird strength of GC pairs. As an example, Fig. 1D shows the distribution of a nucleobase C and an oxygen atom in a phosphate group and that of a nucleobase A and O4 in ribose.
Hydrogenbonding energy between mainchain oxygen atoms \({{\rm{E}}}_{{\rm{oo}}}\left(x\right)\)
We consider hydrogen bonding interactions between O2’O2’, O2’OP and OPOP. The configuration of hydrogen bonds is determined by four atoms (two oxygen atoms and their connecting C or P atoms). The distance d between two hydrogenbonding configurations is calculated by RMSD between the four atoms. Similar to \({{\rm{E}}}_{{\rm{bo}}}\left(x\right)\), we have
with \(f(x)={\sum}_{i=1}^{N}{e}^{0.5\left(\right.d({hb},{hb}_{i}/0.1)^{2}}\), \({f}_{\mathrm{ref}}\) is employed to scale the lowest energy value empirically to 3.0 for O2’O2’, 2.0 for O2’OP, 1.5 for OPOP and the decay function \({w}_{oo} {({d}_{oo})}={11}/({1+e}^{(3.3d)/0.07})\) with 3.3 Å is the typical hydrogen bond length between two oxygen atoms and 0.07 is the empirical parameter to control the decay rate. Here the angle dependence is implicitly accounted for by using 4 atomic coordinates (e.g., atoms C2’O2’OPP for the hydrogen bond between O2’ and OP) to define RMSD.
Energy for atomic clashes (\({{\rm{E}}}_{{\rm{clash}}}\))
Due to lack of adequate statistics for hardcore exclusion in all statistical energy functions, we have set repulsive terms to 0 in \({{\rm{E}}}_{{\rm{bb}}}\left(x\right)\) or quickly decay to 0 by weighting functions \({{\rm{w}}}_{{\rm{bo}}}\left({d}_{{bo}}^{{\min }}\right)\) and \({{\rm{w}}}_{{\rm{oo}}}\left({d}_{{oo}}\right)\) in \({{\rm{E}}}_{{\rm{bo}}}\left(x\right)\) and \({{\rm{E}}}_{{\rm{oo}}}\left(x\right),\) respectively, when the distance is less than a preset value. To avoid direct atomic clashes, we introduce an empirical energy function as below.
where d is the atomic distance between two atoms, r_{0} is the shortest statistical distance from RNA structures between the two atoms and \({k}_{{{\mathrm{clash}}}}\) is an empirical parameter to control the increasing rate of repulsion when two atoms approach to each other. We set \({k}_{{{\mathrm{clash}}}}=3\) after the energy minimized structure does not change much for \({k}_{{{\mathrm{clash}}}}\) between 2 and 5. r_{0} is orientation independent for atoms in SP^{3} hybridization but is orientationdependent for atoms in SP^{2} hybridization. That is, r_{0} is orientationdependent between any atom with an atom in a base but is orientationindependent between two mainchain atoms. For orientationindependent r_{0}, it is set as the top 5% shortest distance between two atoms found in RNA structures. For orientationdependent r_{0} between a given atom a and an atom b in base B, the atomic position of a is transformed to the local coordinate system of the base B and clustered around the atom b. The top 5% shortest distance in different orientations is set as r_{0} in different orientations. The functional form of the clash energy is shown in Supplementary Fig. S6.
Mainchain rotameric energy (E_{rot})
For protein structure prediction, the sidechain conformational space is discretized into rotamers. Here, we introduce mainchain rotameric states because stacked bases are more rigid than the main chains. The conformational space of a ribose rotamer is mainly determined by two dihedral angles (Fig. 1E). The first is the rotational angle χ rotating around the chemical bond connecting the base and ribose (N9C1’ for bases A and G and N1C1’ for bases U and C). The second is the improper angle ν between the C2’C4’O4’ plane and the C2’C4’C3’ plane within the connected ribose. Each dihedral angle can be separated into two regions, and each region into 300 to 500 rotamers, and a total of 1500 rotamers is employed to describe a ribose rotameric state.
The ribose rotamers are clustered according to RMSD. This is done by transforming atomic coordinates of eight ribose atoms to the local coordinate system for the base. RMSD is the average distance of ribose atoms in the local coordinate system. Once RMSD is known, we can use the kernel density estimation method to calculate the density of each rotamer, and then take the negative logarithm and convert it to the rotamer energy \({{\rm{E}}}_{{\rm{rot}}}\left({\rm{x}}\right)\) as below.
where \(\left(x\right)=\mathop{\sum }\limits_{i=1}^{N}{e}^{0.5(d(x,{x}_{i})/0.15)^{2}}\), x_{i} denotes a rotamer in the database and the summation is over all the rotamers in the database.
The internal energy \({\rm{E}}_{\rm{internal}}\)
We introduce the following internal energies \(({\rm{E}}_{\rm{internal}})\) between a phosphate group and two connecting riboses to account for energetics associated to changes in bond lengths \({{\rm{E}}}_{{\rm{bond}}}\), bond angles \({{\rm{E}}}_{{\rm{angle}}}\), and dihedral angles \({{\rm{E}}}_{{\rm{torsion}}}\)
where \(u\left({bl}\right)={k}_{{{\mathrm{bond}}}}({bl}1.422)\) and bl denotes the bond length between O5’_{i+1} and C5’_{i+1.} We made this bond slightly flexible so that we can add a phosphate group (atom P, OP1, OP2, O5’) between two fixed riboses at fixed dihedral angles, other bond lengths and bond angles. The average bond length found in the RNA structures (1.422 Å) is employed here. Parameter \({k}_{{{\mathrm{bond}}}}\) is set to 5.0, which is optimized by backbone modeling (rebuilt mainchain atoms after fixing base positions).
where \(u\left({\theta }_{{\rm{POC}}}\right)={k}_{{{\mathrm{angle}}}}({\theta }_{{\rm{POC}}}120.7)\) and \(u\left({\theta }_{{\rm{OCC}}}\right)={k}_{{{\mathrm{angle}}}}({\theta }_{{\rm{OCC}}}111.1)\). \({\theta }_{{\rm{POC}}}\) \({\rm{is}}\) between _{Pi+1}O5’_{i+1}C5’_{i+1} and \({\theta }_{{\rm{OCC}}}\) is between O5’_{i+1}C5’_{i+1}C4’_{i+1.} Similar to the bond length, we made these two angles flexible so that we can add a phosphate group between two fixed riboses at fixed dihedral angles, other bond lengths and bond angles. The average values for these two angles in RNA structures are 120.7° and 111.1°, respectively. Parameter \({k}_{{{\mathrm{angle}}}}\) is set to 0.1, after a few trials of backbone modeling. For both bondangle and bondlength energies, a harmonic function with linear extrapolations at both ends was employed to improve conformational sampling efficiency.
\({{\rm{E}}}_{{\rm{torsion}}}\) is calculated from three statistical energy functions (see Fig. 1E for angle definitions).
with \(E\left({\nu }_{i},{{\rm{\varepsilon }}}_{i},{\varsigma }_{i}\right)={\rm{ln}}P({{\rm{\varepsilon }}}_{i},{\varsigma }_{i}{\rm{}}{\nu }_{i})\), \(E\left({\varsigma }_{i},{\alpha }_{i+1},{\beta }_{i+1}\right)={\rm{ln}}P({\alpha }_{i+1}{\rm{}}{\varsigma }_{i},{\beta }_{i+1})\), and \(E\left({\beta }_{i+1},{\gamma }_{i+1},{\nu }_{i+1}\right)={\rm{ln}}P({\beta }_{i+1},{\gamma }_{i+1}{\rm{}}{\nu }_{i+1})\). Here, two improper angles (\({\nu }_{i}\) and \({\nu }_{i+1}\)) belong to the two neighboring riboses connected by a phosphate group. Two dihedral (angles \({{\rm{\varepsilon }}}_{i}\) involving C2’_{i}C3’_{i}O3’_{i}_{Pi+1} and \({\varsigma }_{i}\) involving C3’_{i}O3’_{i}_{Pi+1}O5’_{i+1}) determines the position of the phosphate group. In addition, \({\alpha }_{i+1}\), \({\beta }_{i+1}\), and \({\gamma }_{i+1}\) dihedral angles involve O3’_{i1} _{Pi+1}O5’_{i+1}C5’_{i+1}, _{Pi+1}O5’_{i+1}C5’_{i+1}C4’_{i+1} and O5’_{i+1}C5’_{i+1}C4’_{i+1}O4’_{i+1}, respectively^{48}. Here, the coupling between neighboring three dihedral angles, rather than the statistics of a single torsion angle was considered to improve the accuracy of mainchain modeling.
Structure database
All statistical energy terms in Eq. (1) were derived from 2247 RNA structures obtained by Xray crystallography and cryogenic electron microscopy with a resolution higher than 3.0 Å downloaded on January 23, 2020 from the PDB databank^{1}. This set contains 272 ribosome, 221 riboswitches, 138 tRNA, 106 ribozymes, 42 aptamers, 121 virus RNA, 17 introns, 3 spliceosomes, and 1327 others. Among them, there are 1459 protein–RNA complex structures and 788 RNAonly structures. The base pairing information is dominated by ribosome (about 89%). We did not use the Cambridge database because it contains simple structures only. We did not remove any redundant sequences because we need a database as large as possible. Moreover, RNAs can have different structures in different complexes and we want to capture all possible conformations. We can do this because statistics are collected at the atomic or base level. See the discussion in more details.
Conformational sampling algorithm
Confirmational representation by the NuTree
For conformational sampling, each RNA structure is represented by the NuTree. Each node in the NuTree denotes an RNA residue including its base position (the local coordinate system), the ribose rotameric state and the phosphate position attached to the 3′ position of the ribose (determined by dihedral angles ε and ζ, Fig. 1E). The edge of the NuTree represents relative positions between the local coordinate systems of two bases, which are described by a 3 × 4 coordinate transformation matrix. There are ten types of coordinate transformation matrices to describe nine types of edges: (1) sequenceneighboring base pairs along 5′ to 3′ and 3′ to 5′ directions in the loop regions, (2) sequenceneighboring base pairs along 5′ to 3′ and 3′ to 5′ directions in the Watson–Crick pairing regions, (3) sequenceneighboring base pairs along 5′ to 3′ and 3′ to 5′ directions in the nonWatson–Crick pairing regions or connection between Watson–Crick pairing and nonWatson–Crick pairing regions, (4)Watson–Crick pairs, (5) NonWatson–Crick pairs, (6) chain connection without a specific positional relation (jump). Except for jump, each edge has a corresponding move set. The move set was collected from all possible conformations of the same type in the structural database that were discretized into 60 representative conformations and each representative conformation was further discretized into 60 subrepresentative conformations. These 3600 representative conformations form the move set for each edge type. Figure 1F shows the NuTree for the GCAA tetraloop with a sequence of G_{0}C_{1}G_{2}C_{3}A_{4}A_{5}G_{6}C_{7} where the edge 0–1 is sequenceneighboring base pairs along the 5′ to 3′ direction in the Watson–Crick pairing region, edges 1–2 is sequenceneighboring base pairs along the 5′ to 3′ direction in nonWatson–Crick pairing regions, 2–3, and 3–4 are sequenceneighboring base pairs along the 5′ to 3′ direction in the loop regions, edges 0–7, 1–6 are Watson–Crick pairs and 2–5 is nonWatson–Crick pair. Although there is no edge between 4 and 5, we need to build the phosphate connection and calculate the internal energy between them.
For constructing a RNA model, the average coordinates were used for atoms in the bases. Atoms in riboses were generated from representative PDB conformations. Bond length and bond angles associated to the phosphate group were set to the average value in PDB library. It should be emphasized that atomic coordinates of riboses were not generated from a fixed bond length and angles but directly from the puckerdependent conformers^{49} from PDB structures.
Conformational sampling moves
Conformational sampling involves node and edge sampling. Node sampling allows either a random selection of ribose rotamers or a small adjustment of the base local coordinate system (<0.2 Å translational and <2° rotational motions). Both moves are local. They do not change the positions of other nodes and riboses. Edge sampling makes both local and global moves. The local move makes minor adjustment (<0.2 Å translational and <2° rotational motions). The global move randomly selects a move from the set according to the edge type. Edge sampling moves will change the positions of all downstream nodes but not the relative positions between the nodes.
The above sampling may change the relative positions of some neighboring riboses and make it necessary to rebuild the phosphate connection. Such connection can be rapidly built by the grid search from representative points for the lowest \({E}_{{{\mathrm{internal}}}}\) value in the ε–ζ torsional angle space. According to the distribution of conformations (Supplementary Fig. S7), we have divided ε–ζ space into 50 representative points, each representative point into 50 subrepresentative points and each subrepresentative point with 25 local points in 1° extension. That is, a total of 125 calculations is conducted for searching the lowest \({E}_{{{\mathrm{internal}}}}\) value.
Monte Carlo simulated annealing
All nodes in a NuTree are divided into three types: those with positions fixed (A), those with relative internal positions fixed but not absolute positions (B), and those with both relative and absolute positions changed (C). The energy change after each move is calculated by
where dE(AB), dE(AC), dE(BC), and dE(CC) denote the changes in interaction energies with the AB, AC, BC, and CC regions, respectively, and \(d{E}_{{{\mathrm{rot}}}}\) and \(d{E}_{{{\mathrm{internal}}}}\) are the changes in rotameric and internal energies, respectively. Each move is accepted or rejected according to the Metropolis criterion. We employed simulated annealing for energy optimization with an initial temperature set at 2.5, which is decreased by a factor of 0.95 at each round until temperature reached 0.01. In the refinement protocol, the initial temperature is set at 0.5 which is decreased to 0.01 by a factor of 0.9. The sampling steps at each temperature N_{step} is proportional to the edge number in the NuTree.
where n(wc edge) is the edge number of Watson–Crick pair or helix neighbor, n(nwc edge) is the edge number of NonWatson–Crick pair or NWC neighbor, n(other edge) is the number of other edges.
To increase the efficiency of conformational sampling, we give E_{clash} and E_{internal} a low weight of 0.05 at the initial temperature and gradually increase the weight to 1.
Motif test set
The test set is obtained from the benchmark of RosettaSWM^{34}, downloaded from https://purl.stanford.edu/fq893cm4516. After removing redundant RNAs and those with more than 50 bases and nonstandard RNA bases, we have 48 motifs listed in Supplementary Data 1.
RNA puzzle and FARFAR2 test sets
RNA Puzzle data set is downloaded from https://github.com/RNAPuzzles/rawdatasetandforassessment. The prediction results of FARFAR2^{24} are downloaded from https://purl.stanford.edu/wn364wz7925.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The structural data and test sets used by BRiQ is publicly available at http://servers.sparkslab.org/downloads/BRiQdataset.tar.gz.
Code availability
The source code of BRiQ refinement is available at https://github.com/JianZhan/RNABRiQ. We have also made the code citable by obtaining a DOI for the Github repository, which allows a permanent reference to the version of the code used in this study^{50}.
References
Rose, P. W. et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 43, D345–D356 (2015).
Petrov, A. I. et al. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res. 43, D123–D129 (2015).
Sun, L. Z., Zhang, D. & Chen, S. J. Theory and modeling of RNA structure and interactions with metal ions and small molecules. Annu. Rev. Biophys. 46, 227–246 (2017).
Antunes, D., Jorge, N. A. N., Caffarena, E. R. & Passetti, F. Using RNA sequence and structure for the prediction of riboswitch aptamer: a comprehensive review of available software and tools. Front. Genet. 8, 231 (2018).
Flores, S. C., Wan, Y., Russell, R. & Altman, R. B. Predicting RNA structure by multiple template homology modeling. Pac. Symp. Biocomput. 216–227 (2010).
Rother, M., Rother, K., Puton, T. & Bujnicki, J. M. ModeRNA: a tool for comparative modeling of RNA 3D structure. Nucleic Acids Res. 39, 4007–4022 (2011).
Das, R. & Baker, D. Automated de novo prediction of nativelike RNA tertiary structures. Proc. Natl Acad. Sci. USA 104, 14664–14669 (2007).
Parisien, M. & Major, F. The MCfold and MCSym pipeline infers RNA structure from sequence data. Nature 452, 51–55 (2008).
Das, R. Atomicaccuracy prediction of protein loop structures through an RNAinspired Ansatz. PLos ONE 8, e74830 (2013).
Krokhotin, A., Houlihan, K. & Dokholyan, N. V. iFoldRNA v2: folding RNA with constraints. Bioinformatics 31, 2891–2893 (2015).
Sharma, S., Ding, F. & Dokholyan, N. V. iFoldRNA: threedimensional RNA structure prediction and folding. Bioinformatics 24, 1951–1952 (2008).
Cao, S. & Chen, S. J. Physicsbased de novo prediction of RNA 3D structures. J. Phys. Chem. B 115, 4216–4226 (2011).
Xu, X. & Chen, S. J. Hierarchical assembly of RNA threedimensional structures based on loop templates. J. Phys. Chem. B 122, 5327–5335 (2018).
Zhao, Y. et al. Automated and fast building of threedimensional RNA structures. Sci. Rep. 2, 734 (2012).
Wang, J., Wang, J., Huang, Y. & Xiao, Y. 3dRNA v2.0: an updated web server for RNA 3D structure prediction. Int. J. Mol. Sci. 20, 4116 (2019).
Boniecki, M. J. et al. SimRNA: a coarsegrained method for RNA folding simulations and 3D structure prediction. Nucleic Acids Res. 44, e63 (2016).
Miao, Z. et al. RNApuzzles round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA 23, 655–672 (2017).
Miao, Z. et al. RNApuzzles round II: assessment of RNA structure prediction programs applied to three large RNA structures. RNA 21, 1066–1084 (2015).
Miao, Z. et al. RNApuzzles round IV: 3D structure predictions of four ribozymes and two aptamers. RNA 26, 982–995 (2020).
Feig, M. Computational protein structure refinement: almost there, yet still so far to go. Wiley Interdiscip. Rev. Comput. Mol. Sci. 7, e1307 (2017).
Adiyaman, R. & McGuffin, L. J. Methods for the refinement of protein structure 3D models. Int. J. Mol. Sci. 20, 2301 (2019).
Stasiewicz, J., Mukherjee, S., Nithin, C. & Bujnicki, J. M. QRNAS: software tool for refinement of nucleic acid structures. BMC Struct. Biol. 19, 5 (2019).
Das, R., Karanicolas, J. & Baker, D. Atomic accuracy in predicting and designing noncanonical RNA structure. Nat. Methods 7, 291–294 (2010).
Watkins, A. M., Rangan, R. & Das, R. FARFAR2: improved de novo Rosetta prediction of complex global RNA folds. Structure 28, 963–976 e6 (2020).
Alford, R. F. et al. The Rosetta allatom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Zhou, Y., Zhou, H., Zhang, C. & Liu, S. What is a desirable statistical energy function for proteins and how can it be obtained? Cell Biochem. Biophys. 46, 165–174 (2006).
Wang, J., Zhao, Y., Zhu, C. & Xiao, Y. 3dRNAscore: a distance and torsion angle dependent evaluation function of 3D RNA structures. Nucleic Acids Res. 43, e63 (2015).
Zhang, T. C., Hu, G. D., Yang, Y. D., Wang, J. H. & Zhou, Y. Q. Allatom knowledgebased potential for RNA structure discrimination based on the distancescaled finite idealgas reference state. J. Comput.Biol. 27, 856–867 (2019).
Tan, Y. L. et al. What is the best reference state for building statistical potentials in RNA 3D structure evaluation? RNA 25, 793–812 (2019).
Tinoco, I. & Bustamante, C. How RNA folds. J. Mol. Biol. 293, 271–281 (1999).
Murray, L. J., Arendall, W. B. 3rd, Richardson, D. C. & Richardson, J. S. RNA backbone is rotameric. Proc. Natl Acad. Sci. USA 100, 13904–13909 (2003).
Dill, K. A. & MacCallum, J. L. The proteinfolding problem, 50 years on. Science 338, 1042–1046 (2012).
Dunbrack, R. L. Jr. & Karplus, M. Conformational analysis of the backbonedependent rotamer preferences of protein sidechains. Nat. Struct. Biol. 1, 334–340 (1994).
Watkins, A. M. et al. Blind prediction of noncanonical RNA structure at atomic accuracy. Sci. Adv. 4, eaar5316 (2018).
Tan, D., Piana, S., Dirks, R. M. & Shaw, D. E. RNA force field with accuracy comparable to stateoftheart protein force fields. Proc. Natl Acad. Sci. USA 115, E1346–E1355 (2018).
Parisien, M., Cruz, J. A., Westhof, E. & Major, F. New metrics for comparing and assessing discrepancies between RNA 3D structures and models. RNA 15, 1875–1885 (2009).
Davis, I. W. et al. MolProbity: allatom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–W383 (2007).
Gardner, P. P., Wilm, A. & Washietl, S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. 33, 2433–2439 (2005).
De Leonardis, E. et al. Directcoupling analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res. 43, 10444–10455 (2015).
Weinreb, C. et al. 3D RNA and functional interactions from evolutionary couplings. Cell 165, 963–975 (2016).
Wang, J. et al. Optimization of RNA 3D structure prediction using evolutionary restraints of nucleotidenucleotide interactions from direct coupling analysis. Nucleic Acids Res. 45, 6299–6309 (2017).
Cuturello, F., Tiana, G. & Bussi, G. Assessing the accuracy of directcoupling analysis for RNA contact prediction. RNA 26, 637–647 (2020).
Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. Q. RNA secondary structure prediction using an ensemble of twodimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).
Singh, J. et al. Improved RNA secondary structure and tertiary basepairing prediction using evolutionary profile, mutational coupling and twodimensional transfer learning. Bioinformatics 37, in press (2021).
Poblete, S., Bottaro, S. & Bussi, G. A nucleobasecentered coarsegrained representation for structure prediction of RNA motifs. Nucleic Acids Res. 46, 1674–1683 (2018).
Frisch, M. J. T. et al. Gaussian 09 Revision A.02 (Gaussian Inc., 2016).
Zhao, Y. & Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06class functionals and 12 other functionals. Theor. Chem. Acc. 120, 215–241 (2008).
Richardson, J. S. et al. RNA backbone: consensus allangle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA 14, 465–481 (2008).
Westhof, E. & Sundaralingam, M. Interrelationships between the pseudorotation parameters P and taum and the geometry of the furanose ring. J. Am. Chem. Soc. 102, 1493–1500 (1980).
Xiong, P., Wu, R., Zhan, J. & Zhou, Y. Refining RNA structures by nucleobasecentric sampling with a backbone rotameric and quantummechanicalenergyscaled basebase statistical potential, RNABRiQ. https://doi.org/10.5281/zenodo.4661144 (2021).
Acknowledgements
This work was supported by Australian Research Council DP180102060 and DP210101875 to Y.Z. We also gratefully acknowledge the use of the High Performance Computing Cluster ‘Gowonda’ to complete this research, and the aid of the research cloud resources provided by the Queensland CyberInfrastructure Foundation (QCIF). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. P.X. would like to thank the hospitability of Shenzhen Bay Laboratory while he is a visiting scholar to the laboratory. The support of Shenzhen Science and Technology Program (Grant No. KQTD20170330155106581) and the Major Program of Shenzhen Bay Laboratory S201101001 is acknowledged.
Author information
Authors and Affiliations
Contributions
P.X. designed and performed the method and wrote the manuscript, R.W. assisted in quantummechanical calculations, J.Z. and Y.Z. conceived of the study, participated in the initial design, and assisted in data analysis. Y.Z. drafted the whole manuscript. All authors read, contributed to the discussion, and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Anton Petrov and Eric Westhof for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Xiong, P., Wu, R., Zhan, J. et al. Pairing a highresolution statistical potential with a nucleobasecentric sampling algorithm for improving RNA model refinement. Nat Commun 12, 2777 (2021). https://doi.org/10.1038/s41467021231004
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467021231004
This article is cited by

trRosettaRNA: automated prediction of RNA 3D structure with transformer network
Nature Communications (2023)

Integrating endtoend learning with deep geometrical potentials for ab initio RNA structure prediction
Nature Communications (2023)

RNA Folding Based on 5 Beads Model and Multiscale Simulation
Interdisciplinary Sciences: Computational Life Sciences (2023)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.