Focus on PNA Flexibility and RNA Binding using Molecular Dynamics and Metadynamics

Peptide Nucleic Acids (PNAs) can efficiently target DNA or RNA acting as chemical tools for gene regulation. Their backbone modification and functionalization is often used to increase the affinity for a particular sequence improving selectivity. The understanding of the trading forces that lead the single strand PNA to bind the DNA or RNA sequence is preparatory for any further rational design, but a clear and unique description of this process is still not complete. In this paper we report further insights into this subject, by a computational investigation aiming at the characterization of the conformations of a single strand PNA and how these can be correlated to its capability in binding DNA/RNA. Employing Metadynamics we were able to better define conformational pre-organizations of the single strand PNA and γ-modified PNA otherwise unrevealed through classical molecular dynamics. Our simulations driven on backbone modified PNAs lead to the conclusion that this γ-functionalization affects the single strand preorganization and targeting properties to the DNA/RNA, in agreement with circular dichroism (CD) spectra obtained for this class of compounds. MD simulations on PNA:RNA dissociation and association mechanisms allowed to reveal the critical role of central bases and preorganization in the binding process.


SI1a RMSD trend of ssPNA during 200 ns long MD simulation
In Fig. SI1a we report the Root Mean Square Deviation of ssPNA during 200 ns long MD simulation in order to investigate possible conformational transition with respect the initial helical one.  is smoother for the gamma modified and the RMSD fluctuation afterward considerably tighter denoting a higher degree of constraining and pre-organization.

SI2b: FES local minima convergence
In Fig. SI2b we report the local convergence analysis between four minima C-F as described in the main text. (Figure 2b) The convergence criteria is more flexible than the one employed in SI1b but, due to the large configurational space explored and the larger number of local minima characterizing the FES we can consider qualitatively acceptable these results. In particular we show the recrossing between states (top left), the head to tail (H-T) collective variable behaviour (top right), the sequential stacking recrossing (bottom left) and lastly the local free energy convergence between these four states along 400 ns. We recognize the difficulties in quantitatively converging the FES mainly due to the exploration of the stacking collective variable. However, these results show that WT-  The torsion angles H-N-C-H γ adjacent to the inter-residue amide bond on structures A-F reported in figure 2 (main text) have been monitored according to the following scheme SI2c, and comparing the corresponding torsion angle formed by the pro-R hydrogen in the corresponding monomer in the achiral ssPNA. Scheme SI2c: Torsion angles considered for evaluation of conformation of structures A-F in Fig.2 In Fig. SI2c we report the histogram (weighted for the bias) of the distance between the sequential α-amino acidic hydrogens (red arrows in figure) calculated for the ssPNA (red curve) and γ-ssPNA (green curve). The two series of histograms are based on the WT-MDMT free energies reported in the main text. (Figure 2a and b respectively) As expected for a preorganized single strand the average distances (d1-d5) calculated for γ-ssPNA are significantly shorter than those in ssPNA. Most notably, the histogram resulting from the unmodified PNA shows an appreciable broadening at shorter and longer distances. On the contrary, the statistical distribution calculated for the γ-ssPNA is always very narrow.
The unique exception is represented by the extremity d5 where the two systems behave similarly.
pro-R H γ

SI3 -Re-annealing simulations on ssPNA
The PNA:RNA system considered in this study present the following sequences:

3'-CTTGAG-5'
In figure SI3a we report the base pairing recorded in 200 ns long MD simulations for all 5 different re-annealing.

SI3c H-bonding Fraction
The complete disruption of the helical structure is achieved when the inter-molecular base pairing becomes ineffective. In order to track this phenomenon with our simulations we calculate the h-bonding parameter using the "coordination function" collective-variable as implemented in PLUMED2. For each base, this quantity ranges from 0 to 1 depending on the effective h-bonding interaction. Considering the whole PNA:RNA structure, the total pairing ranges from 0 (completely disrupted) to 6 (optimal pairing). The ratio between the recorded and the theoretical h-bonding at a given time frame defines the h-bonding fraction. Of particular interest is the h-bonding fraction calculated just before the duplex disruption as reported in the main article. We tested a wide range of time frames for calculating the h-bonding fraction just before the duplex disruption. In a sufficiently narrow time range (~10 ns before disruption) the h-bonding fraction is independent on this choice. We selected for the results reported in the main text 0.5 ns as the best time frame for hbonding fraction calculation.

SI4 -MD equilibrated PNA:RNA structure from data bank
In Fig. SI4 we represent the minimized and thermally equilibrated structure of the unmodified PNA:RNA described in the article. This structure is used as the starting point for the MD and WT-MDMT simulations.

SI5 -MD equilibrated γ-modified PNA:RNA structure
In Fig. SI5 we represent the minimized and thermally equilibrated structure of the modified PNA:RNA described in the article. This structure is used as the starting point for the MD and WT-MDMT simulations.

SI6 -Force field validations
As a starting model for assessment of the force field for PNA duplexes, we have chosen the 176D 1 NMR structure (reported in the Protein Data Bank database), which is one of the few PNA:RNA duplex structures reported in literature at the time of this study (crystal structure of PNA:RNA duplex was resolved for the first time at the end of December 2015). 2 The sequences of PNA and RNA are respectively H-GAACTC-Oand 5'-GAGTTC-3'. This duplex was modified by removal of phosphate group bound to 5' residue of RNA, because the chosen force field (ff99sb) recognizes the RNA strand only without that group. In the literature there are no consolidated force fields for PNA, and thus one of our aims was to improve their availability similarly to other biological cases as proteins or nucleic acids that are nowadays widely accepted. In order to test the parameters chosen 3 we performed a 200 ns long Molecular Dynamics simulation on the PNA:RNA duplex described above, using the protocol described in the Methods section. The duplex conformation was stable for all the simulation length with no significant structural modification. This can be inferred by examining the root mean square deviation (RMSD) that is less than 2.5 Å (Fig. SI6a). To better test the force field we checked the characteristic torsion angles of PNA obtained in the MD simulation, compared with those reported in literature. 4 The results are in good agreement with the experimental ones (Fig. SI6b).  SI6b: left, characteristic PNA angles; right, comparison between simulated angles and those reported in the literature. 4 Next, we considered the MD simulation of γ-modified PNA with the force field developed. Also in this case we needed the force field was validated before using it for further investigations on PNA properties. Therefore, we performed a 50 ns long simulation on a duplex obtained by manual insertion of serine side chain in γ position of each monomer of the PNA:RNA duplex 176D used in the previous simulation. This calculation was done with a shorter simulation time because this system was expected to be even more stable than the unmodified one, according to the general properties of γ-modified PNA. Indeed, also in this simulation the duplex resulted perfectly paired for the entire simulation, as proven by the low RMSD (Fig. SI6c). Moreover, characteristic PNA angles were found also in this case to be compatible with those reported in literature (Fig. SI6d). In order to better discriminate stacking stabilized conformations, we decided to use a local order parameter previously developed for describing crystal nucleation. 6 For each base, we defined a vector lying in the plane of the rings. This CV takes into account the distance, the angle and the coordination number between these vectors. Coordination number represents the number of vectors close within a defined cutoff. This parameter is important when studying nucleation since it determines the presence of an aggregate. The PNA considered in our study is 6 mer long and the maximum theoretical coordination number for each vector is then 5. However, our purpose is not to define an aggregate, but to discriminate stacking interactions. Therefore, for each couple of bases, we defined variables in order to have the coordination function ρ i (equation 1.2) always set to one, thus ruling out association. Distances and angles between these vectors are extremely important in stacking description: when two bases are stacked their distance should be defined in a given cutoff range and the angles should assume discrete values. An appropriate definition of this combination allows determining whether two or more bases are stacked (SI7 A) or fully (SI7 B -C) and partially unstacked (SI7 D).

SI7:
Schematic examples of stacking, not stacking and partial stacking arrangements. In blue are represented vectors used to describe local order parameter. A) Two bases are stacked; B) two bases are too distant for stacking interaction; C) two bases are close, but not stacked. In this case distance is favorable, but not the angle; D) Distance between bases is optimal, but angle not completely thus leading to partial stacking.
Going into the details, local order parameter is defined as the product of two sigmoidal and one single gaussian function: where r ij are the distances between the above discussed vectors, r cut is cutoff distance for the stacking interaction, n i is coordination number, n cut is cutoff for coordination number, ϑ ij is the angle between the vectors, ϑ k is a favorable angle for stacking and σ k is the width of the gaussian applied on that angle. Lastly, a and b are exponential factors, determining how steep are the sigmoids. These functions essentially monitor respectively the distance between bases (equation 1.1), the coordination number (equation 1.2) and angle between bases (equation 1.3). As discussed above, the coordination parameters were set to obtain a value of ρ i = 1 (n cut = 1). The distance function f ij was set to have a value of 1 for distance within a range from 5.0 to 6.5 Å, depending on the couple of bases i and j considered. Above the cutoff distance, the f ij value rapidly decreases to 0. The angle function θ ij is a sum of functions, one for each characteristic angle chosen. For each angle ϑ k a Gaussian function that exhibits maximum value of 1 for ϑ k , is defined. In order to have stacking, bases should present opportune values of distances and angles. Based on MD simulation data, the coefficients (r cut , ϑ k , σ k , a) were tuned in order to maximize f ij and θ ij when bases are stacked. The functions here described are referred to a single couple of vectors, but in our system we have several possible couples and also multiple bases coupled at the same time. Description of every single couple is not meaningful alone, so to describe entirely the system we used a linear combination of these local order parameters, defined for couples of bases, and we considered this combination a measure of total stacking (Stk).