Density functional theory studies on cytosine analogues for inducing double-proton transfer with guanine

To induce double-proton transfer (DPT) with guanine in a biological environment, 12 cytosine analogues (Ca) were formed by atomic substitution. The DPT reactions in the Watson–Crick cytosine–guanine model complex (Ca0G) and 12 modified cytosine–guanine complexes (Ca1-12G) were investigated using density functional theory methods at the M06-2X/def2svp level. The intramolecular proton transfers within the analogues are not facile due to high energy barriers. The hydrogen bond lengths of the Ca1-12G complexes are shorter than those in the Ca0G complex, which are conducive to DPT reactions. The DPT energy barriers of Ca1-12G complexes are also lower than that of the Ca0G complex, in particular, the barriers in the Ca7G and Ca11G complexes were reduced to −1.33 and −2.02 kcal/mol, respectively, indicating they are significantly more prone to DPT reactions. The DPT equilibrium constants of Ca1-12G complexes range from 1.60 × 100 to 1.28 × 107, among which the equilibrium constants of Ca7G and Ca11G are over 1.0 × 105, so their DPT reactions may be adequate. The results demonstrate that those cytosine analogues, especially Ca7 and Ca11, are capable of inducing DPT with guanine, and then the guanine tautomer will form mismatches with thymine during DNA replication, which may provide new strategies for gene therapy.

through the barrier 12 . When such double proton tunneling occurs during DNA synthesis, single base substitutions will occur. Fu et al. showed that among several possible DNA mutation processes, only the DPT mechanism could fully explain the universal mutation bias 9 .
Previous studies have focused on various environmental factors that might facilitate intermolecular proton transfer within base pairs. Zhang, J. D. et al. showed that when an additional hydride is placed on the C 6 and C 4 positions of cytosine, the anionic complex formed, which facilitates guanine H 1 transfer to the N 3 site of cytosine 13 . Noguera, M. et al. found that protonation at the N 7 and O 6 sites of guanine, affording H + G N7 C and H + G O6 C, strengthens the binding of the base pair and facilitates the N 1 -N 3 single-proton-transfer reaction 14 . The influences of metal cation (Cu + , Ca 2+ and Cu 2+ ) coordination to the CG and AT base pairs on intermolecular proton transfer were studied using density functional theory (DFT) methods 15,16 . Alya, A. A. et al. investigated the DPT reaction of a GC base pair under the effect of uniform electric fields on the order of 10 8 to 10 9 Vm −1 . They considered that fields applied along the axis of the double proton transfer in the -x (defined in the C to G direction) direction favor the canonical over the rare tautomers 12 . Chen, H. Y. et al. 17 showed that the substantial effect of GC stacking originates from the electrostatic interactions between the dipoles of the outer GC base pairs and the middle GC •− base-pair radical anion, the extent of the charge delocalization is very small and has little effect on proton transfer in GC •− . The effect of the surrounding water molecules on the DPT in GC was investigated using DFT methods, and the results demonstrate that water is crucial to the proton reactions. It does not act as a passive element but actually catalyzes the DPT 18 .
Thus far, few studies have examined the effects of neutral cytosine analogues on the DPT reaction in CG base pare. The purpose of this study was to use existing physic-chemical understanding and predictive capability to identify neutral cytosine analogues that can induce DPT reactions with guanine under physiological conditions and provide new strategies for gene therapy.

Results and Discussions
Molecular structures of the modeled cytosine analogues. To provide minimized structures to mimic the Watson-Crick base pairing in duplex DNA or RNA, the sugar-attachment sites of cytosine and guanine were methylated 19 . To build a library of cytosine analogues, multiple atoms on the cytosine were replaced. To improve its ability to donate protons, the amino group at C 4 was replaced by a hydroxy moiety. To improve the ability of N 3 to accept protons, the carbonyl oxygen at C 2 was replaced by a hydrogen, and separately, the C 2 carbon atom was replaced by a boron atom. In addition, C 5 was replaced by a nitrogen atom. The carbon atom at C 6 was replaced with methine, methylene, and carbonyl moieties. A total of 12 cytosine analogues were modeled (Fig. 2). energy barriers and rate constants of the intramolecular proton transfer. First, the structures of all the monomers, Ca 0-12 , were optimized by DFT M06-2X/def2svp method. The vibration analysis showed no imaginary frequencies in those monomers, indicating that the structures are stable at the current calculation level. Because both the O 4 atom's ability to provide protons and the N 3 atom's ability to acquire them are enhanced, the modeled cytosine analogues may undergo intramolecular proton transfer (H 4 transfer from O 4 to N 3 ). Therefore, the intramolecular single proton transfer (SPT) reactions of all monomers were investigated. The calculated SPT energy barriers (ΔG ≠ ) within the cytosine analogues ranged from 24.51 to 32.23 kcal/mol ( Table 1). The forward reaction rate constants (calculated from ΔG ≠ ) of the intramolecular proton transfer process range from 3.40 × 10 −5 to 1.23 × 10 −10 s −1 , which indicate that the intramolecular SPT process are not facile. Therefore, the barriers of all cytosine analogues are relatively high for the proton to hop, indicating that these molecules can be used as candidate molecules to induce DPT reactions with guanine.
Due to the rotational orientation of the hydroxyl group, cytosine analogues have possible conformers in which the H 4 atom is oriented toward the C 5 or N 5 atom, as shown in Fig. 2. The conformers were optimized by DFT at the current level and no imaginary frequencies were found. The present calculations show all the analogues have lower energy compared with their conformers (Table 1), indicating that these analogues are more stable than theirs conformers. the vdW surfaces and eSp extrema of the cytosine analogues. The electrostatic potential (ESP)-mapped van der Waals (vdW) surface has been used extensively for interpreting and predicting reactivity    Table 1. The values of the barrier of the intramolecular SPT reactions (ΔG ≠ ), the energy difference between analogues and their conformers (ΔE), the forward and reverse barriers of the intermolecular DPT reactions (ΔG ≠ f and ΔG ≠ r ), the forward and reverse reaction rate constants of intermolecular DPT reactions (k f and k r ), and the equilibrium constants (K eq ). All energy barrier values are given in kcal/mol. (2020) 10:9671 | https://doi.org/10.1038/s41598-020-66530-8 www.nature.com/scientificreports www.nature.com/scientificreports/ and intermolecular interactions of a wide variety of chemical systems 20,21 . The negative charge of the Ca 0 molecule is concentrated on the O 2 atom with an ESP extrema value of −71.53 kcal/mol. The positive electrostatic potential is centered around H 4a of the amino group with an ESP extrema value of +38.77 kcal/mol ( Fig. 3. Ca 0 ). When the amino group at the C 4 position of the cytosine was replaced by a hydroxy moiety, the ESP extrema value was increased to +47.47 kcal/mol, which is favorable for nucleophilic attack and release of the proton (Fig. 3. Ca 1 ). Furthermore, when the C 5 carbon atom was replaced by a nitrogen atom and the carbon atom at C 6 was a methine, methylene, or carbonyl, the ESP extrema value of H 4 was increased to +52.06 kcal/mol, +52.86 kcal/ mol and +69.67 kcal/mol, respectively, which are all favorable for nucleophilic attack and release of the proton (Fig. 3. Ca 2 -Ca 4 ). When the carbonyl oxygen at C 2 was replaced by a hydrogen, the negative charges and the ESP extrema shift to the N 3 atom, which is favorable for electrophilic attack and accepting a proton at the N 3 site (Fig. 3. Ca 5 -Ca 8 ). Furthermore, when the C 2 carbon atom was replaced by a boron atom, the ESP extrema values of N 3 are strengthen to −45.65 kcal/mol, −39.81 kcal/mol, −51.50 kcal/mol and −33.59 kcal/mol, which are all more favorable for electrophilic attack and accepting a proton at the N 3 site (Fig. 3. Ca 9 -Ca 12 ). So, these atomic substitutions, especially the substitutions of a hydroxy moiety for the amino group, a hydrogen for the carbonyl oxygen at C 2 and a boron atom for the C 2 carbon atom, are all favorable to induce DPT reactions with guanine.
Energy barriers and equilibrium constants of the DPT reaction. The structures of all the Ca n G complexes were optimized by DFT at the current level. The vibration analysis showed no imaginary frequencies in these complexes, indicating that the structures are stable. Intermolecular DPT, which generates hydrogen-bonded pairs of tautomers, was computationally predicted for all the complexes. DPT can take place through a concerted DPT (Ca 1-3 G and Ca 5-12 G) or via a stepwise mechanism involving two distinct SPT steps (Ca 0 G and Ca 4 G). The vibration analysis of each proton transfer reaction showed one imaginary frequency, and the vibration mode of the imaginary frequency corresponds to the reactants and products assigned to the transition state. An intrinsic reaction coordinate was prepared to confirm the existence of the transition states.
The Gibbs free energies of the fully optimized Ca 0 G complexes were defined as having zero energy. The relative Gibbs free energies of the optimized reactants, the first and second single-proton transition states, the double-proton transferred states, the intermediate product of single proton transfer, and the product of double-proton transfer were calculated.
The The forward DPT free energy barriers of the 12 analogue complexes (ΔG ≠ f ) range from −2.02 to 1.07 kcal/ mol, which are significantly lower than the value of Ca 0 G (8.80 kcal/mol). The barriers increased in the order Ca 11 G < Ca 7 G < Ca 3 G < Ca 12 G < Ca 5 G < Ca 4 G < Ca 8 G < Ca 6 G < Ca 10 G < 0 < Ca 9 G < Ca 2 G < Ca 1 G<<Ca 0 G. Lower energy barriers favor the DPT reaction, especially for complexes with negative barriers. The forward rate constants (calculated from ΔG ≠ f ) of the DPT process range from 1.14 × 10 12 to 1.72 × 10 14 s −1 , which are significantly higher than the value of Ca 0 G (4.05 × 10 6 s −1 ). The reverse DPT free energy barriers of the 12 analogue complexes range from 0.97 to 8.06 kcal/mol, which are higher than the value of Ca 0 G (0.71 kcal/mol). In addition, the reverse rate constants (calculated from ΔG ≠ r ) of the DPT process range from 1.34 × 10 7 to 1.34 × 10 12 s −1 , which are lower than the value of Ca 0 G (2.04 × 10 12 ). The DPT equilibrium (k eq = k f /k r ) ranges from 1.60 × 10 0 to 1.28 × 10 7 , which is significantly higher than the value of Ca 0 G (1.99 × 10 −6 ). The DPT equilibrium constants increased in the order Ca 0 G<<0 < Ca 2 G < Ca 1 G < Ca 4 G < Ca 6 G < Ca 5 G < Ca 3 G < Ca 9 G < Ca 8 G < Ca 10 G < Ca 1 2 G < Ca 7 G < Ca 11 G. For all 12 analogue complexes with positive equilibrium constants, the reaction may occur in the DPT direction. In particular, the DPT equilibrium constants of Ca 7 G and Ca 11 G (1.49 × 10 5 and 1.28 × 10 7 ) are over 1.0 × 10 5 , which indicates that these two proton transfer reactions will occur adequately. All the values are shown in Table 1.

Geometries of the Dpt reactions of the analogue complexes.
In the Ca 0 G complex, the intermonomer N 3 − N′ 1 distance (as labeled in Fig. 1) is 2.92 Å. This distance is consistent with the calculated distance (2.95 Å) and the experimental distance (3.09 Å) 12 , as is the distances of N 4 − O′ 6 . As the reaction progresses from the reactant to transition state 1 (TS 1 ), the intermediate product of the single proton transfer (SPT) and TS 2 , the guanine and cytosine monomers approach each other. The N 3 − N′ 1 distances in TS 1 , SPT and TS 2 are 2.65 Å, 2.72 Å and 2.78 Å, respectively. As the reaction proceeds, the N′ 1 − H′ 1 bond stretches faster than the N 4 − H 4a bond and completes the proton transfer first. In the products, the N 3 − N′ 1 bridge elongates again to 2.86 Å, which is close to their original separation in the reactants.
In the Ca 1-12 G analogue complexes, the intermonomer N 3 − N′ 1 distance and O 4 -O′ 6  www.nature.com/scientificreports www.nature.com/scientificreports/ complex. As the reaction progresses from the reactant to TS, most of the N 3 − N′ 1 distances and O 4 -O′ 6 distances are shorter than the distance in the Ca 0 G complex and favor the DPT reaction. In the products, the N 3 − N′ 1 and O 4 -O′ 6 bridges elongate again, and more than half of these bridges are longer than the distance in the Ca 0 G DPT product, which indicates that the DPT reverse reaction in those complexes is more difficult than it is in the Ca 0 G DPT product ( Table 2).

Effects of cytosine atom substitution on the DPT reaction.
When the amino group at the C 4 position of the cytosine was replaced by a hydroxy moiety and the other conditions were held constant, the energy barrier of the DPT reaction decreased by 7.73 kcal/mol. This result indicates that the hydroxy group significantly favored the DPT reaction. Since the electronegativity of oxygen atom is lower than that of nitrogen atom, the attraction of oxygen to proton is less than that of nitrogen, which is the possible reason that favors the transfer of H 4 from O 4 to N′ 6 . www.nature.com/scientificreports www.nature.com/scientificreports/ When the C 2 carbonyl oxygen atoms of the cytosine analogues were replaced by hydrogen, the energy barriers of the DPT decreased by an average of 1.01 kcal/mol. Replacing C 2 with a boron atom further decreased the energy barriers of the DPT by an average of 0.06 kcal/mol, further favoring the DPT reaction. Removing the carbonyl group from C 2 and replacing C 2 with a less electronegative boron atom can transfer the negative charge to N 3 , which is conducive to the ability of N 3 to acquire proton and favors the transfer of H' 1 from N′ 1 to N 3 . A large number of BN-containing heterocycles have been developed by means of the substitution of a carbon-carbon double bond with an isoelectronic boron-nitrogen bond (BN-substitution) 26,27 . BN-substituted nucleobases are very attractive BN-substituted heterocyclic compounds. Bielawski et al. 28 reported the synthesis of B(6)-phenyl-BN-uracil, which was characterized by mass spectrometry, IR spectroscopy and elemental analysis. It is interesting to note that Hiroshi et al. 26 also reported the synthesis of B(6)-substituted 5-aza-6-borauracils (BN-substituted uracils) and -thymines (BN-substituted thymines). The structures of these BN-substituted nucleobases are similar to the cytosine analogues in present study. Therefore, these boron-containing cytosine analogues can be synthesized as realistic candidates.
When the C 5 carbon atom of the cytosine analogues was replaced by a nitrogen atom, the energy barriers of the DPT decreased by an average of 0.03 kcal/mol. When the double bonds between N 5 and C 6 of these analogues were hydrogenated to single bonds, the energy barriers of the DPT further decreased by an average of 1.62 kcal/ mol. However, when the hydrogen on C 6 was exchanged for a carbonyl oxygen, the energy barriers of the DPT increased again. Therefore, the second result seemed to indicate that the DPT reaction was favored. Replacing C 5 with highly electronegative nitrogen atom may transfer the negative charge from O 4 to N 5 , which will reduce the attraction of O 4 to proton and favors the transfer of H 4 from O 4 to N′ 6 .

conclusions
The abilities of 12 modified cytosine-guanine complexes to undergo DPT were predicted and compared using theoretical calculations. The DE value, DPT barriers, hydrogen bond lengths and equilibrium constants of the Ca 0 G complex are similar to previously calculated values and experimental values, indicating that the present calculation method is reasonable. Because of the intramolecular SPT process are not facile, the 12 modeled cytosine analogues can be used as candidate molecules to induce DPT reactions with guanine. Eight modified complexes (Ca 3 G, Ca 5 G and Ca 7-12 G) were significantly more prone to undergo DPT reactions than Ca 0 G. In particular, Ca 7 G and Ca 11 G may undergo sufficient DPT reactions under physiological conditions according to the present calculations. Here, we present the part of the study on theoretical calculations and predictions of candidate molecules. Next these analogues are expected to be synthesized for incorporation into single-stranded RNAs and to  Table 2. Distances (in Å) between the electronegative atoms involved in the two H-bonds in Ca n G during the DPT process.
be validated in in-vivo model. By binding such modified RNA to DNA or RNA, DPT reactions of guanine may be induced at a fixed point. If cytosine analogues that can spontaneously induce DPT reactions of guanine under physiological conditions are identified experimentally, targeted pathogenic mutations can be used to restore original functions at the level of DNA or RNA.

theoretical Methods
DFT with the M06-2X functional and the def2svp basis set has been the primary research method used to investigate proton transfer reactions between cytosine analogues and guanine. The M06-2X functional is a high nonlocality functional with twice the nonlocal exchange (2X), and it is considered suitable for the study of main-group thermochemistry, thermochemical kinetics, noncovalent interactions, and electronic excitation energies to the valence and Rydberg states. The M06-2X functional also gives the best performance for hydrogen-transfer barrier calculations and the lowest values of balanced mean unsigned error (BMUE), which means it gives the best overall performance for barrier calculations [29][30][31] . The D3 dispersion correction was used in this study to improve the accuracy of the energy and structure calculations [32][33][34][35] . To simulate the DNA surroundings in a biological environment, all the calculations were carried out with water solvation at T = 310.15 K (37 °C) and p = 1 atm. The sophisticated polarizable continuum model 36 has been used to investigate solute-solvent interactions in water using the scrf = (solvent = water, pcm) keyword. The Gaussian 09 program package was used throughout this study 37 . The structures of all the monomers and complexes were optimized by DFT at the current level. The vibration analyses were performed to determine whether the molecular structures were stable. The H 4 and H' 1 hydrogen atoms of the optimized complexes were placed in the middle points between two electronegative atoms (N 3 -N′ 1 , N 4 -O′ 6 or O 4 -O′ 6 ) and then the structures of the transition states were optimized using the opt = (calcall, ts, noeigen) keyword. The vibration analyses were performed to determine the existence of the transition states. The forward energy barrier, ΔG ≠ f , is the difference between the Gibbs free energy of the transferred states and the reactants. The reverse energy barrier, ΔG ≠ r , is the difference between the Gibbs free energy of the transferred states and the products.
The GaussView 5.0 38 was used to create molecular structures of the modeled cytosine analogues. The molecular structure of classic cytosine, at first, was created and optimized by the program package. And then the 12 modeled cytosine analogues were created by replacing the atoms in the classic cytosine and optimized by the program package.
To investigate the ability of N 3 to accept protons and O 4 to donate protons, the ESP-mapped vdW surfaces and ESP extrema were rendered using the VMD 1.9.1 program 39 based on the outputs from the Multiwfn 3.6 program 40,41 .
To investigate the equilibrium of the DPT reactions and chemical reactions of the analogue complexes, the forward and reverse rate constants (k f and k r ) were given by the following equation 42 = . × Where k is the forward or reverse rate constants, T=310.15 K (37 °C), ΔG ≠ is the forward or reverse energy barriers. The equilibrium constants (K eq values) of the DPT reactions are the quotients of k f and k r (K eq = k f /k r ).