Introduction

Genetic mutations, which are alterations in the DNA sequence, have been identified as the cause of genetic disorders, tumor development and drug resistance in pathogenic microorganisms1,2,3. Several types of mutation can occur in DNA, including copy number variations, duplications, deletions, insertions and single-base substitutions; the last is the most common type, accounting for two-thirds of human genetic diseases4. Therefore, research on the mechanism of point mutations and on single-base-editing gene therapies has become an attractive topic4,5,6.

In double-helical DNA, the complementarity between pyrimidines (cytosine and thymine) and purines (guanine and adenine) is due to hydrogen bonds. Each hydrogen bond is essentially a proton trapped in an asymmetric double-well potential between two electronegative atoms7. It has been hypothesized that proton transfer within the double-well potential is an important mechanism of base substitution7,8. Tautomerism in guanine–cytosine base pairs resulting from intermolecular proton transfer has been suggested to be responsible for the universal guanine–cytosine to adenine–thymine mutation frequently observed in bacteria, fungi, plants and animals9. The tautomers, indicated by asterisks (C* and G*), result from the transfer of cytosine H4a to guanine and the back transfer of guanine H′1 to cytosine. They cannot form hydrogen-bonded complexes with their natural counterparts but can form mismatches with the bases A and T, respectively, during DNA replication (Fig. 1)7,10. Under normal circumstances, the relatively high energy barrier within the double-well potential prevents the forward double-proton transfer (DPT) reaction, which accounts for the stability of DNA and the fidelity with which the genetic code is preserved and transmitted to daughter cells11,12. However, when the quantum nature of the proton is taken into account, the proton has a small yet finite probability of tunneling through the barrier12. When such double proton tunneling occurs during DNA synthesis, a single-base substitution can result. Fu et al. showed that among several possible DNA mutation processes, only the DPT mechanism could fully explain the universal mutation bias9.

Figure 1

Structures and atomic numberings of the base pairs generated by GaussView 5.0. Top: Canonical C-G base pair and C*-G* base pair resulting from DPT; bottom: tautomers C* and G* mismatched to the bases A and T, respectively.

Previous studies have focused on various environmental factors that might facilitate intermolecular proton transfer within base pairs. Zhang et al. showed that when an additional hydride is placed at the C6 and C4 positions of cytosine, the resulting anionic complex facilitates transfer of guanine H1 to the N3 site of cytosine13. Noguera et al. found that protonation at the N7 and O6 sites of guanine, affording H⁺GN7C and H⁺GO6C, strengthens the binding of the base pair and facilitates the N1–N3 single-proton-transfer reaction14. The influence of metal cation (Cu⁺, Ca²⁺ and Cu²⁺) coordination to the CG and AT base pairs on intermolecular proton transfer has been studied using density functional theory (DFT) methods15,16. Alya et al. investigated the DPT reaction of a GC base pair under uniform electric fields on the order of 10⁸ to 10⁹ V m⁻¹. They concluded that fields applied along the axis of the double proton transfer in the −x direction (defined as the C to G direction) favor the canonical over the rare tautomers12. Chen et al.17 showed that the substantial effect of GC stacking originates from the electrostatic interactions between the dipoles of the outer GC base pairs and the middle GC•⁻ base-pair radical anion, whereas the extent of charge delocalization is very small and has little effect on proton transfer in GC•⁻. The effect of the surrounding water molecules on the DPT in GC has also been investigated using DFT methods, and the results demonstrate that water is crucial to the proton transfer reactions: it does not act as a passive element but actually catalyzes the DPT18.

Thus far, few studies have examined the effects of neutral cytosine analogues on the DPT reaction in the CG base pair. The purpose of this study was to use existing physicochemical understanding and predictive capability to identify neutral cytosine analogues that can induce DPT reactions with guanine under physiological conditions, and thereby to provide new strategies for gene therapy.

Results and Discussions

Molecular structures of the modeled cytosine analogues

To provide minimized structures to mimic the Watson–Crick base pairing in duplex DNA or RNA, the sugar-attachment sites of cytosine and guanine were methylated19. To build a library of cytosine analogues, multiple atoms on the cytosine were replaced. To improve its ability to donate protons, the amino group at C4 was replaced by a hydroxy moiety. To improve the ability of N3 to accept protons, the carbonyl oxygen at C2 was replaced by a hydrogen, and separately, the C2 carbon atom was replaced by a boron atom. In addition, C5 was replaced by a nitrogen atom. The carbon atom at C6 was replaced with methine, methylene, and carbonyl moieties. A total of 12 cytosine analogues were modeled (Fig. 2).

Figure 2

Molecular structures of the modeled cytosine analogues (Ca). The atoms that were replaced are shown in red. The box shows the conformers of Ca1 and Ca2.

Energy barriers and rate constants of the intramolecular proton transfer

First, the structures of all the monomers, Ca0–12, were optimized using DFT at the M06-2X/def2svp level. Vibrational analysis showed no imaginary frequencies for these monomers, indicating that the structures are stable at the current level of theory. Because both the ability of the O4 atom to donate protons and the ability of the N3 atom to accept them are enhanced, the modeled cytosine analogues may undergo intramolecular proton transfer (H4 transfer from O4 to N3). Therefore, the intramolecular single-proton-transfer (SPT) reactions of all monomers were investigated. The calculated SPT energy barriers (ΔG) within the cytosine analogues range from 24.51 to 32.23 kcal/mol (Table 1). The corresponding forward rate constants (calculated from ΔG) range from 3.40 × 10⁻⁵ to 1.23 × 10⁻¹⁰ s⁻¹, which indicates that the intramolecular SPT processes are not facile. Because these barriers are too high for the proton to hop readily within the monomer, the analogues can be used as candidate molecules to induce DPT reactions with guanine.
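As an illustrative check of the fastest case, the lowest barrier of 24.51 kcal/mol can be inserted into the rate expression given in Theoretical Methods (Eq. 1) at T = 310.15 K. Intermediate values are rounded here, so the final digits are approximate:

$$k = 2.1\times 10^{10}\times 310.15\times \mathrm{e}^{-1000\times 24.51/(1.9859\times 310.15)} \approx 6.51\times 10^{12}\times \mathrm{e}^{-39.79} \approx 3.4\times 10^{-5}\ \mathrm{s}^{-1}$$

Assuming simple first-order kinetics, this rate constant corresponds to a half-life of ln 2/k ≈ 2 × 10⁴ s, that is, several hours even for the most labile analogue.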

Table 1 The values of the barrier of the intramolecular SPT reactions (ΔG), the energy difference between analogues and their conformers (ΔE), the forward and reverse barriers of the intermolecular DPT reactions (ΔGf and ΔGr), the forward and reverse reaction rate constants of intermolecular DPT reactions (kf and kr), and the equilibrium constants (Keq). All energy barrier values are given in kcal/mol.

Because the hydroxyl group can rotate, the cytosine analogues have possible conformers in which the H4 atom is oriented toward the C5 or N5 atom, as shown in Fig. 2. The conformers were optimized by DFT at the current level, and no imaginary frequencies were found. The present calculations show that all the analogues have lower energies than their conformers (Table 1), indicating that the analogues are more stable than their conformers.

The vdW surfaces and ESP extrema of the cytosine analogues

The electrostatic potential (ESP)-mapped van der Waals (vdW) surface has been used extensively for interpreting and predicting the reactivity and intermolecular interactions of a wide variety of chemical systems20,21. The negative charge of the Ca0 molecule is concentrated on the O2 atom, with an ESP minimum of −71.53 kcal/mol. The positive electrostatic potential is centered around H4a of the amino group, with an ESP maximum of +38.77 kcal/mol (Fig. 3, Ca0). When the amino group at the C4 position of cytosine was replaced by a hydroxy moiety, the ESP maximum increased to +47.47 kcal/mol, which is favorable for nucleophilic attack and release of the proton (Fig. 3, Ca1). Furthermore, when the C5 carbon atom was replaced by a nitrogen atom and the carbon atom at C6 was a methine, methylene or carbonyl, the ESP maximum at H4 increased to +52.06, +52.86 and +69.67 kcal/mol, respectively, all of which are favorable for nucleophilic attack and release of the proton (Fig. 3, Ca2–Ca4). When the carbonyl oxygen at C2 was replaced by a hydrogen, the negative charge and the ESP minimum shifted to the N3 atom, which is favorable for electrophilic attack and acceptance of a proton at the N3 site (Fig. 3, Ca5–Ca8). Furthermore, when the C2 carbon atom was replaced by a boron atom, the ESP minima at N3 were strengthened to −45.65, −39.81, −51.50 and −33.59 kcal/mol, all of which are more favorable for electrophilic attack and acceptance of a proton at the N3 site (Fig. 3, Ca9–Ca12). Thus, these atomic substitutions, especially the substitution of a hydroxy moiety for the amino group, of a hydrogen for the carbonyl oxygen at C2 and of a boron atom for the C2 carbon atom, are all favorable for inducing DPT reactions with guanine.

Figure 3

ESP-mapped vdW surfaces and ESP extrema of cytosine analogues generated by Multiwfn 3.6 and VMD 1.9.1 (http://www.ks.uiuc.edu/Research/vmd/). The red color indicates positive ESP regions on the vdW surface; the blue regions correspond to negative ESP regions of the vdW surface. The significant local maxima and minima of ESP on the vdW surfaces are represented as orange and cyan spheres, and labelled by values given in kcal/mol.

Energy barriers and equilibrium constants of the DPT reaction

The structures of all the CanG complexes were optimized by DFT at the current level. Vibrational analysis showed no imaginary frequencies for these complexes, indicating that the structures are stable. Intermolecular DPT, which generates hydrogen-bonded pairs of tautomers, was computationally predicted for all the complexes. DPT can take place through a concerted mechanism (Ca1–3G and Ca5–12G) or through a stepwise mechanism involving two distinct SPT steps (Ca0G and Ca4G). Vibrational analysis of each proton transfer reaction showed exactly one imaginary frequency, whose vibrational mode connects the reactants and products assigned to the transition state. Intrinsic reaction coordinate calculations were performed to confirm the transition states.

The Gibbs free energies of the fully optimized Ca0G complexes were defined as having zero energy. The relative Gibbs free energies of the optimized reactants, the first and second single-proton transition states, the double-proton transferred states, the intermediate product of single proton transfer, and the product of double-proton transfer were calculated.

The Ca0G complex, the model of the Watson–Crick guanine–cytosine (GC) base pair, undergoes a stepwise DPT process. The dissociation energy (DE) of the Ca0G base pair at the current level of theory is 21.74 kcal/mol, which is in good agreement with the reported experimental value of 21.0 kcal/mol (ref. 22) and closer to it than the previously calculated DE values of 25.4 (ref. 23), 23.8 (ref. 24) and 24.4 kcal/mol (ref. 25). This result indicates that the present calculations are a better mimic of physiological conditions. The DPT product of Ca0G lies 9.15 kcal/mol above the reactant, which is similar to the previously calculated value of 9.8 kcal/mol (ref. 14) and disfavors the DPT reaction. The Ca0G–Ca0*G* equilibrium largely favors the reactants (the calculated equilibrium constant of 1.99 × 10⁻⁶ is in good agreement with the previously estimated value of 2.0 × 10⁻⁶; ref. 12), so this DPT reaction will rarely occur. For comparison, we also calculated the dissociation energy of Ca0G and the equilibrium constant of the DPT reaction using the DFT B3LYP/6-311++G(d,p) method. The calculated values, 13.25 kcal/mol and 4.53 × 10⁻⁸, respectively, differ substantially from the experimental dissociation energy (21.0 kcal/mol) and the previously estimated equilibrium constant (2.0 × 10⁻⁶). The results obtained with the DFT M06-2X/def2svp method are therefore in better agreement with previous findings on the DPT reaction of GC base pairs, indicating that this level of theory is the more reasonable choice.

The forward DPT free energy barriers of the 12 analogue complexes (ΔGf) range from −2.02 to 1.07 kcal/mol, which are significantly lower than the value for Ca0G (8.80 kcal/mol). The barriers increase in the order Ca11G < Ca7G < Ca3G < Ca12G < Ca5G < Ca4G < Ca8G < Ca6G < Ca10G < 0 < Ca9G < Ca2G < Ca1G ≪ Ca0G. Lower energy barriers favor the DPT reaction, especially for complexes with negative barriers. The forward rate constants (calculated from ΔGf) of the DPT process range from 1.14 × 10¹² to 1.72 × 10¹⁴ s⁻¹, which are significantly higher than the value for Ca0G (4.05 × 10⁶ s⁻¹). The reverse DPT free energy barriers (ΔGr) of the 12 analogue complexes range from 0.97 to 8.06 kcal/mol, which are higher than the value for Ca0G (0.71 kcal/mol). Accordingly, the reverse rate constants (calculated from ΔGr) range from 1.34 × 10⁷ to 1.34 × 10¹² s⁻¹, which are lower than the value for Ca0G (2.04 × 10¹² s⁻¹). The DPT equilibrium constants (Keq = kf/kr) range from 1.60 × 10⁰ to 1.28 × 10⁷, which are significantly higher than the value for Ca0G (1.99 × 10⁻⁶). The equilibrium constants increase in the order Ca0G ≪ 1 < Ca2G < Ca1G < Ca4G < Ca6G < Ca5G < Ca3G < Ca9G < Ca8G < Ca10G < Ca12G < Ca7G < Ca11G. Because all 12 analogue complexes have equilibrium constants greater than 1, the reaction may proceed in the DPT direction for each of them. In particular, the DPT equilibrium constants of Ca7G and Ca11G (1.49 × 10⁵ and 1.28 × 10⁷) exceed 1.0 × 10⁵, which indicates that these two proton transfer reactions will proceed to a large extent. All the values are shown in Table 1.
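To make the magnitude of these equilibrium constants more tangible, the following minimal sketch converts Keq into an equilibrium fraction of the DPT product and into a reaction free energy. It assumes a simple two-state equilibrium between the canonical and tautomeric complexes, which is an idealization rather than part of the original analysis. The function names are illustrative, and the gas-constant value is the one that appears in Eq. (1) of the Theoretical Methods.

```python
import math

# Gas-constant value taken from Eq. (1), converted to kcal mol^-1 K^-1.
R_KCAL = 1.9859e-3
T_BODY = 310.15  # K (37 degrees C), the temperature used in this study


def product_fraction(k_eq):
    """Equilibrium fraction of the DPT product, assuming a two-state
    reactant <=> product equilibrium (an idealization)."""
    return k_eq / (1.0 + k_eq)


def reaction_free_energy(k_eq, temperature=T_BODY):
    """Reaction free energy in kcal/mol from dG = -RT ln(Keq)."""
    return -R_KCAL * temperature * math.log(k_eq)


# Keq values quoted in the text for the reference pair and the two
# most favorable analogue complexes.
for label, k_eq in [("Ca0G", 1.99e-6), ("Ca7G", 1.49e5), ("Ca11G", 1.28e7)]:
    print(f"{label}: product fraction = {product_fraction(k_eq):.6g}, "
          f"dG = {reaction_free_energy(k_eq):+.2f} kcal/mol")
```

Under this two-state assumption, the Ca0G value corresponds to a reaction free energy of about +8.1 kcal/mol, consistent with ΔGf − ΔGr = 8.80 − 0.71 = 8.09 kcal/mol, whereas the lowest analogue value (Keq = 1.60) already places more than half of the population in the DPT product.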

Geometries of the DPT reactions of the analogue complexes

In the Ca0G complex, the intermonomer N3–N′1 distance (as labeled in Fig. 1) is 2.92 Å. This distance is consistent with the previously calculated distance (2.95 Å) and the experimental distance (3.09 Å) (ref. 12), as is the N4–O′6 distance. As the reaction progresses from the reactant through transition state 1 (TS1) and the intermediate product of the single proton transfer (SPT) to TS2, the guanine and cytosine monomers approach each other. The N3–N′1 distances in TS1, SPT and TS2 are 2.65 Å, 2.72 Å and 2.78 Å, respectively. As the reaction proceeds, the N′1–H′1 bond stretches faster than the N4–H4a bond and completes the proton transfer first. In the products, the N3–N′1 bridge elongates again to 2.86 Å, which is close to its original length in the reactants.

In the Ca1–12G analogue complexes, the intermonomer N3–N′1 and O4–O′6 distances range from 2.70 to 2.81 Å and from 2.46 to 2.61 Å, respectively. All are shorter than the corresponding distances in the Ca0G complex (2.92 Å and 2.85 Å), indicating that the forward DPT reaction of these complexes is more favorable than that of the Ca0G complex. As the reaction progresses from the reactant to the TS, most of the N3–N′1 and O4–O′6 distances remain shorter than the corresponding distances in the Ca0G complex, which favors the DPT reaction. In the products, the N3–N′1 and O4–O′6 bridges elongate again, and more than half of these bridges are longer than those in the Ca0G DPT product, which indicates that the reverse DPT reaction in these complexes is more difficult than it is in the Ca0G DPT product (Table 2).

Table 2 Distances (in Å) between the electronegative atoms involved in the two H-bonds in CanG during the DPT process.

Effects of cytosine atom substitution on the DPT reaction

When the amino group at the C4 position of cytosine was replaced by a hydroxy moiety and the other conditions were held constant, the energy barrier of the DPT reaction decreased by 7.73 kcal/mol. This result indicates that the hydroxy group significantly favors the DPT reaction. Because oxygen is more electronegative than nitrogen, the O4–H4 bond is more strongly polarized than the N4–H4a bond and the hydroxy proton is more readily released, which is a possible reason for the easier transfer of H4 from O4 to O′6.

When the C2 carbonyl oxygen atoms of the cytosine analogues were replaced by hydrogen, the energy barriers of the DPT decreased by an average of 1.01 kcal/mol. Replacing C2 with a boron atom decreased the energy barriers of the DPT by a further 0.06 kcal/mol on average, slightly favoring the DPT reaction. Removing the carbonyl group from C2 and replacing C2 with a less electronegative boron atom can shift the negative charge to N3, which enhances the ability of N3 to acquire a proton and favors the transfer of H′1 from N′1 to N3. A large number of BN-containing heterocycles have been developed by substituting a carbon–carbon double bond with an isoelectronic boron–nitrogen bond (BN-substitution)26,27. BN-substituted nucleobases are very attractive BN-substituted heterocyclic compounds. Bielawski et al.28 reported the synthesis of B(6)-phenyl-BN-uracil, which was characterized by mass spectrometry, IR spectroscopy and elemental analysis. Hiroshi et al.26 also reported the synthesis of B(6)-substituted 5-aza-6-borauracils (BN-substituted uracils) and -thymines (BN-substituted thymines). The structures of these BN-substituted nucleobases are similar to those of the cytosine analogues in the present study. Therefore, the boron-containing cytosine analogues are realistic candidates for synthesis.

When the C5 carbon atom of the cytosine analogues was replaced by a nitrogen atom, the energy barriers of the DPT decreased by an average of 0.03 kcal/mol. When the double bonds between N5 and C6 of these analogues were hydrogenated to single bonds, the energy barriers of the DPT decreased by a further 1.62 kcal/mol on average. However, when the hydrogens on C6 were exchanged for a carbonyl oxygen, the energy barriers of the DPT increased again. Of these three modifications, hydrogenation of the N5–C6 bond therefore appears to favor the DPT reaction most. Replacing C5 with a highly electronegative nitrogen atom may shift negative charge from O4 to N5, which would reduce the attraction of O4 for the proton and favor the transfer of H4 from O4 to O′6.

Conclusions

The abilities of 12 modified cytosine–guanine complexes to undergo DPT were predicted and compared using theoretical calculations. The DE value, DPT barriers, hydrogen bond lengths and equilibrium constant of the Ca0G complex are similar to previously calculated and experimental values, indicating that the present calculation method is reasonable. Because the intramolecular SPT processes are not facile, the 12 modeled cytosine analogues can be used as candidate molecules to induce DPT reactions with guanine. Eight modified complexes (Ca3G, Ca5G and Ca7–12G) were significantly more prone to undergo DPT reactions than Ca0G. In particular, Ca7G and Ca11G may undergo DPT to a large extent under physiological conditions according to the present calculations. Here, we present the part of the study concerning theoretical calculations and the prediction of candidate molecules. Next, these analogues are expected to be synthesized, incorporated into single-stranded RNAs and validated in an in vivo model. By binding such modified RNA to DNA or RNA, DPT reactions of guanine may be induced at a fixed point. If cytosine analogues that can spontaneously induce DPT reactions of guanine under physiological conditions are identified experimentally, targeted mutations at pathogenic sites could be used to restore the original function at the level of DNA or RNA.

Theoretical Methods

DFT with the M06-2X functional and the def2svp basis set was the primary method used to investigate proton transfer reactions between cytosine analogues and guanine. M06-2X is a high-nonlocality functional with double the amount of nonlocal exchange (2X), and it is considered suitable for the study of main-group thermochemistry, thermochemical kinetics, noncovalent interactions, and electronic excitation energies to valence and Rydberg states. The M06-2X functional has also been reported to give the best performance for hydrogen-transfer barrier calculations and the lowest balanced mean unsigned error (BMUE), which means that it gives the best overall performance for barrier calculations29,30,31. The D3 dispersion correction was used in this study to improve the accuracy of the energy and structure calculations32,33,34,35.

To simulate the DNA surroundings in a biological environment, all calculations were carried out with water solvation at T = 310.15 K (37 °C) and p = 1 atm. The polarizable continuum model36 was used to describe solute–solvent interactions in water via the scrf=(solvent=water,pcm) keyword. The Gaussian 09 program package was used throughout this study37. The structures of all the monomers and complexes were optimized by DFT at the current level, and vibrational analyses were performed to determine whether the molecular structures were stable. To locate the transition states, the H4 and H′1 hydrogen atoms of the optimized complexes were placed at the midpoints between the two electronegative atoms of each hydrogen bond (N3–N′1, and N4–O′6 or O4–O′6), and the structures were then optimized using the opt=(calcall,ts,noeigen) keyword. Vibrational analyses were performed to confirm the transition states. The forward energy barrier, ΔGf, is the difference between the Gibbs free energies of the transferred states and the reactants. The reverse energy barrier, ΔGr, is the difference between the Gibbs free energies of the transferred states and the products.
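The midpoint placement of the bridging hydrogens described above can be sketched in a few lines. This is only an illustration of the geometric step, not the procedure actually used to prepare the Gaussian input: the function names and all coordinates below are hypothetical, and in practice the positions would be taken from the optimized complex.

```python
def midpoint(a, b):
    """Return the point halfway between two Cartesian positions (in Å)."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def place_bridging_hydrogens(coords, bridges):
    """Build a transition-state starting geometry by moving each bridging
    hydrogen to the midpoint of its donor and acceptor heavy atoms.

    coords  : dict mapping atom label -> (x, y, z)
    bridges : list of (hydrogen, donor, acceptor) label triples
    Returns a new dict; the input geometry is left unchanged.
    """
    guess = dict(coords)
    for hydrogen, donor, acceptor in bridges:
        guess[hydrogen] = midpoint(coords[donor], coords[acceptor])
    return guess


# Hypothetical coordinates (Å) for the two hydrogen bonds of an analogue pair.
coords = {
    "N3": (0.00, 0.00, 0.00), "N1'": (2.75, 0.00, 0.00), "H1'": (1.73, 0.00, 0.00),
    "O4": (0.20, 2.30, 0.00), "O6'": (2.75, 2.30, 0.00), "H4": (1.20, 2.30, 0.00),
}
bridges = [("H1'", "N1'", "N3"), ("H4", "O4", "O6'")]

ts_guess = place_bridging_hydrogens(coords, bridges)
for label in ("H1'", "H4"):
    print(label, ts_guess[label])
```

Such a geometry is only a starting point; the transition-state optimization and the subsequent vibrational analysis decide whether a genuine first-order saddle point has been found.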

GaussView 5.0 (ref. 38) was used to create the molecular structures of the modeled cytosine analogues. The molecular structure of canonical cytosine was first created and optimized with the program package. The 12 modeled cytosine analogues were then created by replacing atoms in canonical cytosine and were optimized in the same way.

To investigate the ability of N3 to accept protons and O4 to donate protons, the ESP-mapped vdW surfaces and ESP extrema were rendered using the VMD 1.9.1 program39 based on the outputs from the Multiwfn 3.6 program40,41.

To investigate the kinetics and equilibria of the DPT reactions of the analogue complexes, the forward and reverse rate constants (kf and kr) were calculated using the following equation (ref. 42):

$$k=2.1\times 10^{10}\,T\,\mathrm{e}^{-1000\,\Delta G^{\ne}/(1.9859\,T)}$$
(1)

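where k is the forward or reverse rate constant (in s⁻¹), T = 310.15 K (37 °C), and ΔG≠ is the corresponding forward or reverse energy barrier in kcal/mol. Equation (1) has the form of the Eyring equation: the prefactor 2.1 × 10¹⁰ s⁻¹ K⁻¹ corresponds to kB/h, and the factor of 1000 converts the barrier from kcal/mol to cal/mol for use with the gas constant in cal mol⁻¹ K⁻¹. The equilibrium constants (Keq) of the DPT reactions are the quotients of kf and kr (Keq = kf/kr).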
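As a minimal sketch of how Eq. (1) can be evaluated, the following script computes rate and equilibrium constants from free energy barriers, using the constants exactly as they appear in the equation. The function and variable names are illustrative choices, not part of the original workflow. The barriers used in the example are the Ca0G values reported in the Results (ΔGf = 8.80 and ΔGr = 0.71 kcal/mol).

```python
import math

PREFACTOR = 2.1e10   # s^-1 K^-1, the prefactor of Eq. (1)
R_CAL = 1.9859       # cal mol^-1 K^-1, the gas-constant value in Eq. (1)
T_BODY = 310.15      # K (37 degrees C)


def rate_constant(barrier_kcal, temperature=T_BODY):
    """Rate constant (s^-1) from a free energy barrier in kcal/mol, Eq. (1)."""
    exponent = -1000.0 * barrier_kcal / (R_CAL * temperature)
    return PREFACTOR * temperature * math.exp(exponent)


def equilibrium_constant(barrier_forward, barrier_reverse, temperature=T_BODY):
    """Keq = kf / kr for a reaction with the given forward and reverse barriers."""
    kf = rate_constant(barrier_forward, temperature)
    kr = rate_constant(barrier_reverse, temperature)
    return kf / kr


# Ca0G barriers reported in the Results section (kcal/mol).
kf = rate_constant(8.80)
kr = rate_constant(0.71)
print(f"kf = {kf:.3g} s^-1, kr = {kr:.3g} s^-1, "
      f"Keq = {equilibrium_constant(8.80, 0.71):.3g}")
```

Because the barriers are tabulated to two decimal places, values recomputed in this way agree with the reported kf, kr and Keq only to within about 1%.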