The origin of efficient triplet state population in sulfur-substituted nucleobases

Elucidating the photophysical mechanisms in sulfur-substituted nucleobases (thiobases) is essential for designing prospective drugs for photo- and chemotherapeutic applications. Although it has long been established that the phototherapeutic activity of thiobases is intimately linked to efficient intersystem crossing into reactive triplet states, the molecular factors underlying this efficiency are poorly understood. Herein we combine femtosecond transient absorption experiments with quantum chemistry and nonadiabatic dynamics simulations to investigate 2-thiocytosine as a necessary step to unravel the electronic and structural elements that lead to ultrafast and near-unity triplet-state population in thiobases in general. We show that different parts of the potential energy surfaces are stabilized to different extents via thionation, quenching the intrinsic photostability of canonical DNA and RNA nucleobases. These findings satisfactorily explain why thiobases exhibit the fastest intersystem crossing lifetimes measured to date among bio-organic molecules and have near-unity triplet yields, whereas the triplet yields of canonical nucleobases are nearly zero.

C hemically modified nucleobases provide DNA and, in particular, RNA with additional functions that are only beginning to be discovered 1 . Thiobases, where the oxygen atom of a carbonyl group is substituted by a sulfur atom, have been detected in transfer-RNA 2,3 , and the lack of some of these substituted nucleobases can lead to metabolic diseases in the mitochondria 4 . Thiobases are also employed in DNA-based nanotechnology 5 , in photocrosslinking studies 6,7 , in pharmacology 8,9 , and-due to their interesting photophysical properties-in photo-chemotherapy [10][11][12] . In fact, thiobases have recently been demonstrated as effective photo-chemotherapeutic agents for the treatment of skin cancer in tissue biopsies 13 as well as of bladder cancer in animal models 11 , and are being developed for clinical applications 14 .
The canonical nucleobases absorb primarily in the ultraviolet-C range (100-280 nm). In contrast, substitution of the oxygen atom by sulfur induces a significant red-shift in the absorption spectrum, making thiobases effective absorbers of ultraviolet-B (280-315 nm) and ultraviolet-A (315-400 nm) radiation 15 . An intriguing observation is that, while canonical nucleobases efficiently return to the electronic ground state following electronic excitation 16,17 , thiobases populate long-lived, reactive triplet states in near-unity yields 15 -a property that is exploited in their photonic applications. The photostability of canonical nucleobases is ascribed to ultrafast relaxation mechanisms that convert the photon energy into vibrational motion 17 . Although intense research activity has been aimed at determining time scales and relaxation pathways in thiobases by using time-resolved spectroscopic techniques 12,[18][19][20][21][22][23][24] and theoretical methods [25][26][27][28][29][30][31][32] , there exists no physical explanation of the electronic and structural factors that favour the formation of reactive triplet states instead of ultrafast relaxation to the ground state.
In this work, we provide a rationale for the origin of efficient population of triplet states in thiobases versus the ultrafast recovery of the ground state in canonical nucleobases. We use 2-thiocytosine (2tC) as a testbed to understand how thionation leads to a strong stabilization of different parts of the excited-state potential energy surfaces (PESs), which favours intersystem crossing (ISC) to the triplet manifold over internal conversion to the electronic ground state. Not only is 2tC specifically relevant for many medicinal applications-due to its anti-cancer 33 , anti-viral 34 , anti-bacterial 35 , anti-microbial 36 and cytotoxic activities 37 -but it is the only pyrimidine thiobase derivative that has not yet been investigated using time-resolved techniques 15 . By filling this gap we are able to deliver a comprehensive explanation of the remarkable photophysics of thiobases, providing design criteria for the synthesis of prospective drugs.

Results
Photochemical mechanism of 2-thiocytosine. A combination of steady-state and femtosecond transient absorption spectroscopy with ab initio computations and nonadiabatic dynamics is used to elucidate the excited-state dynamics of 2tC. Femtosecond transient absorption spectra (fs-TAS) were recorded exciting with UV-B (308 nm) and UV-A (321 nm) wavelengths, which correspond to the low-energy tail of the steady-state absorption spectrum (see Supplementary Fig. 1). According to calculations (see Supplementary Fig. 2, Supplementary Table 1 and Supplementary Note 1), the UV-A or UV-B wavelengths excite the lowest-energy 1 p s p* state (the S 2 at the equilibrium geometry) or the tail of the second 1 p s p* state (S 4 ), respectively. The TAS obtained after 308 nm excitation (Fig. 1, the TAS exciting at 321 nm is shown in Supplementary Fig. 3) shows an initial featureless rise in the absorbance covering the full spectral probe window from À 360 fs until a time delay of À 120 fs (that is, within the cross correlation of the pump and probe beams), when two distinct absorption maxima begin to emerge: one in the UV range (350-400 nm) and another in the visible range (around 580 nm). These absorption bands continue to grow until 320 fs, at which point the UV band reaches its maximum. As the UV band decays, the visible band continues to increase in amplitude and blueshifts to l max ¼ 525 nm at 3.76 ps, with an apparent isosbestic point at 420 nm. After 3.76 ps, no significant changes in the absorption maxima are observed for a time delay of up to 20 ps (not shown), the maximum time delay investigated in this work.
The TAS data were subjected to a target and global fitting analysis. Two lifetimes (t 1 ¼ 210 ± 50 fs and t 2 ¼ 480 ± 60 fs) were needed to model the dynamics during the first 4 ps. Figure 2 shows the decay-associated spectra (DAS) and the fitted representative kinetic traces taken at the transient absorption maxima, that is, 355, 523 and 581 nm (see Supplementary Fig. 4 for the traces exciting at 321 nm).
In order to interpret the TAS, and to deduce the relaxation pathways associated with the obtained lifetimes, the deactivation pathways out of the initially excited S 2 ( 1 p s p*) state were calculated using the MS-CASPT2 method (multi-state complete active space second-order perturbation theory). TAS were simulated (Fig. 3a, the corresponding data is given in Supplementary Table 2) from the excited-state minima ( 1 p s p*, 1 n s p*, 3 n s p* and 3 p s p*) using a sum of Gaussians centred at the calculated transition energies, with the amplitudes proportional to the oscillator strengths. The 1 p s p* state absorbs strongly around 363 nm and the 3 p s p* state dominates the visible part of the spectrum with two transitions around 441 and 546 nm,  giving rise to a broad absorption band. In contrast, the two n s p* states show a 10-fold weaker absorbance compared with the p s p* states. The absorption spectra of the singlet and triplet n s p* states are very similar, with transitions at 333 and 576 nm versus 333 and 574 nm, respectively. Figure 3b shows two linear combinations of the four calculated transient spectra harmonizing with the experimental transient spectra at selected time delays, that is, at 320 fs (when the UV band reaches its maximum intensity) and 3,760 fs (when the visible band reaches its maximum). The per cent contribution of each excited-state absorption can readily explain both the decay in the UV region (o400 nm) and the increase of the visible band around 525 nm. For comparison, the two corresponding experimental transient spectra are shown in Fig. 3c.
According to the calculations, the initial broad and featureless increase in absorption amplitude across the full spectral probe window until À 120 fs (see also DAS A in Fig. 2a) can be assigned to a linear combination of the singlet states with p s p* and n s p* characters. This assignment is further supported by dynamics simulations, which predict that a 1 p s p*/ 1 n s p* conical intersection (CoIn) can be reached within 160 fs upon excitation, from which the molecule populates simultaneously two singlet minima with 1 n s p* and 1 p s p* characters.
At a time delay of about À 80 fs, two distinct absorption bands begin to evolve, with maxima around 360 and 550 nm, respectively ( Fig. 1 and DAS B in Fig. 2a). These absorption bands are assigned to a linear combination of the absorption from the 1 p s p*, 1 n s p*, 3 n s p* and 3 p s p* states (Fig. 3b). The UV absorption band matches most closely the spectrum computed for the 1 p s p* state, thus the subsequent decay in this region is likely associated with a decay of 1 p s p* population.
Finally, the TAS at a time delay of ca. 4 ps, which has an absorption band peaking at 525 nm ( Fig. 1 and DAS C in Fig. 2a), compares satisfactorily with the spectrum calculated for the 3 p s p* state, possibly with some contribution from the 3 n s p* state (Fig. 3b).
On the basis of static MS-CASPT2 calculations, the relaxation pathways sketched in Fig. 4 can be postulated (see Supplementary  Table 3 for the corresponding data). After one-photon absorption (308 or 321 nm) the 1 p s p* state at 3.65 eV is populated (S 2 at the Franck-Condon geometry). A barrierless pathway leads to the 1 p s p*/ 1 n s p* CoIn at 3.02 eV, which connects to two different singlet state minima with 1 p s p* (3.02 eV) or 1 n s p* (2.95 eV) character. From these two minima, several relaxation pathways are plausible. Relaxation to the ground state involves overcoming energy barriers of at least 0.8 eV, making this process unlikely. Instead, two singlet-triplet ( 1 n s p*/ 3 p s p* and 1 p s p*/ 3 n s p*) minimum-energy crossing points (MECPs) located at 3.05 and at 3.08 eV, respectively, are accessible due to their small energy barriers and associated spin-orbit couplings of up to 170 cm À 1 . Energetically and geometrically close to these MECPs lies a CoIn between the 3 n s p* and 3 p s p* states, at an energy of 3.03 eV.
Hence, there exists a three-state near-degeneracy region involving the lowest singlet and the two lowest triplet states. From the 3 n s p*/ 3 p s p* CoIn, the system can relax either to the 3 n s p* minimum at 3.02 eV or to the 3 p s p* minimum at 2.85 eV. A time-resolved interpretation of the relaxation mechanism of 2tC is obtained by performing surface-hopping nonadiabatic simulations using the SHARC method (surface hopping including arbitrary couplings) 38 . Associated energies, gradients and relevant couplings are obtained at the multi-reference configuration interaction including single excitations (MRCIS) level of theory (see Supplementary Fig. 5 and Supplementary Note 2 for validation of the method). Figure 5 shows the excited-state populations based on 137 trajectories and the result of monoexponential fitting. The relaxation pathways in the dynamics simulations involve mainly four electronic stateslabelled according to their energetic ordering at each geometry as S 1 , S 2 , T 1 and T 2 -as foreseen by the quantum chemical calculations. These states encompass the 1 n s p*, 1 p s p*, 3 n s p* and 3 p s p* characters, but note that due to state crossing and mixing, no one-to-one correspondence between the S i /T i labels and the characters exists beyond the Franck-Condon geometry.
After excitation, the population in the S 2 state decays to the S 1 state with a lifetime of 160 fs (Supplementary Table 4)-a value that might be slightly overestimated due to a small barrier to reach the S 2 /S 1 crossing (0.13 eV), which is present with MRCIS but not with the more accurate MS-CASPT2 method (Fig. 4). The population of the S 1 state rises to a maximum of about 30% during the first 200 fs due to population transfer from the S 2 state. Afterwards, 2tC undergoes ISC to the triplet manifold with an average time constant of B250 fs. The population of the S 1 state amounts to only 13% after 1 ps.
The most important ISC pathways predicted by the dynamics simulations are S 1 -T 2 and S 1 -T 1 , with a minor contribution from S 2 -T 2 . The characters of the triplet states T 1 and T 2 are strongly mixed and cannot be assigned as strictly 3 p s p* or 3 n s p* during the dynamics simulations. It is precisely this mixed wavefunction character that leads to large spin-orbit couplings between S 1 and both triplet states, explaining why both S 1 -T 1 and S 1 -T 2 transitions are possible, and why ISC is very efficient. According to the simulations, T 2 is populated faster (in B100 fs) than T 1 (in 330 fs) because the average S 1 -T 2 energy gap is smaller than the S 1 -T 1 gap. Note that the spin-orbit couplings vary in these regions but are on average B160 cm À 1 between S 1 and T 2 as well as 50 cm À 1 between S 1 and T 1 . However, the T 1 state receives most of the population within 1 ps due to subsequent T 2 -T 1 internal conversion, leading to an average ISC lifetime of 250 fs. This value is in full agreement with the lifetime of 210±50 fs obtained experimentally. After 1 ps, 74% of the total population resides in either of the triplet states, and the remaining 16% of the population in the excited singlet manifold likely undergoes ISC at later times. A small fraction of the trajectories (10%) returned to the ground state because MRCIS underestimates the energy of the relevant S 1 /S 0 CoIns, as compared with MS-CASPT2 (Supplementary Table 3). This ground-state repopulation should be regarded as an artefact of employing MRCIS in the dynamics simulations and we therefore estimate a 490% total ISC yield from the simulations.
From a structural point of view, there are no large out-of-plane deformations during the dynamics as anticipated from the similarity and quasi-planar nature of all excited-state minima (Supplementary Note 3). During the S 2 -S 1 internal conversion, there is a bond inversion within the conjugated double bond system of the pyrimidine ring. As the trajectories relax to S 1 and the triplet states, ring puckering increases, probably due to the increased available energy or due to a more flexible pyrimidine ring in these states. The triplet minima are structurally similar to the singlet minima and thus motion on the T 1 and T 2 PESs cannot be distinguished from motion on the S 1 PES. In fact, the structural similarity is one of the factors leading to the efficient ISC observed in this work.
In summary, the experimental and theoretical results suggest that excitation of 2tC to the lowest-energy 1 p s p* state (S 2 ) leads to very efficient ISC to the 3 p s p* (T 1 ) state in three general steps, which are schematically depicted in Fig. 6. The first step is the relaxation from the vertically excited S 2 ( 1 p s p*) state to the S 1 state through an S 2 /S 1 CoIn. After transition from the S 2 to the S 1 state, the system populates the S 1 PES, which in some regions of the configurational space has dominant 1 p s p* character and in other regions 1 n s p* character, according to MS-CASPT2 calculations. This is also consistent with the observation of a band assigned to the 1 p s p* state in the TAS at short time delays. The second step is ultrafast ISC from the S 1 state to the triplet manifold (T 2 and T 1 states). In the time-resolved experiments, this process leads to the appearance of an absorption band assigned to the 3 p s p* state. This band begins to appear at approximately the same time as the 1 p s p* absorption band, indicating that internal conversion in the singlet manifold and ISC occur on very similar time scales. The experimental time constant t 1 for this second step is 210 ± 50 fs, in close agreement to the ISC lifetime of 250 fs observed theoretically. As the calculations show, the three-state near-degeneracy between S 1 , T 2 and T 1 , together with the large spin-orbit matrix elements, are responsible for the ultrafast ISC lifetime of 2tC. Finally, the third step is internal conversion from the T 2 to the T 1 state and subsequent vibrational energy redistribution to populate the lowest-energy minimum of the T 1 ( 3 p s p*) state. The second experimental time constant of 480 ± 60 fs is attributed to the latter step, together with residual ISC, as suggested by the decay of the UV transient absorption band. The eventual relaxation from the T 1 state to the ground state appears to involve significant energetic barriers and does not play a role on the time scales considered in this work, in agreement with the long-lived nature of the T 1 state of thiobases in solution 15 .
Thiobases versus canonical bases. Having revealed the electronic relaxation mechanisms of 2tC, we are now in a favourable position to compare the structural and electronic features of the PESs in the thiobases and the canonical nucleobases, in order to address the intriguing question as to why they are so photophysically different. To this end, Table 1 collects the energetic parameters for cytosine (C), uracil (U) and guanine (G), and those of the corresponding thiobase analogues that have been investigated thus far-since the relative energies of the states minima and the S 1 /S 0 surface crossings are strongly correlated to whether or not a nucleobase exhibits ultrafast relaxation to the ground state 39 .
The first conspicuous fact is that sulfur-substitution significantly lowers the energies of the singlet state minima. Both 1 np* and 1 pp* state minima are stabilized by up to 1 eV for all three thiobases (2tC, 2tU, 6tG) in comparison with the corresponding canonical nucleobase. This stabilization is well-known in thiocarbonyl compounds 40 and thiobases 15,30 . However, it is surprising that the energy of the most accessible (that is, lowest-energy) CoIn of each molecule stays approximately constant upon thionation (3.8/3.8 eV for C/2tC, 4.0/4.0 eV for U/2tU and 4.3/3.8 eV for G/6tG). As a consequence, canonical nucleobases exhibit efficient groundstate recovery after excitation, whereas the population of the thiobases rapidly decays to deep singlet state minima. The presence of one or more triplet states with energies similar to the singlet state minima and with large spin-orbit couplings facilitates subsequent ISC in the thiobases. While the energetic stabilizations can be correlated directly with the excited-state behaviour of the thiobases, it is not straightforward to rationalize why thionation lowers the energies of the states' minima so strongly, but has little impact on the energies of relevant CoIns.
We argue that a plausible explanation relates to the moiety of the molecule where the electronic excitation is localized. We focus on whether the excitation (that is, the excited electron or, more importantly, the hole left behind) is localized significantly on the (thio)carbonyl group or on the aromatic ring. States characterized by an excitation localized on the chalcogene atom are expected to stabilize considerably upon thionation because it takes less energy to excite an electron from an n or a p orbital of a sulfur atom into the p* system than from an n or p orbital of an oxygen atom (compare the lower ionization potential of S versus O) 41 . Conversely, if the electronic excitation is localized primarily on the pyrimidine ring (that is, electronic excitation from pyrimidine p to pyrimidine p*) no stabilization is expected upon thionation. Using the TheoDORE program 42 , the electronic wave functions were analysed to assign the localization of the relevant S 0 -S 1 excitations. The results are shown in Table 1 (the corresponding difference densities of the excitations can be found in Supplementary Fig. 6, the numeric data in Supplementary  Table 5). Remarkably, the stabilization of the excited-state minima of the thiobases can now be easily explained. Since the np* states are always localized on the carbonyl/thiocarbonyl group, the np* state minima are stabilized strongly upon thionation by B1 eV. Similar arguments explain why the pp* states stabilize, but to lesser extent because they are at least partially delocalized over the aromatic rings.
In order to rationalize the energy shifts of the CoIns upon thionation, we consider C and 2tC as the initial example. Several different S 1 /S 0 CoIns have been reported for C (Table 1) [43][44][45][46][47][48][49] . These CoIns have different electronic character and hence we expect different localization of the excitations. The excitation within the semiplanar CoIn is localized on the chalcogen atom and thus stabilized in 2tC. However, the ethylenic CoIn (which is the lowest-energy CoIn of C, and thus primarily responsible for its ultrafast relaxation) is not stabilized because its excitation is centred on the pyrimidine ring. The amino-out-of-plane (oop-NH 2 ) CoIn is stabilized slightly (by 0.3 eV). Although the semiplanar CoIn is also stabilized by 1.4 eV in 2tC and becomes the lowest-energy CoIn, it is still too high in energy compared with the 1 np* minimum, precluding ultrafast internal conversion to the ground state and favouring instead rapid decay to the 1 np* minimum and subsequent ISC to the triplet manifold. Satisfactorily, a similar rationale applies to the U/2tU pair. According to calculations, U relaxes primarily through the ethylenic CoIn 50-53 , with a twist at the carbon-carbon double bond, whereas the oxygen-out-of-plane (oop-O) CoIn 50,52 is much higher in energy. In 2tU, on the other hand, the oop-S CoIn is stabilized, whereas the ethylenic CoIn is much higher in energy 26,54 . The stabilization of the oop-S CoIn is, however, not sufficient enough to provide a fast relaxation channel to the ground state 54 . To our knowledge, there are no calculations on 2-thiothymine, but we expect that similar arguments hold for the thymine/2-thiothymine pair, because the methyl group is not expected to affect the electronic structure significantly.
The CoIns in the G/6tG pair also follow the discussed trend: the oop-S CoIn is stabilized more than the oop-NH 2 CoIn-although the effect is less pronounced than in the pyrimidine bases. Ground-state relaxation from the 1 pp* minimum in 6tG would be energetically possible, but the shape of the PES favours the population of the 1 np* minimum instead, from where all possible deactivation pathways to S 0 involve very large energy barriers.
From the above analysis, it can be proposed that thionation of the nucleobases stabilizes all states in which the electronic excitation is primarily localized on the sulfur atom (for example, n s p*, p s p*), whereas the other states (n N p*, p ring p*), and their corresponding CoIns, are not stabilized to the same extent. This leads to the conclusion that the intrinsic relaxation pathways in the canonical nucleobases-which involve extensive ring puckering but no extensive motion of the carbonyl group-are blocked in the thiobases because they are too high in energy relative to the considerably stabilized excited-state minima. Although there exist S 0 /S 1 CoIns involving sulfur-localization that are also stabilized 25,29,54 , they are too high in energy with respect to the excited-state minima and do not play a role in the deactivation.
Finally, localization of the excitation on the sulfur atomic orbitals also enhances spin-orbit coupling. This is because the spin-orbit operator is of short range nature 55 and depends on the atomic number, thus making this coupling largest if the excitations are localized on the heaviest atom. Further enhancement of population transfer from the singlet to the triplet manifold is due to the smaller singlet-triplet energy gaps in the thiobases compared with the canonical nucleobases because heavy atom substitution often leads to an increased density of states.

Discussion
Using transient absorption spectroscopy and theoretical calculations, we demonstrated that photoexcited 2-thiocytosine populates the triplet manifold on a femtosecond timescale and with close-to-unity triplet yield. Its relaxation after excitation proceeds in three general steps, which partially overlap: (i) barrierless deactivation from the S 2 to the S 1 state; (ii) ISC to the triplet manifold with a time constant of 210±50 fs, enhanced by the presence of a three-state near-degeneracy region involving the S 1 , T 2 and T 1 states; and (iii) internal conversion to the T 1 state. The latter step, together with some residual ISC is responsible for a time constant of 480±60 fs. More importantly, we have rationalized and generalized the influence of thionation on the excited-state dynamics of the canonical nucleobases. While it is generally recognized that thionation leads to a strong decrease in excited-state energies, here we show that the extent of this stabilization differs significantly in different parts of the PESs. Furthermore, we have shown that the stabilization depends sensitively on the electronic wavefunction character and favours wave functions where the excitation is localized on the sulfur atom. Hence, excited-state minima (for example, 1 n s p*) and conical intersections involving the thiocarbonyl group are stabilized upon thionation, whereas the conical intersections intrinsic to the pyrimidine skeleton are not. This observation explains satisfactorily the remarkably different behaviour of thiobases compared with their parent nucleobases and provides design criteria for the synthesis of prospective photochemotherapeutic drugs.
Femtosecond transient absorption spectroscopy. Femtosecond broadband transient absorption spectroscopy was performed using a Libra-HE pulsed laser system (Coherent, Inc.), which produces 800 nm, 100 fs pulses with a total output power of 4 W. A small fraction of this fundamental beam (o2 mW) was split off and focused into a continuously moving 2 mm CaF 2 crystal to generate the white light continuum probe pulses (320-710 nm). The remainder of the fundamental beam was directed through an optical parametric amplifier (TOPAS-800, Light Conversion, Ltd.) to generate the excitation (that is, pump) pulses at either 308 or 321 nm. All other wavelengths were removed from the excitation beam using a reflective wavelength filter and a Glan-Taylor polarizer.
The spectrometer (Helios, Ultrafast Systems, LLC.) uses an optical delay line to delay the probe pulses in time with respect to the pump. The pump power was attenuated to 1 mJ at the sample and its polarization was randomized using a depolarizing plate in order to prevent contribution of rotational effects to the dynamics. At each delay time the spectrum of the probe is collected with and without pumping the sample in order to produce the difference spectra (DA).
2tC solutions were prepared at concentrations of 5-8 mM in pH 7.4 aqueous PBS. All transient absorption experiments were performed in 2 mm optical path length quartz cuvettes (Starna Cells, Inc.). Sample solutions were stirred continuously throughout data collection and replaced with fresh sample, from the same stock, every 5-7 scans so that all scans had no more than 5% degradation as determined by the steady-state absorption. A home-made LabView program (National Instruments, Inc.) was used to correct all transient absorption data for group velocity dispersion of the white light probe 56 . Global and target analysis of the data were performed using a sequential kinetic model 22 , convoluted with the instrument response function fixed at 200 fs, as determined by the coherence signal of neat methanol 12 .
The sequential kinetic model included two lifetimes: The spectrum of the third component C can be considered as an offset and represents the transient species that has not decayed during the first 20 ps investigated in this work. The results of five separate experiments, three exciting at 308 nm and two exciting at 321 nm, were analysed. The target and global fitting analysis was performed independently for each of the five experiments in order to estimate the uncertainty of the lifetimes. Up to 91 kinetic traces from the broadband transient absorption data were used in this analysis. The uncertainties are reported as twice the standard deviation. The DAS are extracted from the global fit of the data and correspond to the wavelength-dependent amplitudes for each of the lifetime components in the kinetic traces.
Stationary quantum chemistry. According to Kasha's rule, the relaxation from S 4 via S 3 to S 2 should be ultrafast. This is in agreement with the experimental fs-TAS results, which showed that excitation at 321 nm (mainly to S 2 ) and excitation at 308 nm (which also excites S 4 ) induce the same excited-state dynamics. Hence, the computational investigations focused on the relaxation dynamics starting from S 2 .
Minima of the PESs were first optimized at the CASSCF(14,10)/ANO-S-VDZP level of theory, whereas for the optimization of CoIns and MECPs CASSCF(14,10)/ 6-31G* was used. Using CASSCF(14,10)/ANO-S-VDZP, all minima and crossing points were connected with minimum-energy paths to check for the absence of energetic barriers. If a minimum-energy path successfully finishes from a surface crossing to a minimum, it implies that there are no energetic barriers in between. The minima, CoIns and MECPs were then reoptimized at the MS-CASPT2/ANO-L-VTZP level of theory using the methodologies described in refs. 57, 58 and the ORCA optimizer 59  The levels of theory used in the various calculations, including the numbers of averaged states, are also compiled in Supplementary Table 6. The active space or reference space orbital composition is presented in Supplementary Fig. 7. All calculations were performed with the amino-thion tautomer of 2-thiocytosine, which is justified in Supplementary Fig. 8 and Supplementary Table 7. All optimized geometries are reported in Supplementary Note 3.
While MS-CASPT2 is, in many cases, an excellent method to describe excited states, it cannot be yet routinely employed in dynamics simulations, due to the lack of efficient implementations of analytical MS-CASPT2 nuclear gradients. Hence, all stationary quantum chemistry calculations were repeated with MRCIS (as specified in 'nonadiabatic dynamics simulations') to scrutinize the accuracy of the MRCIS PESs employed in the dynamics simulations versus the MS-CASPT2 ones. See Supplementary Fig. 5, Supplementary Table 3 and Supplementary Note 2 for further discussions.
Nonadiabatic dynamics simulations. The on-the-fly electronic structure level of theory for the nonadiabatic dynamics simulations was MRCIS with a reference space of CAS(6,5) type. The orbitals were taken from previous CASSCF(10,8)/ cc-pVDZ calculations (see Supplementary Fig. 7 for the orbitals used), with state-averaging over four singlet and two triplet states (note that the number of states in the MRCIS calculations does not need to equal the number of states in the SA-CASSCF calculations). The CAS(6,5) has been created removing the least occupied p*-orbital and the most occupied two p-orbitals from the CAS (10,8), and therefore the reduced reference space still contains all orbitals involved in the primary excitations of all the relevant states. Inner shells were kept frozen in the MRCIS stage of the calculations. All MRCIS calculations were conducted with COLUMBUS 7 (ref. 62), with integrals from MOLCAS 60 , which also provided scalar-relativistic one-electron integrals according to the Douglas-Kroll-Hess method and the spin-orbit integrals using atomic mean-field integrals 55 .
The surface hopping code SHARC was employed 38 . Surface hopping is an extension of classical molecular dynamics to include nonadiabatic transitions between excited states, and SHARC in turn is an extension of surface hopping to include other types of couplings beyond nonadiabatic ones. In the present study, the additional couplings were the spin-orbit couplings among the states S 0 to S 3 and T 1 to T 2 . Within SHARC, these states are mixed to yield 'diagonal' states, on whose PESs the dynamics is carried out; this procedure includes calculating the gradients of the diagonal states from the spin-free gradients as well as propagating the electronic wavefunction using the spin-orbit Hamiltonian and the nonadiabatic couplings of the spin-free states. SHARC is available free of charge from http://sharc-md.org, and the development version from the authors upon request.
Initial conditions for the dynamics simulations were obtained by sampling from a quantum-harmonic Wigner distribution around the ground state minimum, based on a frequency calculation performed with the MRCIS method. 4,000 geometries were sampled from the distribution and for each geometry the vertical excitation energies and oscillator strengths were calculated, which were used to simulate an absorption spectrum ( Supplementary Fig. 2) and to select the initial electronic state for the dynamics.
One hundred thirty seven trajectories were propagated for a total simulation time of up to 1 ps. The time step for the nuclear motion was 0.5 fs, while the electronic Schrödinger equation was integrated with a time step of 0.02 fs using the local diabatization methodology 63 , which employs wavefunction overlaps 64 to provide the information about the nonadiabatic couplings. An energy-based decoherence correction 65 was applied to the electronic states in the diagonal basis. As stated above, the simulations were carried out on spin-adiabatic potentials, obtained by diagonalization of the spin-orbit Hamiltonian including states S 0 to S 3 and T 1 to T 2 . However, for the analysis of the results the state populations were transformed back into the spin-free basis, since this allows for easier interpretation of the results. Note that the energies of S 4 , T 3 and T 4 (but no gradients, spin-orbit couplings or wavefunction overlaps) were also calculated at each time step in order to scrutinize whether neglecting these states is justified. Furthermore, since the S 3 population decays within only 20 fs to the S 2 state, for all analysis and discussion purposes the S 3 population was merged into the S 2 population.
Wave function localization analysis. The calculations were performed using optimized geometries reported in the literature for cytosine 48 , uracil 43,52 , guanine 66 , 2-thiouracil 54 and 6-thioguanine 25,29 , whereas for 2-thiocytosine the geometries reported here were used. For each molecule, the np* and pp* excited-state minima and relevant CoIns were included. At each geometry, a CASSCF/ANO-L calculation was performed, averaging over 4 singlet states. For cytosine, uracil, 2-thiocytosine, and 2-thiouracil, a CAS(14,10) was used, whereas for guanine and 6-thioguanine a CAS (18,13) was employed. These calculations were performed with MOLCAS 60 . The one-electron difference density matrices between the ground state and the first excited state were then analysed using the TheoDORE program 42 in terms of the Mulliken detachment populations in order to arrive at the classification in Table 1. The Mulliken populations are given in Supplementary Table 5.
Data availability. All data is available from the authors upon request.