A hybrid stochastic model of folate-mediated one-carbon metabolism: Effect of the common C677T MTHFR variant on de novo thymidylate biosynthesis

Folate-mediated one-carbon metabolism (FOCM) is an interconnected network of metabolic pathways, including those required for the de novo synthesis of dTMP and purine nucleotides and for remethylation of homocysteine to methionine. Mouse models of folate-responsive neural tube defects (NTDs) indicate that impaired de novo thymidylate (dTMP) synthesis through changes in SHMT expression is causative in folate-responsive NTDs. We have created a hybrid computational model comprised of ordinary differential equations and stochastic simulation. We investigated whether the de novo dTMP synthesis pathway was sensitive to perturbations in FOCM that are known to be associated with human NTDs. This computational model shows that de novo dTMP synthesis is highly sensitive to the common MTHFR C677T polymorphism and that the effect of the polymorphism on FOCM is greater in folate deficiency. Computational simulations indicate that the MTHFR C677T polymorphism and folate deficiency interact to increase the stochastic behavior of the FOCM network, with the greatest instability observed for reactions catalyzed by serine hydroxymethyltransferase (SHMT). Furthermore, we show that de novo dTMP synthesis does not occur in the cytosol at rates sufficient for DNA replication, supporting empirical data indicating that impaired nuclear de novo dTMP synthesis results in uracil misincorporation into DNA.

. The reaction-based specification of the model according to the notation introduced in Gostner et al. 61 . Rectangles identify model variables, non-boxed substrates are model constants, green circles identify enzymes, dark blue arcs identify matter transformation, and light blue arcs identify regulatory events (dotted lines indicate activations and solid lines indicate inhibitions). The purple boxes indicate reactions and variables associated with the folate cycle and the homocysteine remethylation cycle, respectively. activity by decreasing its affinity for the FADH cofactor. Such substitution affects enzyme stability and hence the partitioning of folates between dTMP synthesis and homocysteine remethylation 25,26 . Decreased MTHFR activity resulting from the polymorphism decreases 5mTHF synthesis, leading to impaired homocysteine remethylation and elevated serum homocysteine 24 . The 677 T variant is also associated with a redistribution of cellular folate cofactors; 5mTHF is the predominate form of folate in red blood cells in MTHFR 677CC carriers, whereas 10fTHF is the predominate form of folate in MTHFR 677TT carriers [27][28][29] . 10fTHF is less chemically stable than 5mTHF, and the MTHFR 677TT variant is associated with lower folate status 30 and higher folate requirement 31 . Recent studies suggest that the contribution of the MTHFR variant to NTD risk is due to its impact on cellular folate status, rather than impaired homocysteine remethylation 32 . Likewise, mouse models of NTDs indicate that impaired dTMP synthesis, and not homocysteine remethylation, cause folate-responsive NTDs [33][34][35] . Reed et al. investigated the consequences of the MTHFR C677T polymorphism, assuming 70% enzyme activity for heterozygote and 30% enzyme activity for homozygote, in comparison to CC homozygotes (which was set to 100% activity) using parameters for folate monoglutamates, which are not the physiological form of folate cofactors in cells. Under these conditions, the variant allele decreased concentrations of 5mTHF and SAM and increased the concentrations of homocysteine, SAH, and rates of dTMP and purine biosynthesis 21 , which is inconsistent with current understanding of biochemical changes associated with NTD risk. The effect on the redistribution of folate cofactors towards 10fTHF that is associated with the 677 T variant, or its impact on other pathways within the network, was not reported 21 .
Here, we studied the partitioning of CH2F, a cofactor for both homocysteine remethylation and de novo dTMP biosynthesis [36][37][38] , and the effects of known genetic and nutritional variables that impact movement of CH2F through the network. Existing models are limited by adopting kinetic parameters determined from the use of folate monoglutamate substrates 19,20 . Folate polyglutamates are the functional form of folate cofactors in cells and have much higher affinity for their respective FOCM enzymes than the corresponding monoglutamate forms of folate 39,40 . Therefore, we updated the parameters in the deterministic model to include the physiologically-relevant polyglutamate forms of folate cofactors and demonstrate that it faithfully recapitulates existing data in the literature (see Methods, Model Validation and Fig. 2). The decisions for selecting individual enzyme kinetic parameters for this model were driven by: 1) available data for physiologically relevant polyglutamate forms of the folate cofactors derived from characterization of mammalian enzymes, and 2) data from human models, specifically L1210 cells, because of the richness and quality of the data used to derive kinetic parameters. Given the high conservation of folate enzymes among mammals, our model could be applied to mammalian systems in general, even though we are not proposing a completely homogeneous model with respect to species.
We were able to identify key nodes in the network of Fig. 1 that exhibit high degrees of stochastic behavior, including the influence of nutrient status and genetic variation on stochasticity through simulations. We explored the impact of the MTHFR C677T polymorphism and its interaction with folate status on partitioning of CH2F within the network, including its impact on de novo dTMP biosynthesis to understand the etiology of NTDs. The results of the computational model provide evidence that the rates of de novo dTMP synthesis as currently modeled in the cytosol are insufficient to support DNA synthesis in S-phase in mammals, accounting for uracil misincorporation into DNA that occurs in folate deficiency and in mouse models of NTDs.

Results
The effect of folate status and the MTHFR polymorphism on pathways affecting NTD risk. In the current model, MTHFR activity was decreased to model the effect of the MTHFR C677T polymorphism. In addition, a two-fold increase in MTHFR activity was modeled to examine whether there was a dose-response relationship between MTHFR activity and various readouts of the network. This model shows that 5mTHF levels decrease as MTHFR activity decreases, reflecting the effects of the MTHFR C677T polymorphism ( Table 1). The current model also recapitulates the biological observation that decreased MTHFR activity results in accumulation of 10fTHF and THF ( Table 1). Inclusion of 10fTHF in the folate distribution is a strength of the current model in that it allows for estimation of the effect that perturbations to the system have on accumulation of this unstable form of folate, which likely accounts for the decreased folate status linked to NTDs in carriers of the polymorphism 30, 32 . . Where possible, a trend arrow is provided on the right to show the experimental outcome observed in Herbig et al. 28 . All the trends are consistent with literature (green arrows) except for the total % of 5 mTHF in the glycine range 5-10 mM (red arrow).
Scientific RepoRts | 7: 797 | DOI:10.1038/s41598-017-00854-w SAM and SAH levels vary markedly with changes in MTHFR activity (SAM levels decrease and SAH levels increase by more than 50% when comparing the "CC" model to the "TT" model), with a SAM/SAH ratio around one being achieved in CC homozygotes compared to 0.24 in TT homozygotes ( Table 2). Although methylation potential, otherwise known as the SAM/SAH ratio, changes markedly with varying MTHFR activity, homocysteine concentrations appear relatively insensitive to changes in MTHFR in this model (Table 2), inconsistent with known effects of the MTHFR TT genotype in elevating serum homocysteine levels 41 . However, in this model the lack of elevation in homocysteine due to the MTHFR C677T polymorphism reflects that the model represents a closed system leading to intracellular conversion of cellular homocysteine to SAH as opposed to export of homocysteine into the circulation ( Table 2).
The activity of each enzyme in the FOCM network as predicted by the current model indicates that accumulation of 10fTHF resulting from decreased MTHFR activity is due to two factors: (1) an increased flux through both the 10fTHF synthetase activity leading to increased synthesis (FTS), and (2) an increased flux through the cyclohydrolase/dehydrogenase activity of MTHFD1 which converts CH2F to CHF to 10fTHF (MTCH, MTD activities, respectively, Table 3). Interestingly, the accumulation of 10fTHF does not affect flux through the enzymes that use 10fTHF as a co-factor for de novo purine synthesis (PGT, AICART; Table 4). This is consistent with empirical experimental findings that increasing cellular levels of 10-formytetrahydrofolate dehydrogenase, which consumes 10fTHF, do not affect de novo purine synthesis 42 . Flux through the de novo dTMP synthesis pathway increases with decreasing MTHFR activity, consistent with empirical studies indicating these two pathways compete for CH2F (Table 4, columns DHFR, TYMS and SHMT) 43 . These findings are in agreement with empirical data showing that the TT polymorphism results in an increase in CH2F available for dTMP synthesis as indicated by isotope tracer studies in humans 26 .  Table 3. Fluxes of the reactions catalyzed by the enzymes FTS, MTCH, MTD, MTHFR and MTR and the binding of 5mTHF and SHMT for different levels of MTHFR activity (ranging from 2x to 0.3x of wild type). CC, CT and TT refer to the C667T polymorphism.
5mTHF binds to and is a potent inhibitor of SHMT and glycine N-methyltransferase (GNMT). Decreased MTHFR activity, which lowers cellular 5mTHF levels ( Table 2), increases the flux through the SHMT1-catalyzed reaction in the direction of serine catabolism to glycine (Table 4). This reflects the depletion of intracellular 5mTHF as well as glycine, resulting from increased GNMT activity that results from 5mTHF depletion ( Table 5). Methionine synthase (MTR) flux is highly sensitive to MTHFR genotype reflecting its dependence on availability of its substrate 5mTHF that is generated by MTHFR ( Table 3).
The MTHFR C677T variant affects both CH2F partitioning (between homocysteine remethylation and dTMP biosynthesis) and intracellular folate concentrations; the 677 T variant lowers intracellular folate levels. Therefore, the impact of this variant on FOCM was modeled at two different levels of folate (Tables 6-10) (folate replete conditions, 18 μM, and low-folate conditions, 9 μM) to understand how the variants function within the FOCM network as a function of folate cofactor availability. The results demonstrate that the changes in the percentage of 5mTHF (as well as other major one-carbon forms of folate) are more pronounced in the TT genotype than in the CC genotype when cellular folate levels are decreased ( Table 6). The percentage of 5 mTHF in CC homozygous does not change in the folate replete and deficiency states, whereas the accumulation of 5 mTHF in TT homozygotes differs between the deficient and replete states ( Table 6).
Fluxes through FOCM pathways are affected by both the MTHFR C677T polymorphism and folate levels. The most sensitive pathways to folate deficiency are the MTCH and MTD activities of MTHFD1, MTHFR, MTR, DNMT, DHFR, and TYMS (Tables 8-10, row showing absolute flux differences between folate levels). Flux through the dTMP synthesis pathway (DHFR and TYMS) is highly sensitive to folate status for both the MTHFR CC and TT genotypes (   Table 6. Distribution of folate (in percentage of total folate) for replete (18 µM) and low (9 µM) levels of total folate and for the CC and TT case of the C667T MTHFR polymorphism.
also highly sensitive to folate status in the CC homozygotes, whereas GNMT flux in TT homozygotes was insensitive to folate status (Table 10). Similar but less pronounced effects were seen for flux through SHMT (Table 9). To understand if MTHFR genotype affects the stability of the FOCM network at steady state, the deterministic simulation was coupled with stochastic simulation using a hybrid simulation strategy (see Methods and Supplementary Material). Model steady states were obtained under four different conditions that differed by MTHFR 677 genotype (CC and TT case) and intracellular folate levels (replete and low). Interestingly, the most stable steady state (the one with lowest total sum of reaction propensities a 0 (x), see Methods for details), was the CC case with folate replete concentrations, consistent with numerous epidemiological studies associating the MTHFR C677T genotype with folate-related pathologies (Table 11) 43 . The enzyme that exhibited the greatest level of stochasticity in response to folate levels and/or the MTHFR C677T polymorphism was SHMT1 (Table S6).
The molecular basis of uracil misincorporation into DNA. Mouse models implicate SHMT1 and impaired de novo dTMP synthesis in NTD risk. Impaired de novo dTMP synthesis causes an increase in dUMP, which when converted to dUTP causes uracil misincorporation into DNA because DNA polymerases do not distinguish between dTTP and dUTP 44 . The dTMP biosynthesis pathway enzymes (MTHFD1, SHMT, TYMS, and DHFR) are present in both the cytosol and recently have been found to function in the nucleus. In the nucleus, they comprise a multi-enzyme complex at sites of DNA synthesis that may be critical to limit rates of uracil misincorporation into DNA, but regulatory mechanisms remain unknown 45 . These enzymes are modified by the Small Ubiquitin-like MOdifier (SUMO) protein at the G1/S boundary, which permits their nuclear translocation during S-phase of the cell cycle 46 45 . In this model, SHMT1 protein levels were elevated several fold in the liver, yet its localization was restricted to the cytoplasm and nuclear SHMT1 levels were depleted compared to wild-type mice 45 . Furthermore, nuclei isolated from SHMT1 overexpressing mice exhibited lower rates of de novo dTMP synthesis compared to nuclei isolated from wild-type mice 45 . This suggests that de novo dTMP synthesis occurs when the enzymes are present in the multi-enzyme complex within the nucleus in mammals. However, no definitive experiment has been performed that identifies the relative contribution of nuclear and cytosolic dTMP synthesis to overall dTMP synthesis. Interestingly, S. cerevesiae do not import the dTMP synthesis pathway into the nucleus 47 .
To determine if nuclear import of the de novo dTMP pathway was required to meet cellular demands for dTTP during DNA replication, rates of dTMP synthesis were modeled for mammalian cells using standard Michaelis-Menten kinetics (Table 12; Table 4). Based on the number of A-T base pairs in the human genome and an 8-hour S-phase in embryonic stem cells (S-phase in L1210 cells is also 6-10 h 48, 49 ), the rate of dTMP synthesis required for faithful cell replication is 7.8 µM/min (calculations are in Supplementary Material, see also Table 12) 50,51 . In the current model, which does not account for SHMT1/TYMS/DHFR/MTHFD1 nuclear localization nor complex formation, cytosolic dTMP synthesis rates are 4.4 µM/min (Table 4, DHFR and TYMS flux, 263.4 µM/h, assuming MTHFR 677 CC genotype). This computational deficit between dTMP requirements and dTMP synthesis rates suggests that dTMP synthesis as currently modeled in the cytosol where the enzymes are not present in a complex cannot meet cellular needs. Nuclear localization and complex formation of the de novo dTMP synthesis complex seem to be unique to mammalian cells. In S. cerevesiae, TYMS is not SUMOylated and localizes to the nuclear periphery 47 . The measured rate of dTMP synthesis in S. cerevesiae is 1.8 µM/min 52 (Table 12). The rate of dTMP synthesis required to replicate the S. Cerevesiae genome over the course of an S-phase (less than one hour 52, 53 ) is 0.5 µM/min, indicating that yeast synthesize dTMP at a rate that is more than 3-fold greater than necessary for adequate dTMP synthesis (calculations are in Supplementary Material). Furthermore, in response to DNA damage, yeast increase dNTP concentrations 6-8 fold 54 and E. coli increase dNTP concentrations 1.8-3.7 fold 55 , but dNTP concentrations do not increase after DNA damage in mammals 56,57 .

Discussion
Understanding the dynamics of FOCM and its responsiveness to both genetic and environmental perturbations is the key to understanding the etiology of folate-related pathologies. Computational models and related simulations permit an identification of the most sensitive reactions within the network that exhibit the greatest degree of stochastic behavior leading to variability in network outputs. Furthermore, computational models allow an understanding of how both genetics and environmental factors can enhance or repress stochastic behavior at defined locations within the network, accelerating the development of diagnostics to identify those at risk for folate-related pathologies as well as lead to the development of targeted nutritional interventions for disease prevention.
The Shmt1 knockout mouse model (Shmt1 +/− , Shmt1 −/− embryos) exhibits impaired de novo dTMP synthesis in the absence of perturbations of homocysteine remethylation. It also recapitulates risk for NTDs in humans.  Specifically, the mouse model exhibits folate-responsive NTDs that occur with minimal perturbation in FOCM, and exhibit low and variable penetrance 33,34 . In fact, most if not all, folate-related pathologies whose etiology involves interactions among genetic risk variants and nutrient exposures also exhibit low and/or variable clinical presentation. Understanding the stochastic behavior of the various reactions within FOCM that results in increased variability in FOCM network outputs is essential to understand which enzymes in the network contribute to folate-related pathologies.
Existing FOCM models rely on the limited quantity of kinetic data present in the literature, and the performance of the model will be dependent upon the kinetic parameters chosen to include in the model. Much of the available kinetic data for FOCM enzymes present in the literature was collected using the commercially available monoglutamate folate substrates, with few studies using the physiologically relevant polyglutamate forms of the cofactor. In the cell, newly transported monoglutamate folates are converted to folate polyglutamates, containing 3 to 7 polyglutamate moieties, though the action of folylpolyglutamate synthetase 58 . The polyglutamate chain (N = 3 glutamate and higher) increases the affinity of folate cofactors for many folate-dependent enzymes by one to two orders of magnitude 39,40 . Models that include kinetic parameters derived from the use of folate monoglutamates can limit model reliability. Here we established a hierarchy of criteria to select a more homogeneous set of kinetic parameters (i.e. K m and V max ) by referring, when possible, to L1210 cells because of the richness and quality of the data used to derive kinetic parameters. Furthermore, our preference was to select kinetic coefficients generated using folate polyglutamate cofactors and purified proteins from animal models closest to humans, as the variability in kinetic parameters among mammals is much less than the differences observed between folate monoglutamate and polyglutamate cofactor substrates.
The current model was validated by demonstrating that it recapitulates empirical observations regarding the impact of intracellular glycine on behavior of the FOCM network (Fig. 2). The validated model was then used to understand how the MTHFR C677T polymorphism, a known genetic risk factors for NTDs in humans, affects FOCM. This model shows that the lower levels of 5mTHF associated with the MTHFR 677 T variant are accompanied by elevated levels of 10fTHF, which has been observed in animal models and in humans [27][28][29] (Table 1). The model also indicates that 10fTHF accumulates in TT homozygotes as a result of increased flux through both the synthetase activity of MTHFD1 (FTS activity, Table 3), but also due to increased flux through MTHFD1 activity in the direction converting CH2F to 10fTHF (Table 3). Therefore, the model accurately predicts perturbations in FOCM that have been observed in human clinical and epidemiological studies. A recent study suggested that the risk of the MTHFR C677T polymorphism for NTDs was due to its known effect on lowering intracellular folate concentrations, rather than its role in providing 5 mTHF for homocysteine remethylation 32 . This model demonstrates that the MTHFR C677T polymorphism elevates levels of 10fTHF, which is known to be a chemically unstable form of folate that is susceptible to oxidative degradation, providing a mechanism by which the MTHFR C677T polymorphism depletes intracellular folate levels. Importantly, the model reported here demonstrates that the de novo dTMP biosynthesis enzymes are the most sensitive to low intracellular folate concentrations, with both DHFR and TYMS activities being repressed by 63% (Table 9). This finding is consistent with the finding that mouse models with impaired de novo dTMP biosynthesis are susceptible to NTDs in folate deficiency 34 . The hybrid stochastic simulation also reveals that both folate deficiency and the MTHFR C677T polymorphism create overall instability in the network (Table 11), consistent with a vast body of literature demonstrating an association of both folate deficiency and the MTHFR C677T polymorphism with various pathologies 24,30 . Interestingly, SHMT1 exhibits the greatest increase in stochastic behavior as a result of the MTHFR C677T polymorphism (Table S6); the SHMT1 enzyme is the only FOCM enzyme that when disrupted results in folate-responsive NTDs 34 .
The primary findings of this study are that the FOCM network is destabilized as a result of folate deficiency and the MTHFR677T polymorphism, and that SHMT is the most sensitive enzyme within the network to this network instability. This finding nicely connects the MTHFR genetic variant, a known risk factor for human NTDs, and SHMT1, the only folate enzyme whose disruption results in folate-responsive NTDs in mice. Furthermore, this model predicts that de novo dTMP synthesis rates in mammals are about half of what is required to meet DNA replication demands for dTMP (Table 12). Although mammals contain two pathways for dTMP synthesis, the folate-dependent de novo dTMP synthesis pathway described here and a salvage pathway catalyzed by thymidine kinase 1, the salvage pathway activity is insufficient to meet cellular needs based on observations that folate deficiency results in elevated uracil accumulation in DNA. In mammalian cells, the de novo dTMP synthesis enzymes form a multi-enzyme complex that interacts with DNA replication enzymes 46 . The discrepancy between de novo dTMP synthesis rates required to replicate the genome and the rate of dTMP synthesis currently predicted by the model indicates that the model should be extended to include multi-enzyme complex formation and substrate channeling in the nucleus to model more accurately determinants of FOCM and dTMP synthesis. The inclusion of the dTMP multi-enzyme metabolic complex in the model is expected to limit substrate diffusion and increase the rate of dTMP synthesis.

Methods
Description of the Model and of the Simulation Techniques. The model was constructed as a closed system using the subset of reactions that describe the FOCM pathways and homocysteine remethylation in cytoplasm 19 (Fig. 1). For the simulation, we employed a hybrid stochastic approach using deterministic simulation to compute the initial phase of the dynamics until a model steady state was reached, and then we assessed the stability of the achieved steady state by relying on exact stochastic simulation. We adopted a hybrid approach, rather than one entirely based on exact stochastic simulation 59 , because of the intensive computational effort introduced by the stiffness of the system during the simulation. The deterministic simulation was based on ODEs, where reactions were described in terms of Michaelis-Menten equations consistent with the original model of Reed 19 and computed using the MATLAB integrator ode15s, whereas parameter estimates were derived from literature or calculated by nonlinear least squares optimization. The kinetic constants were obtained from folate polyglutamate cofactors and their interaction with enzymes purified from L1210 cells where possible and otherwise from other mammalian tissue (see Supplementary Material for a detailed discussion of the model and parameter estimates).
The set of ODEs was further translated into a stochastic reaction-based model and a hybrid simulation approach 60 was employed to quantify the level of stochasticity in the considered FOCM steady states. We refer to the Supplementary Material for a detailed description of how the ODE model has been translated to a reaction based stochastic one. According to the seminal work of Gillespie 59 , exact stochastic simulation allows simulation of each reaction event asynchronously when such reaction event is most probable to occur. To allow this, the simulation algorithm computes a propensity function a j (x) at each simulation step for each modeled reaction R j , where x is the current state of the system providing the abundances of all modeled species at the considered time. The propensity value of a reaction has a direct link with the probability of its execution, that is, reactions with higher propensity are more likely to be fired in the near future. To evaluate when the next reaction event will occur, also the total sum of propensities = ∑ a x a x ( ) ( ) R j 0 j is computed, because this quantity is linked to the number of reaction events occurring in the next time unit, that is, with increasing total propensity the number of reaction events per unit of time also increases. In Table 11, we provide the total sum of propensities for the considered steady states. On average, we would need to generate up to 10 15 -10 16 reaction events per unit of time during the stochastic simulation, due to stiffness of the system. To circumvent the problem, we relied on the concept of total propensity to evaluate the stability of the steady state, by assuming that a steady state is more stable when it has a lower value of a 0 (x), that is, a lower averaged number of reaction events that can perturb the equilibrium of the steady state.

Model validation.
To validate the FOCM model, in silico experiments were performed to determine if the model could recapitulate empirical data generated in MCF-7 cells by Herbig et al. 10 focusing on the effect of glycine on FOCM (Fig. 2). Glycine is important because, as a second substrate, it has a direct influence on the reaction catalyzed by the enzymes GNMT as well as on the reversible reaction transforming CH2F to THF catalyzed by the enzyme SHMT. The purpose is to understand how the steady state of FOCM is affected by altering intracellular glycine concentrations. This was achieved by running several model simulations starting from different glycine concentrations and comparing the corresponding steady states with empirical data from Herbig et al. 10 . This study examined the effect of exogenous glycine at concentrations from 0 to 10 mM on the relative distribution of folate one-carbon forms as well as S-adenosylmethionine (SAM) and S-adenosylhomocysteine (SAH) levels. In summary, the empirical data revealed that as glycine concentrations increase intracellular 10fTHF levels increase at the expense of 5mTHF levels that decrease. Furthermore, as glycine concentrations increase SAM levels are depleted and SAH levels rise. These changes were interpreted by the effects of glycine concentration driving the reversible SHMT reaction in the direction of serine synthesis 10 .
We simulated the effect of glycine on folate distribution, SAM, and SAH concentrations using the computational model for values of glycine ranging from 0 to 10 mM (Fig. 2). The trends obtained by the model simulations were in agreement with the literature (green arrows in Fig. 2), confirming the coherence between model outcomes and empirical data. We observed only one exception related to the total % of 5 mTHF at 10 mM glycine (red arrow in Fig. 2). This discrepancy could be mainly due to two reasons: (1) 10 mM glycine is an extreme and non-physiological intracellular glycine concentration that could cause pharmacological effects, (2) the large magnitude of the experimental error in the Herbig et al. study at this glycine concentration.