Introducing an Optimization- and explicit Runge-Kutta- based Approach to Perform Dynamic Flux Balance Analysis

In this work we introduce the generalized Optimization- and explicit Runge-Kutta-based Approach (ORKA) to perform dynamic Flux Balance Analysis (dFBA), which is numerically more accurate and computationally tractable than existing approaches. ORKA is applied to a four-tissue (leaf, root, seed, and stem) model of Arabidopsis thaliana, p-ath773, uniquely capturing the core-metabolism of several stages of growth from seedling to senescence at hourly intervals. Model p-ath773 has been designed to show broad agreement with published plant-scale properties such as mass, maintenance, and senescence, yet leaving reaction-level behavior unconstrainted. Hence, it serves as a framework to study the reaction-level behavior necessary for observed plant-scale behavior. Two such case studies of reaction-level behavior include the lifecycle progression of sulfur metabolism and the diurnal flow of water throughout the plant. Specifically, p-ath773 shows how transpiration drives water flow through the plant and how water produced by leaf tissue metabolism may contribute significantly to transpired water. Investigation of sulfur metabolism elucidates frequent cross-compartment exchange of a standing pool of amino acids which is used to regulate the proton flow. Overall, p-ath773 and ORKA serve as scaffolds for dFBA-based lifecycle modeling of plants and other systems to further broaden the scope of in silico metabolic investigation.

metabolite concentrations, reaction fluxes, and system biomass 14,15 . In general, there are two approaches to dFBA. The first approach, the Static Optimization-based Approach (SOA), has been applied to E. coli 14 , mammalian cells 16,17 , Saccharomyces cerevisiae (bakers' yeast) 17 , Hordeum vulgare (barley) 15,18 , and Arabidopsis thaliana 19 (in addition to other systems). The second, the Dynamic Optimization-based Approach (DOA), has been applied to E. coli metabolism 14 and signaling networks is S. cerevisiae 20 (to name a few applications). These approaches have proven useful for investigating aspects of plant-scale metabolism, such as resource partitioning in Arabidopsis 19 . These works have inspired the development of our new approach to perform dFBA named as Optimization-and explicit Runge-Kutta -based Approach (ORKA). ORKA significantly improves upon the SOA by utilizing the step-by-step solution approach of the SOA (as opposed to simultaneous solution of all times in the DOA) with increased accuracy and solution stability. These improved characteristics are due to both the implementation of a Runge-Kutta method (a multi-step numerical method for the solution of ordinary differential equations) to replace the first-order Taylor series approximation used by SOA and by replacing the assumption that the reaction rate is constant over each time interval with a trapezoid rule-based integral approximation.
Arabidopsis thaliana (hereafter Arabidopsis) has been selected as a test system for the application and demonstration of the ORKA framework, due to the fact that Arabidopsis is a model plant species with a highly characterized knowledgebase. The choice has also allowed demonstration of ORKA in a dynamic, multi-tissue system. To date, many stoichiometric models of plant metabolism, including Arabidopsis, have been developed. Some of these models including models of Arabidopsis thaliana 13,15,21,22 , Zea mayz (maize) 23 , Sorghum bicolor (sorghum) 12 , Brassica napus (rapeseed) 24 , and Oryza sativa (rice) 25 have treated plants as single metabolic units. These models have sought to analyze metabolic maintenance, response to abiotic stimuli, enzyme regulation changes, and metabolism as a whole 12,13,15,[21][22][23][24][25] . Tissue-specific models have been reconstructed for various Arabidopsis tissues 26 , a maize leaf 27 , and a barley seed 28 to better understand how present metabolites, metabolic pathways, and nutrient (generally carbon and nitrogen) availability differ between tissues. Multi-tissue models have also been developed to characterize whole-plant metabolism for Arabidopsis 13,19 and barley 18 and subsequently to study whole-plant metabolic response to the diurnal cycle and the source-to-sink relationship of leaves and seeds 13,18 . These studies have considered metabolism at a single point (often in the exponential growth phase 12,18,[21][22][23]25 ) or a single diurnal cycle 13 or have modeled only a portion of the Arabidopsis lifecycle 19 . The most complete dFBA work, in terms of modeling the full Arabidopsis lifecycle, models two tissues, leaf and root, across 30 days of vegetative growth (from 6 days to 36 days) 19 . Here, we have developed a core-carbon metabolic model of Arabidopsis, named p-ath773 (plant-scale core-metabolism Arabidopsis thaliana model with 773 genes included), to model the full lifecycle of Arabidopsis from germination to senescence by being embedded in the ORKA framework which captures metabolic interactions between four major tissues: leaf, root, seed, and stem. These four tissues have been chosen for model reconstruction to represent core plant functions: the root for nutrient uptake and growth; the leaf for photosynthesis, carbon fixation, and as a source tissue for plant nutrition; the seed for metabolite storage and a sink tissue for metabolic investment; and the stem for metabolic transport and acting as a conduit for all metabolic interactions between other tissues. Core-metabolism pathways that are included but not limited to photosynthesis; the citrate cycle; starch and sucrose synthesis; fatty acid synthesis and degradation; and amino acid synthesis. The p-ath773 model consists of 1251 total (and 631 unique, defined as having the same identifier across any number of subcellular compartments) reactions (R), 1155 total (and 276 unique) metabolites (M), and accounts for 773 genes (G) including 42 chloroplastic and 11 mitochondrial genes. Each of the modelled tissues including leaf (R: 517, M: 463, and G: 666), root (R: 149, M: 149, and G: 324), seed (R: 418, M: 390, and G: 577), and stem (R: 167, M: 154, and G: 291) has been reconstructed individually to allow for the different tissue mass ratios found across the lifecycle of the plant. A summary of the p-ath773 model is shown in Fig. 1. The ORKA framework determines biomass, metabolite concentrations, reaction flux, change in biomass, and changes in metabolic concentration (collectively defined as a metabolic "snapshot") hourly across the lifecycle of Arabidopsis as modeled by p-ath773 under 12 hour light and 12 hour darkness growth conditions accounting for changes due to diurnal metabolic differences; changes in plant mass; metabolite storage and uptake (particularly carbohydrates); changes in plant tissue mass ratios; and changes in metabolism with respect to plant growth Figure 1. The p-ath773 system model. This figure emphasizes the individual nature of each of the four core tissue models (leaf, root, seed, and stem), formally defines the modeled system boundary (dashed black line), defines cross-boundary exchange reactions, intra-tissue exchange reactions, and gives the generic formulation for Flux Balance Analysis (FBA) applied to the seed tissue model.

Results
Development of the ORKA to perform dFBA. The Optimization-and explicit Runge-Kutta-based Approach (ORKA) has been developed to make more accurate and stable estimates of the changes in biomass and metabolite concentration in a dynamic Flux Balance Analysis (dFBA). The basis of the ORKA is the same as SOA, to model a dynamic (i.e. time-dependent) metabolism across multiple time points, where each time point solution builds upon previous solutions. The pseudocode describing how the ORKA works can be found in Fig. 2A. Symbols are defined as follows: t n is the current time, t 0 is the initial time, Δt is the time step, c n are the steps in the independent variable made by the Runge-Kutta method chosen to use, Y t is the current biomass concentration, Y 0 is the initial biomass concentration, a na is the weight of Runge-Kutta derivative estimate steps (k n ) in the next derivative estimate, b n is the weight of the Runge-Kutta derivative estimate steps in the full Runge-Kutta derivative estimate, dY dt rk est is the Runge-Kutta derivative estimate for the current timestep, Y t + Δt is the biomass concentration at the next time step, z i,t is the concentration of metabolite i at time t, z i,t + Δt is the concentration of metabolite i at the next time step, S ij is the stoichiometric coefficient of metabolite i in reaction j, Γ j,t is the trapezoid rulebased integral estimate of the flux of reaction j at the current timestep, v j,t is the rate of reaction j at time t, v j,tn is the rate of reaction j at Runge-Kutta time step n, set N is the number of steps in the Runge-Kutta solution method (with n as the index), and c nf is the final c n value in the Runge-Kutta method. Greater detail on the definition of each symbols used can be found in the "Symbols Used" section in Text S1. ORKA expands upon the SOA approach 28 by replacing the Taylor-series approximations (details in the methods section) used to advance biomass concentration in the SOA (Y t in Fig. 2A) with a Runge-Kutta-based estimate for increased model accuracy and solution stability. The ORKA framework in this pseudocode formulation is left generic enough so that a variety of Runge-Kutta methods can be used, as long as c n values are evenly spaced. Here examples of Runge-Kutta methods which such c n values include those shown in Butcher Tableaus in Fig. 2. A detailed formulation of ORKA can be found in the Materials and Methods. A summary of the ORKA formulation is as follows. Begin with an SM; a set of time points over which to solve that SM; an initial condition related to the biomass of the system and metabolite concentrations; and a chosen Runge-Kutta method to use in the solution. For each time step, solve the SM using linear programming at the beginning of that time step and define the initial conditions (time, biomass, and metabolite concentrations). The chosen Runge-Kutta method is used to solve the change in those initial conditions over the time step. This is done by solving the SM using linear programming and saving all reaction rates for each solution for the given time step to determine the mass step estimate of the given Runge-Kutta step. Once all Runge-Kutta steps are complete, the final mass step estimate for the given time step is made. To advance metabolite concentration, the integral from the start of the time step to the end of the timestep is estimated using the multi-application Trapezoid rule, which in turn is used to estimate the change in metabolite concentrations. This is followed by applying that mass step and concentration change estimates and repeating the process for the next time step. This process is shown more technically by a pseudocode described in Fig. 2A and explained in full detail in Materials and Methods. extracellular space, cytosol, non-green plastid, inner mitochondria, and outer mitochondria subcellular compartments in accordance with literature evidence (see list of works cited in Data S1). Next, transport and exchange reactions have been added to the model based on literature evidence (see list of works cited in Data S1) or to increase model connectivity 7 . The biomass composition of the seed has been determined from literature 29,30 . The resultant model is charge and element balanced, and has undergone multiple iterations of curation consistent with well-established GSM reconstruction protocols 7 . Once the seed model has been reconstructed, metabolic pathways common to both the seed and leaf tissue have been used as the starting point for reconstructing the leaf tissue model. To this model has been added additional amino acid syntheses (for xylem and phloem loading), photosynthesis, and gluconeogenesis as well as chloroplast and thylakoid subcellular compartments. The biomass of the leaf has been adapted slightly from that of a previously published Arabidopsis model, iRS1597 23 , by refocusing the biomass composition on primary metabolites. Similarly, by having extracted common reactions/pathways from the seed and leaf models as a starting point and adding functionalities particular to these tissues such as nitrogen reduction in the root and the transport of metabolites through the extracellular space of the stem, the root and stem models have been reconstructed. Root and stem models have been reconstructed with metabolic differences between the two such as the presence of amino acid synthesis and the conversion of ammonium to nitrate both in the root for xylem loading. Root and stem tissues are, however, largely focus on basic carbon metabolism and metabolite uptake (root) and transport (root and stem). In the absence of Arabidopsis-specific estimates, the dry weight compositions of switchgrass (Panicum virgatum) root and stem 31 have been used to define root and stem biomass compositions. Both these models contain necessary transport/exchange reactions to ensure model connectivity and to facilitate their roles in the transport processes. The stem and root models have all the subcellular compartments present in the seed model. Once initial reconstructions have been accomplished, thermodynamically infeasible cycles in addition to atom and charge imbalances have been resolved 7 . Figure 3 shows the iterative process of model curation for tissue-specific model reconstructions used in this work (yellow arrow) and for the whole-plant iterative model curation (orange arrows). Figure 4 shows a summary of the distribution of model reactions across KEGG-defined pathways of each tissue model and an overview of reasons for reaction inclusion through confidence scoring (see Method section) 7 . Figure 4A summarizes the pathways common to all tissues and Figs. 4B through 4E graphically summarize the sources of reactions in each tissue model through confidence scores (see methods section) 7 . Once each tissue model has been reconstructed, these four models have been linked by the ORKA framework, and the lifecycle of the plant has been simulated. We have addressed the incongruities between these in silico simulation results and in vivo experimental data by adjusting their metabolism of individual tissue-specific models, tissue-tissue interactions, or by adjusting parameters (such as biomass yield, plant maintenance, and plant senescence) associated with the plant-scale behavior of the p-ath773 model. This portion of the workflow is illustrated in Fig. 3 (orange arrows).

Development of constraints defining tissue-tissue interactions in the p-ath773 model. Once
these core tissue models have been reconstructed and curated, sets of constraints have been defined to enforce logical links between tissues to facilitate the simulation of tissue metabolism. For instance, these links include ensuring that water travels from the root (source) to the leaves (sink) and that literature-supported amino acids travel from the leaf and root (sources) to the seed (sink) through the stem tissue (the link between these tissues). In addition, other constraints include environmental interactions such as with atmosphere and soil. These constraints include gas exchange in all tissues; uptake of micronutrients and water by the roots; and use of light by the leaves. These constraints are discussed in detail in the Materials and Methods. In summary, these constraints ensure that micronutrients and water are transported from the root tissue to other tissues via the stem; that sugars and amino acids travel from the leaf tissue to other tissues via the stem; that patterns of starch and sucrose storage in leaf and stem tissues are included in the model; and that the rates of tissue growth are linked in such a way that tissue mass ratios are preserved or changed in accordance with how these quantities change in an Arabidopsis plant as it passes through various stages of growth.
Simulating stages of plant growth using p-ath773 and ORKA. As discussed more extensively in Materials and Methods, the growth rate for an SM is an exponential growth rate. Due to this exponential nature of the growth rate, seed mass becomes problematic to model as there are points in the growth of the seed tissue where its mass is zero, is advanced from zero to a non-zero value, and is advanced from a non-zero value to zero. These changes are impossible to capture using an exponential function. Therefore, plant mass as a whole is tracked and advanced by the ORKA, and individual tissues masses are determined by multiplying total plant mass by Kutta time step n, set N is the number of steps in the Runge-Kutta solution method (with n as the index), and c nf is the final c n value in the Runge-Kutta method used. Greater detail on the definition of each symbols used can be found in the "Symbols Used" section in Text S1. In (A), there are two control loops (brown text with brown left-handed braces), one looping over each time point in the set of times over which to apply ORKA (t ∈ T), and the other looping over each step in the selected Runge-Kutta method (n ∈ N). The former control loop is used to solve the model at the time point, define the starting points for the Runge-Kutta method, and, after the Runge-Kutta loop is finished, advance biomass and metabolite concentrations in the model. The inner control loop determines the values of the Runge-Kutta-based concentration and biomass step estimates. The various estimates used rely on evenly spaced points at which the estimates are made, limiting the selection of Runge-Kutta method. Some allowable Runge-Kutta methods are shown in (B). For this work, Heun's Third Order Rule was selected. In (C), an overview of the workflow used to integrate the p-ath733 model (red) in the ORKA method (blue and purple) is shown. tissue mass fraction. Since there is no whole-plant biomass function, this approach requires an approximation which defines the error floor by a second order backward difference approximation of the first derivative (see Materials and Methods for details) with an error order of O((c 2 − c 1 )h 2 ). Therefore, any Runge-Kutta method with . Workflow for p-ath773 model reconstruction. This figure shows the workflow used in the reconstruction and curation of individual tissue models (yellow arrows) and the integrated p-ath773 model as a whole (orange arrows). The reconstruction procedure begins by consulting published 'omics' data which helps identify which metabolic functions are present in a given tissue, followed by element-and chargebalancing the reactions representing those functions. A biomass equation is defined from literature evidence, and a stoichiometric model of the reconstruction is created. This is repeated for each tissue until a plant-scale model can be created. This model is then placed in the ORKA framework, and is used to simulate plant growth throughout its lifecycle. The results are compared with in vivo experimental results, such as those shown in Fig. 6. Incongruities are addressed at the tissue-level by re-consulting 'omics' level data. This process is repeated until an acceptable model is achieved. shows the types of reactions included in each of the four tissue models by counting the number of transport reactions, exchange reactions, and categorizing the remaining reactions based on the KEGG pathway(s) to which they belong. As shown here, the leaf model is the most complete and contains the most reactions is almost every category. Importantly, the leaf is the only tissue which contains reactions related to the photosynthetic electron transport chain (labeled "Photosynthesis ETC"). Figures (B) through (E) shows the rational for the inclusion of each reaction in each model using confidence scoring (see Thiele and Palsson for a definition and discussion of confidence scores). To summarize these figures, most reactions are included because there is evidence in the genome for these metabolic functions. The next most common reason for inclusion is being supported by biochemical literature data (e.g. a study has specifically identified the protein and determined its mechanism). The next most common reason for inclusion was modelling necessity (score of 1). No knockin/knockout studies where consulted in this work (score 3). error order less than that will suffice. In this work, Heun's third-order Runge-Kutta rule is used. This is in part because of the limitation just described such that a higher-order Runge-Kutta method is not necessary. Further, this method has the advantage over Kutta's third-order Runge-Kutta rule in that the step size between c n values is one third (e.g. c 2 − c 1 = 1/3) as opposed to one half (e.g. c 2 − c 1 = 1/2) (see Fig. 2B), giving slightly lower error for trapezoid rule-based integration and backward difference approximation estimates. A simplified workflow of how p-ath773 is integrated into the ORKA framework is shown in Fig. 2C and a more detailed explanation is included in Materials and Methods. In summary, the p-ath773 model includes the four tissue models and tissue-tissue interactions, whereas the ORKA to perform dFBA is the approach used to simulate the model form one timepoint to the next. The simulations of the p-ath773 model has been advanced through several growth stages using time www.nature.com/scientificreports www.nature.com/scientificreports/ points for changes in growth stage taken from experimental data 32 , see Fig. 5. Fig. 5 highlights the timepoints spread out through the seven growth stages modeled here including seed germination, seed germination to leaf development transition, leaf development, leaf development to flower production transition, flower production, flower production to silique ripening transition, and silique ripening. Fig. 5 further provides sketches of the in silico and in vivo representations for each of these growth stages. In the seed germination stage, the uptake of fatty acids, sugars, and amino acids from seed storage (endosperm, see Seed Germination stage in Fig. 5) has been modeled as a rate of usage which results in all stored fatty and amino acids being depleted by the end of the seed germination to leaf development transition 30 . This rate has been determined such that it is constant in mmol/h (see Data S1) yet needed conversion to the mmol/gDW·h units used throughout the p-ath773 model. Therefore, the rate at which the endosperm is utilized is scaled by the gDW of the leaf tissue (as the leaf tissue is modeled as interacting directly with the endosperm). This scaling advantageously results in a gradual decrease of the rate of nutrients uptaken from the endosperm stores (in mmol/gDW·h), as would happen in a seedling as the plant mass begins to far exceed the mass of the endosperm. A 12:12 hour light:dark diurnal rhythm has been chosen to match experimental conditions for the studies on starch and sucrose storage/uptake dependence 33 . Diurnal metabolism affects the model at all growth stages except for Seed Germination, when the cotyledons (embryonic leaves) are shaded from light by the soil and/or seed coat. In growth stages when plant tissue ratios are constant (i.e., the vegetative stages such as Seed Germination through Leaf Development), the tissue mass ratio values have been taken from values typical for herbaceous plants (0.511 gDW leaf:0.0.267 gDW root:0.211 gDW stem after adjusting from fresh weight to dry weight) [33][34][35] (See Data S1). In growth stages when the ratios between tissues change 32 (i.e., seed production or dispersion stage), a linear biomass "slider" is used, where a single parameter, seeding (s), is used to progress tissue mass ratios (see Fig. 5). This ranges from s = 0 (normal vegetative tissue mass ratios) to s = 1 (mass ratios when maximum amount of seeds have been produced and have not yet been dispersed) and is linearly incremented from the point at which the first flower is produced to when all flowers are produced then decremented to when all silique (seed pods) are shattered, thus dispersing all seeds (see Data S1). A workflow showing how ORKA is applied to the p-ath773 model can be found in Fig. 2C. In addition to using ORKA to perform dFBA, Flux Variability Analysis (FVA) has been performed, at twelve points throughout the Arabidopsis lifecycle, selected to represent each growth stage and diurnal status in those stages (save the Leaf Development to Flower Production transitions which includes a single time point, see Fig. 5), subject to all growth constraints, and a growth rate equivalent to the optimal growth rate to evaluate the variability in the balanced flux estimates. Flux Variability Analysis is performed at 1 Hour(s) After Germination (HAG, seed germination stage, dark), 70 HAG (seed germination to leaf development transition, light), 90 HAG (seed germination to leaf development transition, dark), 177 HAG (leaf development stage, light), 181 HAG (leaf development stage, light), 770 HAG (flower production stage, light), 810 HAG (flower production stage, dark), 1155 HAG (flower production to silique ripening transition, light), 1170 HAG (flower production to silique ripening transition, dark), 1190 HAG (silique ripening stage, dark), and 1199 HAG (silique ripening stage, light). In summary, we incorporated the p-ath773 model in an ORKA framework to simulate Arabidopsis metabolism across the lifecycle of an individual plant.
Design-build-test cycling of the p-ath773 model in the ORKA framework. Once growth stages have been implemented with the p-ath773 model and the ORKA framework, the design-build-test cycle (shown in Fig. 3) has been used to iteratively improve and refine the p-ath773 model. The data points used to determine how well the model fits experimental literature include the mass of the whole plant at certain benchmark times and peak mass yields of leaf, seed, and stem tissues 32,34 . At 17, 24, and 31 Days After Germination (DAG) the total dry plant mass should be between 0.5 and 2.0 mg; 2 and 8 mg; and 10 and 30 mg, respectively 34 . Upon the completion of multiple iteration of design-build-test cycle, the p-ath773 model has been adequately refined, the p-ath773 model has shown a total dry plant mass of 0.676 mg at 17 days (408 hours), 4.20 mg at 24 days (576 hours), and 25.9 mg at 31 days (744 hours) after germination. Furthermore, mass-based growth targets include The first three checkpoints were in the leaf development phase as 17 Days After Germination (DAG), 24 DAG, and 31 DAG, with in vivo experimental ranges for whole-plant mass and in silico whole-plant mass of the p-ath773 model shown in the callouts. The final image is for total tissue yield, where the reported in silico value is the maximum mass of each tissue during the entire lifecycle, and the in vivo value is the mean dry weight of the specified tissue at harvest plus or minus one standard deviation.
the peak dry weights of the leaves, the seeds, and the stems which have been reported as approximately 163.7 mg (standard deviation 52.0 mg), 127.9 mg (standard deviation 52.7 mg), and 188 mg (standard deviation 39.3 mg), respectively 32 . As the p-ath773 captures both plant growth and loss of seed (and other) mass in the silique ripening stage, the peak mass of each of these tissues has been comparted to this data. In the refined p-ath773 model, the peak masses of the leaves, seeds, and stems have been determined as 153 mg, 100 mg, and 151 mg, respectively, all of which are within one standard deviation of the experimental value 32 (see the methods section for how tissue masses are determined). These comparisons are summarized in Fig. 6. In summary, in silico tissue and plant mass values are similar to in vivo data, thus showing strong agreement with respect to plant-and tissuescale growth trends. This agreement has been achieved by tuning the rate of carbon dioxide and light availability to the plant system 34,36 which the modeled plant is allowed to utilize as well as by tuning the plant biomass yield (defined as the fraction of plant growth that adds to the plant mass with the remainder addressing litter, tissue repair, and degradation) 37,38 . We have defined both carbon dioxide and light uptakes based on literature, with the former from the carbon assimilation rate 39 and the Leaf Area Ratio (LAR) of Arabidopsis 40 and the latter from the transmission spectrum of fluorescent light bulbs (used in in vivo experiments utilized in the p-ath773 model reconstruction) 36 , the absorption spectra of chlorophyll 37 , and the Leaf Area Ratio of Arabidopsis 40 . However, the value of biomass yield (for a given plant across its full lifecycle) has been experimentally identified as between 0.7 and 0.85 37 . Here, to achieve the best alignment between in silico and in vivo growth patterns, biomass yield has been defined as 0.51. There are several possible reasons for this, which are included in the Discussion section. All files necessary for p-ath773 have been included in the GitHub p-ath773 repository (https://doi.org/10.5281/ zenodo.3735103). The in silico results of the final p-ath773 model can be found in Data S2.
flow of water across plant lifecycle. Important to the life of a plant is the flow of water. Water carries various dissolved nutrients for transport (sugars, amino acids, nitrates, sulfates, et cetera) in addition to meeting the metabolic needs (such as photosynthesis) and physiological needs (such as transpiration) of tissues. Water flow through the plant has been selected as a case study which shows tissue-level insight into the general metabolic and transport processes modeled in p-ath773. The results of this analysis are shown in Fig. 7, where each bar graph represents a specific stage of growth as shown in Fig. 5. As can be seen in Fig. 7, the stem tissue is the center of water transport, accepting water from the root and its own metabolism, and transporting this water to the leaf for its use in photosynthesis and to meet the physiological demands imposed upon the leaf by transpiration p-ath773 model by plotting the average reaction rate for each growth stage and each diurnal status of that growth stage, darker bars indicating growth at night and lighter bars indicating growth during the day, to highlight not only the stage-by-stage differences but also the diurnal differences. Flux rates are in units of mmol/gDW·h where gDW (grams Dry Weight) is in units of the dry weight of the individual tissue, rather than the plant as a whole causing incongruity as metabolites are exchanged between tissues as the flux rates must be scaled by the different tissues masses so none of a metabolite is gained or lost between tissue. Further, there are some hydrolysis reactions which occur in the extracellular compartment of each tissue, which accounts for the incongruity in the balance of water in tissue extracellular compartments (such as the in the seed tissue). This is generally a very small amount and therefore was not included in this figure. Further, logarithmscale y-axes were used where possible (indicated by a small black star) because the day and night flux rates were generally orders of magnitude different. www.nature.com/scientificreports www.nature.com/scientificreports/ in addition to transportation to the seed tissue to meet its metabolic demands. Arrowheads indicate the most common direction of water flow, and negative reaction flux indicates flow in the opposite direction. The p-ath773 model shows that the primary driving force pulling water through the plant is transpiration, and that this driving force results in water flow rates during the light periods of two orders of magnitude higher than that which occurs in the dark periods. This in silico observation replicates the physiological water potential gradient along which water flows in plants which is driven by transpiration 40 . Further, the pattern of water flow in the stem tissues being orders of magnitude higher during periods of light is consistent with in vivo data of other plant species 41 . While the role of transpiration in plant hydraulics is well known, the p-ath773 model framework in conjunction with the ORKA provides the opportunity to study the contribution of metabolic water to the flow of water in the plant system. In general, as modeled by p-ath773, it appears that root, stem, and seed tissues take up water and utilize it for their own metabolism, acting as water "sinks". The leaf is however the largest water "sink" in the system since larger amount of water is transpired by the leaf tissue in comparison to that which is used by the metabolism of other tissues. However, the leaf cytosol is a net producer of metabolic water, and the water transported from the cytosol to the extracellular compartment where transpiration is modeled to occur contributes between 60% and 80% of water which is transpired. Major metabolic contributions to the cytosolic water pool appear to be related to various metabolic processes not contained in other tissue models such as nitrate reduction, fatty acid metabolism, and a large number of other metabolic transactions which involve water.
Sulfur metabolism across plant lifecycle. In addition to tracking the flow of water through the plant, the p-ath773 model has also been used to study and track sulfur metabolism and transport across the tissues and the lifecycle of the plant to provide an example of reaction-level window into the p-ath773 modelled plant metabolism. This has been done to provide unique insight into the core metabolism of a single micronutrient which is not as extensively studied as carbon and nitrogen metabolism 19,27,42 , yet sulfur still is important to plant Flux rate values (black patterned bars) are in mmol/gDW·h, where gDW (grams Dry Weight) is in units of the dry weight of the individual tissue, rather than the plant as a whole causing incongruity as metabolites are exchanged between tissues as the flux rates must be scaled by the different tissues masses so none of a metabolite is gained or lost between tissue. Further, as sulfur-containing compounds are allowed to be stored in the model by building concentration, reaction rates may not balance.
growth. The results of this analysis are shown in Figs. 8 and 9, where the former reports mean reaction rates and the latter reports mean concentrations for each specific stage of growth as shown in Fig. 5. Sulfur is modeled as passing through the root and stem tissue and being distributed to the leaf and stem tissues. Some sulfur which has been distributed to the leaf tissue will be returned back to the stem, in the form of amino acids for distribution to the seed tissue, with the remainder being used to produce biomass. The seed accepts amino acids and sulfate from the stem tissue to produce biomass. Here it is evident that, in terms of sulfur metabolism, the seed serves as a "sink" tissue, the root as a "source" tissue, and the leaf as an intermediary. As is shown in Fig. 8, the demand by the plant for sulfur is highest in the latter three stages of growth, where seed tissue is present and growing rapidly, or being loosed and metabolic demand from the seed corresponds to the increased maintenance and senescence of the seed and leaf tissues. The presence of seed tissue as a sulfur "sink" also leads to a high flux rate through many reactions in the sulfur metabolism in the leaf as well as transport of sulfur-containing amino acids through the stem tissue. These observations are largely as expected. Unexpected results are those related to the generally high rate of flux through portions of the sulfur metabolism in the leaf during the seed germination growth stage, and the corresponding low fluxes through these pathways in the seed germination to leaf development transition. From closer observations of metabolite concentrations and reactions rates as shown in Figs. 8 and 9 (see Data S3), it appears that there are seemingly random switches between production, storage, and consumption of various metabolites such as L-homocysteine, methionine, and cysteine in the leaf in the early growth stages.
At some places in the Figs. 8 and 9 metabolic maps, there appear some metabolites which have no initial concentration, yet a high mean concentration in the first stage of growth and mean fluxes away from that metabolite. This seems counter-intuitive. For one such metabolite cysteine in the leaf tissue in the first 60 hours after germination (the seed germination stage), the reaction converting hydrogen sulfide to cysteine has positive flux (average positive flux of 2.72E-3 mmol/gDW·h) for 13 of those hours, negative flux (average negative flux of −1.69E-3 mmol/gDW·h) for 22 hours, and no flux for 25. It appears that the no flux points in particular are positioned such that they occur when cysteine concentration is high, skewing the mean concentration upward. Notably, when the www.nature.com/scientificreports www.nature.com/scientificreports/ stores are used, a number of negative flux rates occur in a row. This skews the average reaction rate downward. It is also shown that cytosolic and extracellular cysteine have high concentrations. This is achieved by near constant interchange of cysteine position through a proton antiport. In the first 60 hours after germination (the seed germination stage) this antiport flows in the direction of the extracellular space 21 of those hours (average flux 0.001337 mmol/gDW · h), in the direction of the cytosol 38 of those hours (average flux −0.00098 mmol/gDW·h), and has no flux only at the first hour when there is as of yet no concentration of cysteine in the cytosol.

Discussion
In the current work, a novel Optimization-and explicit Runge-Kutta-based Approach (ORKA) to dynamic Flux Balance Analysis (dFBA) has been developed. Inspired by the Static Optimization Approach (SOA) to perform dFBA, it seeks to achieve higher levels of model accuracy and solution stability. ORKA differs from the SOA in that it replaces first-order Taylor-series approximations for biomass and concentration steps with Runge-Kuttaand Trapezoid rule-based integration. This provides lower error floors, from O(h 2 ) in the SOA to O(h 4 ) in the ORKA, depending on the Runge-Kutta method used in the ORKA. ORKA has been developed to be general enough that several different Runge-Kutta methods could be applied to biomass step estimates ( Fig. 2A,B) dependent on the error level desired or which could be achieved in the modelled system. As a test system for ORKA, a multi-tissue core metabolism stoichiometric model of Arabidopsis thaliana has been reconstructed (Fig. 3), which includes individual leaf, root, seed, and stem tissues models with unique metabolic roles (Fig. 2). This model, named p-ath773, has defined intra-tissue interactions, interactions with the environment, and certain growth-based parameters defined based on growth stage in an effort to model Arabidopsis growth across its lifecycle by defining several growth stages (Fig. 5). Once p-ath773 has been reconstructed, ORKA has then been applied (Fig. 2C) using Heun's Third Order Rule. When the p-ath773 model using the ORKA (to perform dFBA) is compared to another Arabidopsis model utilizing the SOA (to perform dFBA) 19 , the p-ath773 model in theory has at least a three order of magnitude lower error floor due to the smaller step sizes, increased accuracy of the dFBA approach used, and inclusion of two more tissue types. However, similar comparison with the most recent dFBA work on Arabidopsis lifecycle 19 is not entirely possible since these models are quite different in structure, goals, and results. For instance, the mass of the plant for the 6 to 36 days window of time is quite different between p-ath773 and the model produced by Shaw and Cheung (2018) 19 (see Data S2 for details). In addition, comparing the rate of glutamine synthase in p-ath773 to that of Shaw and Cheung (2018), we find marginal agreement between the two models. One of the primary differences between the models is the direction of the flow of amino acids in the models. While Shaw and Cheung (2018) show nitrate flow from the root to the leaf and then amino acid flow from the leaf to the root, the p-ath773 model synthesizes some amino acids in the roots and those amino acids being transported to the leaf tissue for consumption. Therefore, the direction of amino acid flow is reversed from that of Shaw and Cheung (2018), yet is similar to what is reported in literature [43][44][45] . Further, as the biomass equations are different between the two models, the p-ath773 model has a greater demand for amino acids and nitrogen atoms in its biomass composition than does Shaw and Cheung's (2018). Therefore, by these models having different biomass, different flows of nitrogen, and different biomass composition and demands, it is very difficult to make a worthwhile comparison between the two models on the basis of accuracy as the structure is so different without strongly adapting one model or the other to be more similar to the other. Even though the p-ath773 model lacks a similar model in literature for the purposes of comparison, possibly because different literature sources and goals are used in model reconstructions, it is certain that when ORKA will be applied to a modeling framework comprising of all major tissues and can recapitulate and analyze real plant phenotypes. Further, these differences do not invalidate one model or the other, but rather might consider different metabolic states due to different growth conditions, thereby representing the flexibility of biological systems. In future, OKRA can be applied either by developing more tissue models (e.g., stem and seed) and adding to Shaw and Cheung's model or extending the p-ath773 model to capture the secondary metabolism, and either approach, carefully informed by literature, could greatly add to knowledge of Arabidopsis metabolism.
Using the ORKA to perform dFBA, p-ath773 is able to simulate seven stages of Arabidopsis growth (Fig. 5) and showed agreement with literature on plant-scale growth (Fig. 6) and on some reaction-level metabolic characteristics such as transpiration being a driving force of water flow through the plant system (Fig. 7). One point on which there is lesser agreement between p-ath773 and in vivo plant-scale data is biomass yield, which is 51% in the p-ath773 model but for most species the value is between 70% and 85% in vivo 46 . This disparity is likely due to a few factors. The first is that the literature in vivo data generally accounts for factors such as harvesting and animal grazing 37,38 , which is beyond the scope of the p-ath773 model, allowing for more growth. Further, the metabolic costs of root exudates (metabolites exported by the root to support the root microbial community) are not modeled. This is another potentially considerable drain on plant resources which is not modeled in the p-ath773 model.
The modelled flux rates have been used to study the flow of water through the plant system, and in particular to investigate the contributions of metabolic water to that transpired (Fig. 7) and to investigate the whole-plant core metabolism of sulfur (Figs. 8 and 9). In the former case study, the p-ath773 model has showed that metabolic water may contribute significantly to the amount of water transpired, somewhere between 60% and 80% of the total, and that transpiration drives a strong diurnal pattern of water flow. We hypothesize that the metabolic contribution to the amount of water transpired in vivo is unlikely to be as significant as shown by the p-ath773 model but is still likely to make some contribution. This is because not all water dynamics are accounted for in the p-ath773 model, including factors such as the amount of water necessary to keep new biomass turgid (since what is modeled is dry weight not wet weight) and the amount of water produced or consumed by the plant's extensive secondary metabolism. The former shortcoming is common to all SMs rather than to the p-ath773 model in particular, as all such models only model dry weight. (2020) 10:9241 | https://doi.org/10.1038/s41598-020-65457-4 www.nature.com/scientificreports www.nature.com/scientificreports/ For the sulfur metabolism case study, it has been shown that part of the patterns of sulfur metabolism are as expected such as increased use of and metabolic demand for sulfur when the seed tissue is present. However, some unexpected behavior has also been observed such as higher fluxes through sulfur reactions and comparatively larger concentrations of sulfur-containing metabolites at early growth stages. It is nearly impossible to pinpoint a single cause for the unexpected metabolic behavior of the sulfur metabolism in the early growth stages. This is due to the links between sulfur and energy metabolisms, in that many steps use some type energy molecule. Sulfur metabolism is also closely linked to the proton budget of the plant, in that many transports are proton-coupled. Through links to both the energy metabolism and proton budget, sulfur metabolism is strongly connected with the rest of plant metabolism. Hypothetically, this unexpected metabolic behavior might therefore be advantageous to the plant in energy metabolism and the control of the flow of protons. Particularly in the first two growth stages when the seedling's endosperm and cotyledons are not fully utilized and are therefore providing some amino acids (though notably not cysteine or methionine), fatty acids, and sugars. As modeled, these stores interact with the extracellular space of the leaf tissue, and often require facilitated transport (usually proton-coupling) into the cytosol for use or catabolism. It is therefore possible that these unexpected behaviors aid in the transport of nutrients from the endosperm, by having standing pools of metabolites which participate in proton-coupled transport to better regulate the cell's proton budget. This hypothesis is supported by the fact that these unexpected metabolic behaviors are reduced in magnitude as the amount of nutrients uptaken from the endosperm are reduced, and indeed the concentration of metabolites such as cysteine sharply decrease. These unexpected behaviors then appear to cease all together when the endosperm is fully utilized. While the metabolic network of p-ath773 is too convoluted to prove this theory, it does highlight the usefulness of stoichiometric modelling to identify interactions which may be too complex to deduce through non-systems approaches.
While there are a number of constraints applied to the model, such as biomass yield; maintenance and senescence costs; and enforcing mass ratios between tissues, these constraints apply mostly to plant-scale behaviors. These behavioral constraints generally fall into two categories: whole-plant and tissue-tissue interactions. The former generally ensure that the pattern of modeled plant and tissue growth fits that of in vivo data. The latter generally ensure that mass balance is maintained when metabolites are transported between tissues since each flux rate is in units of mmol/gDW tissue·h and each tissue is of a different mass. Hence, such conversions are necessary. Other constraints which fall in the category of tissue-tissue interactions ensure that nutrient flow is in a logical and well-known direction (e.g. micronutrients travel up from the roots). Few constraints, with the exception of the enforced diurnal patterns of carbon storage, apply on the reaction rate-or metabolite concentrationlevels, leaving a large number of system degrees of freedom at the micro-scale. Therefore, by constraining the macro-scale behavior to what is known, the p-ath773 model can be used to determine what is, or may be, occurring in the plant system with respect to reaction rates or metabolite concentrations. From the allowed uncertainty at the micro-scale level, a study of this level allows investigation of what metabolic processes support and explain the known macro-scale behavior.
This work provides the basis for much future development and sophistication, both in broadening the range of approaches which can be taken to dFBA, and in the potential to use p-ath773 as a basis for modeling other plant systems. Appying ORKA to perform dFBA may provide the framework for other step-by-step dFBA approaches utilizing other ODE solving methods such as Taylor Series, Linear Multistep, or even adaptive step size methods depending on the needs of the modeled system. The current p-ath773 model could be further sophisticated by adding the secondary metabolism of the plant system, which constitutes a significant portion of metabolism in many plant systems. Further, several simplifications have been made regarding tissues, particularly related to seed tissue, at present. For instance, the model currently assumes when the plant is flowering, that flower biomass and metabolism is roughly equivalent to that of the seed. While this results in a simpler model, this model cannot be used to investigate certain metabolic hypothesis such as the cost to the plant resulting from flower pigmentation, pollen, and nectar production. Future work will include developing models for other plant tissues, such as flowers. In addition, as this is a core carbon metabolism model, it is likely quite similar to the core metabolism of other plant systems. Therefore, the p-ath773 model can serve as a basis for the development of lifecycle models for other plant systems, particularly annual eudicots which are of agricultural interest, such as rice (Oryza sativa), potatoes (Solanum tuberosum), tomatoes (Solanum lycopersicum), and soybeans (Glycine max).

Development of the optimization and explicit runge-kutta -based approach to Perform dFBA.
Static Optimization-Based dFBA Approach (SOA). The Static Optimization-Based dFBA Approach (SOA) was first introduced in 2002 and is a method for solving for dynamic changes to a model system on a point-by-point basis (where those points are time), as opposed to the Dynamic Optimization-based Approach (DOA) which solves all points simultaneously 28 . The SOA, and variations thereon, have been applied to Arabidopsis 19 and barley 47 plant models, making it of particular interest for the study of plant metabolism. The SOA is defined as follows 28 : where symbols used are defined in the caption of Fig. 2, in the Results section, and in the "Symbols Used" section of Text S1. Note that v biomass indicates the rate of reaction for the biomass production and is equivalent to the growth rate μ t which will be used hereafter. It should also be noted that biomass concentration, Y t , is actually an element of the I′ set (the set of metabolites whose concentration is tracked). However, Eq. (5) is included here for Y t to be consistent with definitions of SOA in previous works 28 . This is also necessary due to the fact that biomass concentration is of particular interest in stoichiometric modeling efforts. Therefore, even though Eq. (4) simplifies to Eq. (5), Eq. (5) is still explicitly stated. This simplification is accomplished by first recognizing that there is only a single S biomass,j that is non-zero, namely S biomass,biomass which has a value of 1. Making the substitution, the RHS of Eq. (4) reduces to z biomass,t + v biomass z biomass,t Δt. Secondly, by recognizing that Y t is equivalent to z biomass,t and making this substitution on both sides of Eq. (4), Eq. (4) reduces to Eq. (5). The mass step taken at each time point in the SOA method, as shown in Eq. (5), is derived from the Taylor series expansion of e μx around 0. The exponential formulation comes from the fact that the growth rate determined by a SM of metabolism is an exponential growth rate defined by the following differential equation 8 .
whose solution can be represented as follows: to derive Eq. (5) that is used to advance biomass in the SOA, the first-order Taylor series expansion of e μt is used.
Recall that the Taylor series expansion of f(t) around point a is defined as follows.
a t a f a n t a it is well known that f(x) = e μt then f (n) (x) = μ n e μt ∀n, this Taylor series expansion may be simplified as follows.
a a a a n a n 2 2 3 3 finally, since this Taylor series expansion is around a = 0, the Taylor series expansion becomes what follows (knowing that e 0 = 1).
n n 2 2 3 3 in the standard SOA formulation, only the first term is used in the approximation of e μt , therefore: 2 multiplying through by Y t gives the biomass steps which are used by the SOA, namely Eq. (5). Therefore, the error in mass step estimates in the standard SOA method is on the order of the timescale squared, e.g. O(h 2 ). One additional assumption is made in Eq. (4). The differential equation which describes the change in concentration of a specific metabolite is shown below.
by using the separation of variables as the solution method, www.nature.com/scientificreports www.nature.com/scientificreports/ Optimization-and explicit Runge-Kutta-based dFBA Approach (ORKA). The dFBA method developed in this work is similar to SOA in the sense that both solve time points in a step-by-step and cumulative fashion. The Optimization-and explicit Runge-Kutta-based dFBA Approach (ORKA) differs in that different approximations are used in the solutions attempts to increase the accuracy of the estimation of the concentration and mass steps.
, , represents the Runge-Kutta based estimate for the mass step of the model, and Γ j,t represents an estimate of the integral of v j,t . Equation (15) expands on the accuracy of metabolic concentration estimates by leaving the integral term present and by removing the assumption that the reaction rate is time-independent in the time step concerned. This equation will estimate the integral using the multiple-applications Trapezoidal Rule of integration with the generalized explicit Runge-Kutta method. Equation (16) makes a more accurate estimate of the change in mass by using a Runge-Kutta method to better estimate the size of the mass step. With the notation altered to be more consistent with this work, the generic n th order Runge-Kutta method is presented below.
where, for an explicit Runge-Kutta method, the above method constants (a nm , b n , and c n ) are represented in a triangular Butcher tableau such as shown below in Fig. 2B. As previously stated, biomass growth is exponential as shown in Eq. (6).
Therefore: www.nature.com/scientificreports www.nature.com/scientificreports/ points at which k n occurs be equally spaced, then the multiple-application Trapezoidal Rule (i.e., the area of multiple trapezoids is used to estimate the integral rather than a single trapezoid) might be applied to solve the integral presented in Eq. (15). Evenly spaced points at which k n values are calculated is defined that if = c 2 −c 1 must also be true. This may seem like highly specific criterion; however, several specific Runge-Kutta methods or rules fit these descriptions. These include the explicit midpoint method, Heun's Method, Heun's Third Order Rule, Kutta's Third Order Rule, and the 3/8-Rule Fourth Order Method, among others. The Butcher tableaus for these methods can be found in Fig. 2B. Using any of the above-mentioned methods, Eq. (15) could be restated as follows.
t t a n na a 1 1 n 0 the only unknown quantity above is the Runge-Kutta estimate for the mass step (v j,t + Δt ). As the exact value of z i,t + Δt may be necessary to calculate v j,t + Δt , it will be assumed that v j,t + Δt is equal to the arithmetic mean of the reaction rates in the n time points used in the Runge-Kutta method selected plus the starting point. Therefore, Eq. (26) may be rewritten as follows.  the advantage of using a multiple application Trapezoidal rule in estimating the integral in Eq. (15) is that the assumption of a constant reaction rate over the time step may be relaxed and that the error for the multiple application trapezoidal rule is O(Δt 3 ). From this, the ORKA method can be represented as follows. www.nature.com/scientificreports www.nature.com/scientificreports/ n n n f 0 other constraints such as mass balance and nutrient uptake which are necessary for modeling For each Runge-Kutta Step n ∈ N, denoting the current time as t c k f t Y n N ( , ) (20) n n t n Limitation of ORKA to explicit Runge-Kutta Methods. Generally, Runge-Kutta methods are defined as either implicit or explicit. A method is defined as explicit if a ij = 0∀j ≥ i, and implicit if this is not the case. Recall that a ij defines the dependence of one derivative estimate step on another, as shown below.
if a method is explicit, each estimate depends only on previous estimates, whereas if a method is implicit, it may rely on future estimates or even itself. Implicit Runge-Kutta methods therefore are more difficult to implement, requiring the solution of a non-linear system of equations. In this work, f(t,Y) is the solution of a large system of under-defined linear equations (the stoichiometric model) where the best solution is selected by optimization. Using an implicit Runge-Kutta method would require the full model to be included in an under-defined non-linear system of equations and then to be solved by non-linear programming approaches. These approaches neither guarantee that a solution would be found nor a solution found would be optimal 48 . The sharply increased computational costs of using an implicit method combined with the complexity of implementation and the non-guarantee of an optimal solution has made implicit Runge-Kutta methods not worthwhile or attractive for implementation.
Overview of the reconstruction of core metabolic models of leaf, root, seed, and stem tissues.
The seed tissue has been modeled primarily based on a published MFA work 29 allowing an accurate reconstruction of the central carbon metabolism of the seed. Next, the leaf tissue has been reconstructed as a phototrophic tissue to supply carbon to the seed tissue. We next have reconstructed the root model to provide a mechanism for the uptake of water and micronutrients necessary for plant growth. Finally, we have reconstructed the stem model to provide a logical link between the tissues. Additional works that have been used in the reconstruction of tissue models can be found in Data S1.
The seed tissue model. The general workflow which has been used for the development of the four core tissue models is illustrated in Fig. 3. The seed model has been developed first, with the central metabolic pathways based on a Metabolic Flux Analysis (MFA) of four seed genotypes published previously 29 . We then manually have filled gaps in this model with reactions based on literature and genomic evidence 7,49 or with reactions being necessary for ensuring model connectivity. The stoichiometric coefficients of biomass precursors have been determined using sink reactions, dry biomass weight composition, and amino acid mass ratios provided in a previous work 29 (see Text S1). The resultant seed tissue model focuses on storage, respiration, and growth, and consists of 418 reactions, 577 genes, and 390 metabolites GitHub p-ath773 repository (https://doi.org/10.5281/zenodo.3735103) for this work. The leaf tissue model. Next, we have reconstructed the leaf model by taking common reactions/pathways from the seed model and adding metabolic pathways for amino acids that are not synthesized in the seed. In addition, other leaf-specific pathways such as photosynthesis, carbon fixation, and gluconeogenesis and necessary transport reactions have also been added. We then have developed the biomass equation for the leaf tissue using that of a previously published Arabidopsis model 23 (see Text S1), with minor adjustments. First, since the p-ath773 model is designed to focus on core metabolism, secondary metabolites were removed from biomass equation. Second, it was noticed that the amino acid histidine was missing as a primary metabolite from biomass composition. Histidine was added into the leaf biomass in proportion to other amino acids (See Data S1) 29 . The resultant leaf tissue model has focused on photosynthesis, respiration, gas exchange, fatty acid synthesis, and growth, (2020) 10:9241 | https://doi.org/10.1038/s41598-020-65457-4 www.nature.com/scientificreports www.nature.com/scientificreports/ and contains of 517 reactions, 666 genes, and 463 metabolites. We have included the leaf model in the GitHub p-ath773 repository (https://doi.org/10.5281/zenodo.3735103).
The root and stem tissue models. We have constructed the root and stem models, similarly, by extracting common reactions/pathways from the seed model and adding necessary and root-/stem-specific transport and exchange reactions. Then exchange reactions have been added to allow the root to be linked to micronutrient uptake processes from the soil and the stem to be involved in inter-tissue transport processes. In the absence of Arabidopsis-specific estimates, the dry weight composition of switchgrass (Panicum virgatum) root and stem 31 have been assumed to be equivalent to the biomass composition of these tissues in Arabidopsis. Thus, we have found the biomass of root and stem tissues to be composed entirely of carbohydrates. The resultant root tissue model focuses on nutrient uptake, transport, and growth, consisting of 149 reactions, 324 genes, and 149 metabolites, while the stem tissue model focuses on transport and growth, consisting of 167 reactions, 291 genes, and 154 metabolites. We have included the root and stem models in the GitHub p-ath773 repository (https://doi. org/10.5281/zenodo.3735103).
Confidence scoring. We have defined reaction confidence scores in a manner consistent with a previously published protocol 7 . Confidence scores are integer values between 0 and 4, with higher values corresponding to higher confidence in the inclusion for a given reaction. In the scoring system used, 0 corresponds to an unevaluated reaction; 1 to a reaction included for modeling necessity; 2 to evidence from physiology or a genome annotation; 3 from knock-in knock-out in vivo experiments; and 4 for direct biochemical data giving evidence for that metabolic function. The distribution of confidence scores in the component tissue models of p-ath773 can be found in Fig. 2B through 2E. As is shown in these figures, score 3 evidence was not used as score 2 evidence was considered sufficient for model reconstruction and, if greater confidence was required, direct biochemical data could be found since Arabidopsis is a model system. Additional information on confidence scoring of the p-ath773 model can be found in Text S1.
Curation of these four tissue models. All reactions in all four models have been balanced both in terms of elements and charge. Thermodynamically infeasible cycles have also been resolved by removing reactions, breaking composite reactions, and adding metabolic costs to transport reactions. For all four tissue models, GPR links have been established through a largely automated workflow utilizing the KEGG API 50 for the majority of reactions using the code included in the GitHub p-ath773 repository (https://doi.org/10.5281/zenodo.3735103). This has been followed by having manually curated the GPR links and/or inclusion rational of reactions with non-KEGG identifiers. This information can be found in Data S1. The count of tissue model reactions present in KEGG-defined pathways is shown in Fig. 2A, giving an overview of each tissue models' metabolic capabilities. The code developed to create these figures is included in the GitHub p-ath773 repository (https://doi.org/10.5281/ zenodo.3735103). The results of this automated workflow can be found in Data S2. Sources for reactions included in leaf, root, seed, and stem models are shown in Fig. 4B through 4E, respectively through confidence scoring (see Text S1).
Linking tissue models utilizing metabolic constraints and oRKA. Application of ORKA to the p-ath773 model. The application of ORKA to the p-ath773 model is complicated by the fact that there is not a single biomass reaction, but rather four separate reactions, one for each tissue modeled: leaf, root, seed, and stem. Therefore, for the mass of the whole plant, the basic differential equation which defines the change in system mass with time is stated below.
where here Y t indicates whole-plant mass. This could require some complex hand calculations to determine the value of the RHS of Eq. (29) since μ plant,t is not calculated in the p-ath773 model, instead only individual tissue biomass growth rates are determined. This leads to a branching point in how to apply the ORKA method to the p-ath773 model: whether to use whole plant mass or individual tissue masses as the basis of biomass calculations. On one hand, as already stated, the biomass of the whole-plant system could be tracked, which would result in a more complex RHS and formulation of f(t, Y t ). On the other hand, the biomass of each plant tissue can be tracked individually as stated in the following equation.
, , it has been decided to use the former method of tracking biomass because solving Eq. (30) at Y θ,0 = 0 yeilds only 0 as a solution. This can be shown in that the generic solution to Eq. (30) is formulated as follows.
by the multiplicative identity rule, Y θ,t + Δt = 0 if and only if Y θ,t = 0 since no exponential function can take the value of 0. This presents two issues if this is the method of advancing tissue biomass: (i) no tissue can either appear in the system that is not there from the beginning, and (ii) no tissue can be removed from the system. This is particularly problematic for a plant system since certain tissues appear and are removed after the plant reaches certain levels of maturity, perhaps most notably flowers and seeds. Therefore, while more complex, determining (2020) 10:9241 | https://doi.org/10.1038/s41598-020-65457-4 www.nature.com/scientificreports www.nature.com/scientificreports/ the value of the right-hand side of Eq. (29) is preferable. Text S1 details the calculation of the RHS of Eq. (20). The end-result of this calculation is as follows.

( )
Note that the trapezoid rule estimates also rely on even step sizes. This allows for smaller error in both of these estimates. When calculating the derivatives for the first two time points, it is assumed that the parameter values are the same as that for the first time point, e.g. it is assumed that φ −2h = φ −h = φ 0 . This derivative estimate, particularly the equidistant points requirement, is an important consideration in choosing the particular Runge-Kutta method used in this work. These calculations can be found in the "calculate parameters needed to solve next step equation" lines of the pseudocode in Fig. 2A.
Given that the steps taken for the Runge-Kutta method have equally spaced values of c n and that O(h 2 ) where h = (c 2 − c 1 )Δt is the order of error for the estimation of the backward derivative, this limits the order of the Runge-Kutta methods which can be chosen for increased accuracy benefits. For instance, a third order Runge-Kutta method has a global error of O(h 3 ), less than that of the backward derivative estimate. In this case, choosing any Runge-Kutta method which is higher than third order would merely add complexity with no benefits in terms of the error of the solution. Therefore, a third order Runge-Kutta method has been chosen for implementation with the p-ath773 model. Two commonly-used such methods are Heun's and Kutta's third order rules shown in Fig. 2B. We have chosen Heun's third order rule for this application as it provides greater accuracy in the integral estimates (e.g. = ⋅ Δ h t 1/3 as opposed to = ⋅ Δ h t 1/2 ) and there are no negative values in the matrix of parameter a ij of the Butcher tableau which have caused errors in earlier implementations of Kutta's third-order rule to the p-ath773 model (but would not in the current model).
Given that the limiting order of error in this system is the error in parameter derivative estimates, this will be the order of error for ORKA calculations for the p-ath773 model. As the value of h is one third of that of the time step (Δt), we can state that the error would be on the order of ⋅ Δ O t (1/9 ) 2 . This is a significant improvement in the error over previously implementation of dFBA on the Arabidopsis models including that of Shaw and Cheung (2018) 19 , which by reason of using SOA and a time step of one day (as opposed to one hour) results in much higher error potential. Calculating on the basis of hours the big O error ratio between ORKA and the SOA used by Shaw and Cheung (2018) 19 the is approximately 1:5184. This will provide two distinct advantages to the p-ath773 model. First, higher accuracy for calculations due to smaller step size and lower errors associated with approximations used. Second, increased solution stability, so that more solution steps may be taken without ballooning error.
Other constraints in p-ath773's ORKA framework. The tissue models were linked using techniques similar to a well-known computational framework known for modeling microbial communities 51 . This involved specifying how metabolites are allowed to move between tissues in logical ways, which will be described in greater detail later in this section. This framework includes a whole-plant objective which specifies fluxes in each tissue to maximize or minimize. Next, literature information including embryo mass 30,34 , initial tissue masses 33 , growth stages 32 , time points at which growth stages occur 32 , constraints to link tissue growth rates to appropriate tissue ratios, transpiration 52-54 , leaf surface area 40 , usability of provided light 36,37 , and defining changes in tissue mass ratios 32,34 has been integrated into these models, which are typically overlooked in most other SMs. In this work, we have decided to simulate Arabidopsis biomass across 61 days (1464 hours) of growth, as all plant seeds are dispersed by approximately day 61, and after which in vivo data on plant growth and mass is sparse 32 . More specific details can be found in the following sub-sections. The full optimization-based framework used in this work has been provided in the GitHub p-ath773 repository (https://doi.org/10.5281/zenodo.3735103) associated with this work.
Enforcing mass balance. As concentration is tracked using the ORKA method, the mass balance for the system need not be a strict equality, but rather metabolites should be allowed to be stored and that store should be allowed to be used up. To this end, the mass balance has been defined as follows: these equations ensure that metabolites might be stored (i.e., more metabolite is produced than consumed) in all cases or available metabolite concentrations might be utilized. When such a concentration is utilized, the LHS of Eq. (41) becomes negative. Hence, the metabolite is consumed at a greater rate than it is produced, yet still obeys the law of concentration of mass by using up the present stores of that metabolite to account for the difference. These equations do not, however, guarantees against the model producing infeasible reaction rates, as large rates of metabolite production are allowable under Eq. (41). To limit the amount of any metabolite stored at a given time point, the following constraint is implemented.
this limits the rate of any metabolite's storage to 10 mmol per gDW tissue per hour and represents that maximum of the allowable violation of the mass balance in the direction of metabolite storage. Such an allowable violation is allowed in all dFBA models where changes in metabolite concentration are allowed. This value is an arbitrary limit on the rate of allowed metabolite storage per hour since it seemed logical to create some limit on metabolite storage rates. While each metabolite likely has its own individual rate, this rate is not reported in literature for many metabolites, therefore a universal, arbitrary number was chosen.
Constraints on metabolite flow. As has already been suggested, it is not necessarily logical for metabolites to flow between tissues without suitable constraints on that flow. For instance, water is taken up by roots and transported first to the shoot, then to the leaves and seed tissue (if present). It would not, for instance, make sense for water to travel directly from the root tissue to the seed tissue. Therefore, instead of a single metabolite pool connecting tissue, there are metabolite pools connecting each pair of tissues. These pools are shown in Figs. 3 and 5 as arrows and circles between tissues. The following subsections describe how individual metabolites or groups of metabolites are constrained to logical flow through the system.
Water. Mathematically, for the flow of water these logical metabolite links take the following form.
root t root water r oot water bound , , , where x tissue has been defined as the tissue mass fraction with respect to the total mass of the plant, c tissue is defined as the change in tissue mass fraction with respect to seeding, and x tissue is defined as the initial mass fraction of each tissue. The gain in the seeding parameter has been assumed to be linear with time and is fit to experimental time point describing the fraction of flowers produced 32 (see Data S1 and Text S1).
Defining the availability of light. The amount of light available to the model to use for photosynthesis has been defined initially by literature sources used for other constraints 56 , and scaled by the transmittance of that light source (fluorescent lights) 36 and the absorbance of Arabidopsis leaves 36 and surface area to plant mass of Arabidopsis leaves 40 . This has been approximately estimated to be 4.00 mmol·gDW plant −1 ·h −1 . This value has been shown to be 21.50% of the total photons output by the fluorescent light (see Data S1 and Text S1).
Defining model maintenance and senescence costs. An important consideration in any SM is the definition of a maintenance cost, which is typically defined as ATP hydrolysis 7 . Biomass-based maintenance and senescence costs have been defined as they have been suggested as more accurate or applicable for plant systems 38,39 , but have not yet been used in an SM. We have defined maintenance and senescence costs as a biomass drain on each tissue scaled by tissue mass in Eq. (30). A maintenance cost value of k m = 0.03 day −1 has been defined which is in an order of magnitude typical for plant systems 60 , and the same value has been defined for plant senescence, k s , as this parameter appears to be generally of the same order of magnitude 38,39 . These rates are then converted into their per hour equivalent and scaled by tissue mass to enforce these constraints. Only a single constraint has been defined for both phenomena as both are biomass drains whose effect is additive. Literature evidence, including pictorial evidence of plant phenotype at various growth stages, appears to suggest that the rate of plant senescence increases drastically as the flowering production stage finishes and the silique ripening phases begin (in literature, growth stage 0.65 to 9.70) 32 . Further, it appears that the plant no longer maintains current mass, but allows tissues to die and desiccate 32 . This has been included in the p-ath773 model in that plant senescence is increased by an order of magnitude and plant maintenance is set to zero following the end of the Flower Production stage.
Defining model objective functions. For all analyses and results, the objective function of p-ath773 has been to maximize the sum of the biomass production rates for all four tissues according to the following equation (referred to as the default objective). where z has been defined as the objective variable with v growth,tissue being defined as the rate of biomass production, in units of h −1 , of the tissue referenced. The maximization of this objective function is approximately equivalent to maximizing the growth rate (change in mass per unit time) of the plant as a whole. This objective function, in early model iterations, has led to one major issue, namely how to avoid the model producing only the metabolically "cheapest" tissue which could result in the maximum objective value but is biologically unrealistic. This is addressed by Eqs. (23) through (28) and will be further discussed later in the methods section. It has been noted that the maximization of plant biomass has not been the only feasible objective function for plant SM system; for instance, one alternate objective function is the maximization of plant photonic efficiency 13,22 . This objective has generally been framed as minimizing the amount of light used by the plant system, given a required growth rate 13,22 . As the purp authors gratefully acknowledge funding ose of this paper is to showcase the ORKA method, rather than the p-ath773 model, alternative objective functions have not been implemented but are possible to implement.
Flux Variability Analysis (FVA) has also been performed on the p-ath773 model which uses all previously defined constraints and the previously defined ORKA method. All flux bounds and constraints are the same and the FVA has an objective function defined as follows: where the FVA model solution has been iterated for each reaction j, and σ j has been valued at 1 for the current reaction whose maximum and minimum are to be investigated and 0 for all others and is stepped through first maximizing and then minimizing each reaction. Due to restrictions of the time allowed for model solutions, nine points has been selected at which to perform FVA. These points are 1 hour after germination (HAG, seed germination stage, dark), 70 HAG (seed germination to leaf development transition, light), 90 HAG (seed germination to leaf development transition, dark), 177 HAG (leaf development stage, light), 181 HAG (leaf development stage, light), 770 HAG (flower production stage, light), 810 HAG (flower production stage, dark), 1155 HAG (flower production to silique ripening transition, light), 1170 HAG (flower production to silique ripening transition, dark), 1190 HAG (silique ripening stage, dark), 1199 HAG (silique ripening stage, light). These results generally showed narrow ranges for allowable flux rates.
Software platforms used. Some of the files used in this work and included in the GitHub repository are files of programming codes, and all code is not written in the same programming language. Text S1 lists which programming language each file which is a code utilizes. www.nature.com/scientificreports www.nature.com/scientificreports/ been run using the Holland Computing Center at the University of Nebraska, Lincoln. In addition, portions of the GitHub p-ath773 repository utilize the module the LWP (the world-wide web library for Perl) version 6.39 and have been run on a Windows desktop computer.