The intertwined metabolism during symbiotic nitrogen fixation elucidated by metabolic modelling

Genome-scale metabolic network models can be used for various analyses including the prediction of metabolic responses to changes in the environment. Legumes are well known for their rhizobial symbiosis that introduces nitrogen into the global nutrient cycle. Here, we describe a fully compartmentalised, mass and charge-balanced, genome-scale model of the clover Medicago truncatula, which has been adopted as a model organism for legumes. We employed flux balance analysis to demonstrate that the network is capable of producing biomass components in experimentally observed proportions, during day and night. By connecting the plant model to a model of its rhizobial symbiont, Sinorhizobium meliloti, we were able to investigate the effects of the symbiosis on metabolic fluxes and plant growth and could demonstrate how oxygen availability influences metabolic exchanges between plant and symbiont, thus elucidating potential benefits of inter organism amino acid cycling. We thus provide a modelling framework, in which the interlinked metabolism of plants and nodules can be studied from a theoretical perspective.

: Process of compartment localisation. Assigning compartments to reactions was performed in three steps. The first step comprises a BLAST search for Arabidopsis homologues and obtaining proteomic data [5,6]. In a second step, network extension [8] is applied to ensure that every organelle can perform key functions. The uncompartmentalised genome-scale model serves as a reference network. In a third step, it is ensured by manual curation that all compounds are producible which were producible in the uncompartmentalised network.
unique metabolic species in the model. Genetic evidence is present for over 70% of all non transport reactions.

Computational Analysis
Network self-consistency During cellular growth, all intermediate metabolites are diluted and have to be replenished. These metabolites are not included in the biomass, but the necessity to reproduce them results from the fact that cells grow and divide [14,15]. We therefore first tested whether the model is self-consistent in the sense that all metabolites can be produced from the nutrients using constraint-based modelling (see Methods). We found that 1161 of the 1370 metabolites could be produced. The remaining 209 metabolites fall into three main classes.
The first class, containing 121 metabolites, includes those substances which form conserved quantities. Examples are the acetyl carrier proteins linked to fatty acids, macromolecular redox carriers, such as ferredoxins, and tRNAs, which are necessary for the incorporation of amino acids into proteins. Most of these macromolecules are small proteins and RNA molecules which are implicitly contained in the biomass as part of the protein and nucleic acid fractions. Since the model can produce biomass (see below), biosynthesis is possible for over 86% of the compounds in this class. The remaining are small molecules, which cannot be biosynthesized because they are themselves needed for their own biosynthesis. An example is quinate, for which no biosynthetic route is present in MetaCyc, where it is only included as a substrate for degradation. However it is used in one of two routes from coumaroyl-CoA to caffeoyl-CoA, in which it is needed as a carrier compound.
The second class contains 65 metabolites, which are not connected to the remaining network. This group contains many of the biotin biosynthesis pathway intermediates (6 compounds). The reason why this pathway is disconnected is that it is unknown how the pimeloic acid moiety necessary for biotin biosynthesis is synthesised in plants. Most of the other metabolites (37) in this class are produced by specific enzymes, but no pathway for the production of their precursors is known.
The third class consists of 23 metabolites, which result from the degradation of modified macromolecules (e. g. polyfructofuranose), or which are present only in a degradation pathway for xenobiotics (e. g. cyanate).
This phenomenon of metabolites which can only be 'produced' by degradation was discussed in detail earlier [8].
Differences between day and night cont'd Of the essential metabolic reactions (excluding transporters) mentioned for day and night biomass production over 90% have genetic evidence. Our knockout simulations further revealed an additional set of 16 and 18 reactions for growth in the dark and light, respectively, which are essential in the sense that they must be present in at least one compartment, but in the compartmentalised model they have been assigned to several compartments. The two additional reactions for which at least one isozyme has to be present in light conditions are the triose isomerase reaction and phosphoglycerate kinase, which are both necessary to utilise Calvin cycle-derived triose phosphates. Using the objective to minimise the total flux through the system while producing a given amount of biomass (see Methods), we found that a set of 430 (dark) and 423 (light) reactions, of which 72 are transporters, are sufficient to produce all biomass components.
Energy requirements for biomass precursors During night, plants use the stored starch as carbon and energy source. To investigate the nightly energy metabolism, it is interesting to know how much energy and reductant are minimally required to build essential building blocks, such as nucleotides, amino acids and organic acids. The theoretically predicted minimal energy and reductant requirements for selected compounds are depicted in Fig. 3, when using as nitrogen source either ammonium (clear bars) or nitrate (shaded bars).
Obviously, the requirements depend critically on the nitrogen source. Growing on ammonium, most nucleotides and some amino acids, along with some organic acids and sugars can theoretically be directly produced from starch without additional ATP obtained through respiration (a full list including the reductant and energy costs is provided in the Supplementary Table S2). In fact, some of the compounds can be synthesised from starch while even producing a small surplus of energy equivalents. In contrast, with nitrate as nitrogen source, the synthesis of all amino acids and nucleotides requires additional ATP and NAD(P)H, for which some of the starch needs to be respired. This is not surprising because the reduction of nitrate to ammonia requires four reducing equivalents. Using nitrate exclusively, only organic acids are still producible from starch without additional energy demands. Because they do not contain nitrogen, their requirements are independent on the nitrogen source. Our calculations are in agreement with earlier observations that heterotrophic cell cultures of A. thaliana grown on nitrate tend to produce higher levels of organic acids and sugars and lower levels of reflect experimentally observed differences in growth rates on nitrate and ammonia [17] and comparison to real growth rates during the night (see Discussion) may be helpful to understand the quantity and nature of maintenance metabolism.
Nitrogen metabolism in the two tissue model cont'd The scan presented in Fig. 4 visualises the gradual change of fluxes with changing nitrogen source, presented in Figure 2 of the main text. As explained, the amount of required starch (dash-dotted dark blue) drops with increasing availability of ammonium. This directly reflects the higher costs for nitrate assimilation. However, with more available ammonium, the activity of the mitochondrial respiratory chain and, concomitantly, the mitochondrial ATP synthase (dotted light blue) are increased. This observation can be explained by considering the changes in redox balance: More nitrogen available as ammonium results in a reduced reductant requirement to reduce nitrate. Therefore, more of the reductant produced in the TCA cycle (dotted green) can be used to generate ATP. The plot simultaneously shows the change of proton (dashed orange line) and carbonate (dashed purple) export to the soil.

Metabolic Model Construction
The model construction procedure is summarised in Figure 5. Depicted are key steps in model development and the required data source to perform each step. Initial Network Generation. The model is based on version 3.5v5 of the Medicago truncatula genome annotation [1]. The initial construction was performed using PathoLogic enzyme name matching from the PathwayTools program [4] and makes use of the MedicCyc (v1.0) database [2], which is based on the BioCyc [3] suite of databases. Information from MedicCyc (v1.0) was incoporated based on homology analysis between gene information in MedicCyc and the 3.5v5 genome annotation. This resulted in a list of enzyme-catalysed reactions converting mainly small molecules. However, the MetaCyc database also includes a number of reactions which contain macromolecules. In general, metabolic network models are limited to reactions acting on small molecules. Macromolecules, such as proteins or glucans, are not specifically included because of the enormous number of theoretically possible polymers. However, some macromolecules play important roles in metabolism and require special treatment. Among these are in particular small carrier proteins such as ferredoxins (for charge transfer reactions) or acetyl carrier proteins (for fatty acid biosynthesis). As essential cofactors, these molecules have to be explicitly included in the reaction stoichiometries but their synthesis pathways are outside the scope of a metabolic network model. To verify mass balance of reactions involving such molecules, we exploit that these macromolecules in their various modified forms form conserved moieties, i. e. that their core structures are not changed by any reaction in the network. For reactions involved in synthesis or degradation of polymeric sugars, such as starch, we replaced the compounds representing the polymers of a non-specified degree of polymerisation (such as starch(n)) by the monomers in such a way that the overall reaction is correctly balanced. To this extent, a xylan polymer for example is assumed to consist of dehydrated xylose subunits (C5H8O4).
Some reactions added to the model by PathwayTools had poor evidence. The name matching algorithm adds reactions if the assigned enzyme names in MetaCyc match those in the genome annotation, which leads to problems with unspecific annotations such as "alcohol dehydrogenase" or "aldehyde dehydrogenase". Such descriptors resulted in the addition of many reactions that were disconnected from the remaining network.
We therefore removed reactions from the model which fulfilled all of the following conditions: 1) a contained metabolite is only present in this reaction, 2) the reaction cannot carry any flux under the condition that all nutrients can be imported and all metabolites can be exported, and 3) there exists no evidence for the presence of the reaction beyond a match of unspecific reaction names.
Gap filling. Gaps in the initial network reconstruction were identified by verifying that all experimentally observed metabolites were actually producible from the nutrients. For this, we collected a list of experimentally verified metabolites from various studies [18,19,20,21,22,23]. We then computed for each of these metabolites whether a stationary flux distribution exists (see below) such that this metabolite is produced while only the nutrients are consumed. When this check failed, the MetaCyc database was searched for plausible candidate reactions which needed to be added in order to allow for a production of the respective metabolite. The identified reactions were manually added to the network.
Atomic and Charge balancing. We intended to create a model which can simulate both photosynthetic and heterotrophic growth. Because proton gradients are essential in both regimes, we needed to ensure all reactions are charge balanced. Using MetaCyc charge information (all compounds protonated to a pH of 7.3) we checked all reactions for their charge balance. Apparently, several MetaCyc reactions have been curated to achieve mass balance by adding protons. This is in particular true for reactions in the phospholipid desaturation pathway, but also in reactions representing dehydrogenase or mixed function oxygenase activities.
Where possible, we curated these reactions by assuming common desaturase, dehydrogenase or mixed function oxidase activities. In general, when protons had been added for mass balance reasons they were replaced by NAD(P)H + H + /NAD(P) + pairs to achieve charge balance. This replacement was done by applying the general rules which are laid out in Table I.
Atomic balance was checked using the method detailed in Gevorgyan et al. [24] and only light was shown as unconserved. This is to be expected, as light does only transfer energy into the system and does not contribute towards atomic balance.

Model Compartmentalisation
In order to realistically describe the metabolism of a plant cell, it is important to develop a fully compartmentalised model. Each compartment fulfils specialised tasks while key processes, such as ATP generation, are only possible due to gradients established over inter-compartmental membranes. Key challenges when building a compartmentalised model are a) to realistically assign compartments to each reaction, b) to define transport reactions over intracellular membranes and c) to ensure that the resulting model is not able to generate and maintain gradients which are thermodynamically infeasible.
Assigning compartments to reactions To overcome the first challenge, we applied the following approach: We first scanned all available protein sequences for annotated genes against the Arabidopsis thaliana genome on TAIR [25] and extracted information about localisation from the SUBA database [7]. We assigned compartments to only those reactions for which experimental evidence, such as GFP-localisation, MS identification, TAIR data, and swissprot data, existed. Because computational predictions are known to result in a large number of false positives, they were not included during this step. In addition, two recent proteomic studies for root plastids [5] and mitochondria [6] in M. truncatula were used to assign compartments to reactions. For this, the provided gene identifiers, which were based on different versions of the Medicago truncatula Gene Index [26], were translated to gene identifiers of version 3.5v5 of the M. truncatula genome annotation using BLAST. All these sources of information were integrated and if evidence for the presence of a protein in a specific compartment was found in at least one source it was added to this compartment. This initial process resulted in a draft compartmentalised network with still a large number of reactions not yet assigned to compartments. In a next step, we used an established network modelling approach [8]   in order to ensure that a number of 'seed' metabolites can be metabolised into a number of 'targets'. While for a whole organism, the seed can directly be inferred from defined growth media, and the targets can be obtained by metabolomic data, their definition is not straight-forward for single compartments. We obtained seed and target sets for each compartment by an extensive literature research for metabolites which are known to be imported into the compartment or produced by the compartment. This step was performed for all compartments except the cytosol and the vacuole. This procedure is difficult for the cytosol, because it is in direct contact to all other compartments. Similarly, the vacuole is a main location for degradation and storage, making a meaningful definition of seeds and targets difficult. For the vacuole we therefore used the gene localised reactions for the initial compartment. We manually assigned additional reactions only if they were missing steps in degradation pathways for which a majority of reactions was already present in the vacuole. Whereas in network extension for whole organism networks, usually a network containing all reactions stored in a biochemical database is used as a reference, we here use all reactions of the uncompartmentalised network as reference, because our goal is to assign each reaction a particular compartment. The lists of seed and target metabolites used for this process and the resulting reactions added to each compartment can be found in Supplementary Table S1. All reactions which remained without assigned compartment after applying network extension to all compartments were subsequently assigned to the cytosol, with the exception of the sterol metabolism which was assigned to the ER.
Transporters Describing transport processes of metabolites over membranes is a challenge in all compartmentalised metabolic networks. The main difficulty is that exact knowledge about transported metabolites is still missing. Further, the existence of several transporters has to be assumed without experimental evidence due to localisation of biosynthetic enzymes. Recently, Linka et al. reviewed many characterised transport systems for plants [13]. However, even this comprehensive review must necessarily miss transporters for which no molecular characterisation exists yet, but which are essential for production of different metabolites. We added transporters between compartments to ensure that all metabolites which were producible in the uncompartmentalised network were still producible in the compartmentalised network. For this, we used the information given in Linka et al. [13] complemented by several other sources [10,11,12], and otherwise included transporters which seemed biologically most plausible. In the presented model, we assume that most transporters do not carry a net charge. This is true in many instances, but more detailed information on the charges carried by transporters would further improve the model description and provide further insights [27].
A full list of added transporters is given in Supplementary Table S4.
Maintaining thermodynamic integrity Another challenge arises with protonation states of different compounds. This is particularly important for the mitochondrial compartment which harbours the electron transfer chain for ATP synthesis. Because there are plenty of transporters over the mitochondrial membrane, we had to ensure in the model description that these transporters do not allow a net-charge exchange over the membrane, because otherwise proton gradients might be generated within the model by a simple shuttling of metabolites. This would result in the generation of ATP uncoupled from energy provided by exergonic reactions, thus rendering the model thermodynamically infeasible and predictions regarding ATP generating fluxes meaningless. All transporters over the mitochondrial and plastidial membranes were therefore balanced for charges by adding protons in a way that a net charge of zero is translocated. The only exception to this were reactions from the electron transfer chain for which proton translocation is known. Similarly, import and export of metabolites over the cellular membrane is critical for charge balancing the system. We include uptake and export of charged particles by importers and exporters. Additionally, we introduce a proton and bicarbonate exchanger. In our simulations, we applied the constraint (see below) that the net charge exchange is zero. This resulted in the model behaviour that protons and bicarbonate were used to balance excess uptake or export of charges. The biomass reaction itself was charge-balanced by adding protons to ensure that the net charge of the biomass is zero.

Generation of a two tissue shoot-root Network
To more realistically describe processes taking place in root cells, we have constructed a tissue-specific subnetwork. We used the presence calls provided in Benedito et al [28] to determine core reactions sets for shoots and roots. We assume a gene to be present if it is determined as present in a majority of experiments. Shoot core genes are the combination of all genes present in stems and leaves. Root core genes are those genes determined as present in the root. The core reaction sets used for the model generation were determined by matching the determined genes using the gene-protein-reaction association rules in the model, and determining which reactions could be activated by the present genes. The model was then duplicated as a root and a shoot model and exchangers between the two tissues were added similar to the concept detailed in A full list of tissue exchanges is provided in Supplemental Table S3. All uptakes from the external medium except for CO 2 , light and starch were removed from the leave model, and glucose, starch, light and sucrose exchangers were removed from the root model. This combined model was made consistent, i.e. all reactions which could not carry flux were removed using FASTCC [29]. The consistent model was then subjected to the FASTCORE algorithm [29]. FASTCORE extracts a small consistent network from the original network, in which all reactions from a given core set can be activated. We used those reactions as core for which the models gene-protein-reaction relationships returned true, if the core genes for the respective tissue were set as active. In addition, some reactions were turned off to generate the individual models. This step was repeated four times with slightly modified models to obtain four individual conditions: 1. Growth using ammonium as sole nitrogen source during the day.
2. Growth using ammonium as sole nitrogen source during the night.
3. Growth using nitrate as sole nitrogen source during the day.
4. Growth using nitrate as sole nitrogen source during the night.
For day conditions, the biomass reaction without starch in the leaves, and starch import were turned off, while night conditions had the starch containing biomass and light import turned off. For growth on nitrate, the ammonium uptake was shut down and for growth on ammonium, the nitrate exchanger was set to 0, respectively. Day conditions implied production of a biomass containing starch along with availability of light, while night conditions only allowed starch as a carbon and energy source while producing a biomass without starch. Finally, all reactions present in at least one of these four models were combined to yield the two tissue model. Performing only one FASTCORE calculation could have led to disconnected subnetworks, where e.g.
light could be used to fix carbon but the carbon might not be linked to the remaining network. We employed multiple constraints for the analysis of the model. An important factor, atp maintennce, was estimated based on data from DeVries et al. [30] for Trifolium repens, white clover, the closest relative for which data is available.
This plant needs 15 mg glucose per gram dry weight per day for maintenance [30]. With a regenerating ability of about 32mol ATP per mol glucose, this translates to an hourly maintenance cost of about 111.1 µmol/(g h), a number which has to be distributed between the two tissues. With an approximate 1:2 ratio (root:shoot) the atp maintenance for shoot was set to 74 µmol/(g h) and the root maintenance set to 37 µmol/(g h) Similarily the biomass production combines two thirds of a shoot biomass and one third of a root biomass to one µmol of biomass, with a weight of 1 g. We assumed a normal growth rate of 0.1 g/(g d) based on data for M. sativa [31] for simulations where a fixed biomass production was required. The maximal starch consumption in leaves was set to 46.32 µmol/(g h), based on the starch storage in our biomass experiments during the day. The maximal rate of photon uptake was set to 1000 µmol/(g h).

Building a symbiotic system
To be able to compare fluxes in the rhizobium with fluxes in the plant, we had to adjust all fluxes to use a common unit, which we based on plant dry weight. To achieve this we used the following data: The ratio between nodule dry weight and plant dry weight is in the range between 0.01 g/g [32] and 0.035 g/g [33]. In addition, there are roughly 8 · 10 10 bacteroids per gram nodule [34] and a bacteroid weights approximately 3 · 10 −12 g [35]. Thus the total amount of bacteroids per gram of plant dry weight is in between 2.40 mg/g and 8.40 mg/g. Table II lists the original and adapted ranges of fluxes. We assumed a symbiont maintenance Table II: Flux constraints for metabolite exchange between rhizobium and plant. For conversion between bacteroid and plant weight a ratio of 8.4 mg/g was assumed.
* : This constraint was used to limit the total amino acid exchange (uptake and export). This is the maintenance energy commonly used in E.coli [36]. After combining the models, we noticed, that our biomass formulation has a very large asparagine fraction. This is likely due to the fact, that we fed our plants with excessive amounts of nitrogen to avoid any nodulation. Therefore, we reduced the amount of free asparagine in the plant biomass formulation to 5% of the original amount. Without this adjustment, the maximal growth rate with maintenance energy is at about 0.04 g/(g d), which is about 53% lower than with the adjusted amount. In addition, the rhizobium spotted a citrate lyase reaction, which by acting in reverse allowed the production of some ATP. Since this protein commonly catalyses the forward reaction, we set its lower bound to zero. For our simulations, the symbiont was assumed to have finally differentiated and therefore stopped growth, i.e. not consume the nitrogen it fixes, de facto becoming an additional compartment of the plant.

Network analysis
Genome-scale networks are most conveniently described by the stoichiometry matrix N, in which columns represent biochemical reactions and rows correspond to metabolites, and the coefficients define whether a metabolite is consumed (negative) or produced (positive) by a particular reaction. The dynamic behaviour of the metabolic network is governed through the stoichiometry matrix by where S is the vector of metabolite concentrations, v is the vector containing all reaction rates and p is a vector with all kinetic parameters. Assuming that the system is in stationary state, the dynamic equation is reduced to the stoichiometric equation which allows to derive statements about the flux distribution v from the stoichiometry matrix alone. This system is usually under-determined, meaning that many flux distributions fulfil the equation. To restrict the solution space, further constraints are defined. These reflect thermodynamic constraints restricting the direction of some reactions, as well as experimentally observed constraints, such as maximal uptake rates and measured growth rates combined with biomass composition. Such limitations are in the most general form described by inequalities where l j and u j are lower and upper bounds restricting the flux through a reaction or transporter j. In our case, an important constraint also reflects the fact that the net charge accumulation must be zero. This constraint is written as where q i is the net charge translocated over the cellular membrane by transporter i. Network analysis was performed using ScrumPy [38] (mudshark.brookes.ac.uk/ScrumPy) and the COBRA toolbox [39] with CPLEX as solver.
Producibility of metabolites To verify whether a particular metabolite X is producible, a reaction v k : X → ∅ is added to the network. Then, it is tested whether a solution vector v exists, such that the stationarity condition 2 and the constraints 3 and 4 are fulfilled, for which v k > 0. If the system is solvable, there exists a stationary flux distribution which consumes only nutrients and produces (at least) metabolite X.
Energy and redox requirements To find a theoretical minimum of energy and reductant requirements to produce a certain metabolite k, we introduce as above a reaction v k : X → ∅ and two reactions consuming Flux Balance Analysis The identification of special solutions, which optimise some plausible objective function, is subject of Flux Balance Analysis (FBA, [40]). The choice of the objective function is an unsolved problem. A common assumption is that fluxes are arranged such that biomass yield is maximised [41]. However, this assumption is experimentally supported only for E. coli [42,43,44]  is split into glucose and glucose phosphate and unphosphorylated glucose is exported, a slightly lower overall flux is obtained than for the more realistic solution in which the extra ATP is produced which is required to phosphorylate the second glucose.
Performance scans of metabolism Some demands on the metabolic network are unknown. It is, for example, unknown how much energy is required for maintenance or how many reductants are required to maintain the redox balance. By fixing fluxes representing these requirements to a certain value and systematically varying its quantity we can investigate how the system responds to external challenges. For this, an additional constraint, v i = α, is introduced and the equation system is solved for a certain range of values of α. To determine varying energy and redox demands, we introduce two additional reactions, an ATP hydrolysis Protein extraction, precipitation and hydrolysis Protein extraction, precipitation and hydrolysis was performed according to Williams et al. [48] using a Urea/Thiourea extraction. Analysis was performed by GC-MS. twice with 4 mL of ice-cold acetone and again centrifuged as above. The pellets were allowed to stand in a flow hood to evaporate any acetone left. Protein weights were taken after drying overnight. The protein pellets were re-suspended in 2 mL of 6M HCl and incubated in a heat block at 100 • C for 24 hours for hydrolysis. Samples were cooled to room temperature and an aliquot (20 µL to 50 µL) of the hydrolysates (containing amino acids) were transferred to eppendorf tubes and dried in a vacuum dryer overnight to remove HCl. The dried samples were subjected to GC-MS analysis.
Starch Starch was extracted according to an established protocol [48]. The insoluble pellet obtained after the methanol chloroform extract was used as starting point. Each sample was gelatinized by autoclaving the residue in 3 mL of 25 mM sodium acetate for 3 h at 121 • C, 1.04 bars pressure followed by enzymatic digestion with 20 U α-amylase (Sigma-Aldrich) and 5 U amyloglucosidase (Sigma-Aldrich) for 16 h at 37 • C.
The samples were centrifuged and the supernatant containing the glucose monomers was collected, freeze dried and stored at −80 • C until further analysis. Starch amounts were determined with an enzymatic assay.
Gas chromatography-mass spectrometry GC-MS analysis was done as in Masakapalli et al. [16] and Williams et al. [48] using an Agilent 79890 GC coupled to an Agilent 5975 quadrupole MS detector, electron impact ionisation (70 eV) equipped with an Agilent HP 5-ms column (30 mm, 0.25 mm inner diameter) at the facility in the Department of Plant Sciences, University of Oxford, UK.
The protein hydrolysate extracts were derivatised by TBDMS [50] whereas the soluble extracts were derivatised by MeOX TMS [46]. To obtain the TBDMS derivatives, the dried samples were first dissolved in 25 µL