## Introduction

Biological heterogeneity is a challenge even in the most ideal growth conditions in a laboratory. Most of the commonly used methods to describe or model metabolism assume that he average cell in the population is an adequate representation of the culture if the medium is well-mixed. While these assumptions might be valid for fast growing heterotrophic bacteria and yeast, they are not as easily justified for more complex organisms or growth conditions such as filamentous bacteria or biofilms. In these cases, environments are highly variable from cell to cell; there can be wide ranges of nutrient or light availability, temperature, pH, and other important growth parameters1,2,3,4,5,6. These small differences in growth conditions can lead to large differences in phenotypes even amongst cells grown in the same culture. This presents a challenge both in experimental design as well as computational modeling of biological processes and organisms to ensure that what we are measuring or predicting is representative of the entire population.

One computational tool that has been used widely in investigating the metabolism of microorganisms are stoichiometric metabolic models7. The most widely used stoichiometric metabolic models are constraint-based linear programming models which vary in complexity from the relatively simple flux balance analysis (FBA) to more complex FBA models which integrate regulatory and/or thermodynamic constraints8,9,10,11,12 or time-dependent responses13,14. In cases where growth occurs over a longer period, dynamic FBA (dFBA) can be used to visualize changing constraints and media concentrations13. Unfortunately, however, the way these models are formulated limits their use to modeling a single cell or an average cell in a population. These models have proved to be incredibly useful for strain design of heterotrophic bacteria in well mixed batch reactors11,13,15. Stoichiometric metabolic models have also been extended to model more heterogeneous populations, such as binary or tertiary bacterial consortia16,17,18,19,20 by adding additional compartments for each species and allowing instantaneous metabolite exchange. This is not representative of what is actually occurring in these environments though as diffusion of metabolites is an incredibly important limitation to cell growth and interactions. The current benchmarks for modeling more heterogeneous populations are OptCom21 and d-OptCom22 (the dynamic version). This modeling approach uses multiple objective optimization technique which allows the model to capture any type of interaction (synergistic, antagonistic and neutral) for any number of cells or distinct organisms. While each of the metabolic modeling approaches above do move toward capturing more heterogeneity in the population at the cell level, they leave out important phenomena occurring outside the cell, such as diffusion of metabolites and nutrients or movement of cells.

We have developed Multiscale Multiobjective Systems Analysis (MiMoSA), an advanced metabolic modeling framework in order to more accurately model the metabolism and cellular interactions of complex systems. This approach uses a multi-scale multi-paradigm metabolic modeling approach to leverage the ease of implementation of stoichiometric metabolic models while integrating the spatiotemporal tracking of cells, nutrient diffusion, cell-cell interactions and cell-environmental interactions. This approach requires the use of both continuous and discrete variables as well as several different mathematical formalisms to reflect the multilevel behavior in populations. Therefore, we use an agent-based modeling (ABM) framework to allow direct interaction of different levels through the encapsulation of physiological, environmental, and metabolic models. ABM is a bottom-up modeling approach; the model is made up of a set of agents, which are allowed to act independently as long as they follow distinct rules of behavior defined by the user, this allows us to simulate emergent behavior of complex communities that arise from individual agent behaviors23,24,25,26. The system behavior emerges as a result of the many (tens, hundreds, thousands, millions) individuals, each following their own behavior rules, living in a defined environment, interacting with each other and the environment24. The integration of multiple modeling formalisms to represent disparate sub-systems is a trend common in engineering and science domains27,28,29,30,31 and has recently seen some developments in the systems biology area10,12,14,32,33,34,35. Agent-based modeling has been previously applied to both intercellular8,36 and multi-cellular processes37,38 but has not previously been used to model metabolic fluxes. This multi-scale multi-paradigm approach represents a novel method of integrating individuals (through agents) with previously leveraged dFBA formulations13,39, thereby discretizing and separating variables for computational efficient solutions with low a priori knowledge.

As a proof-of-concept study, we chose to model Trichodesmium erythraeum, a filamentous diazotrophic cyanobacterium. T. erythraeum is a major contributor to the global nitrogen cycle; it is responsible for fixing an estimated 42% of all marine biological nitrogen40 and it leaks 20–50% of the nitrogen it fixes41, providing surrounding organisms with a biologically available nitrogen source. Unlike other diazotrophs, which either spatially or temporally separate the oxygen sensitive nitrogenase enzyme from the water splitting reaction of photosynthesis (oxygen production), T. erythraeum is unique because it simultaneously carries out nitrogen and carbon fixation during the day in different cells along the same filament (trichome) with metabolic as opposed to physiological control. We also have previously studied major metabolic differences between the two cell types42. Therefore, it is the ideal model system for the development of MiMoSA: it has structurally identical cells that are prone to two subsets of metabolic constraints yielding two major metabolic subsets (photoautotrophic and diazotrophic), a published genome scale model42, transcriptome data, and a plethora of in situ and laboratory data to both train the model and validate predictions. We use this organism to highlight the advanced capabilities of the MiMoSA framework to predict emergent behaviors of the cell and to investigate rules of cellular physiology.

## Results

### Model formulation

We developed MiMoSA by integrating an updated version of the genome-scale metabolic model42 (Table S1 for updated reactions) with nutrient diffusion, light diffusion, cell/cell interaction and cell/environment interactions (see Fig. 1) using an agent based modeling framework. We have also implemented the use of multiobjective optimization to account for the dual cellular objective of producing biomass and the metabolite which is transacted between cells (glycogen or β-aspartyl arginine, depending on cell type) with the capability of a full range of exchangeable metabolites that are not part of the objective function. Constraints were imposed on the model as reported previously42 with two notable exceptions. First, the ultimate product of nitrogen fixation was changed from ammonium to β-aspartyl arginine, which is the monomer used to create cyanophycin, a nitrogen storage polymer in T. erythraeum and other diazotrophic cyanobacteria43,44,45. Second, the two major storage polymers, glycogen (modeled as maltose, or two linked glucoses) and cyanophycin (modeled as β-aspartyl arginine), were decoupled from the biomass formation equation so that they could freely accumulate or be metabolized. More detail about the formulation of the model is provided in Methods and Supplemental Text.

### Tracking changing cellular objectives

MiMoSA evaluates the cellular objective for each cell for each time step based on the changing environmental conditions. As an example of this, we have tracked how the Pareto front changes for both photoautotrophic and diazotrophic cells over time (Fig. 2). The front is selected according to Methods: Shifting Scalar Objectives with each point corresponding to a specific scalar arrangement of variables. With increasing time, diazotrophs shift their objective away from biomass toward the production of cyanophycin as carbon becomes more available (Fig. 2A). In contrast, photoautotrophic cells see a maximum production of glycogen at 9 hours after the onset of light and then their productivity decreases (Fig. 2B). It is notable that every cell in the population is performing these decisions in parallel and Fig. 2 is for a single representative cell of each cell type. Cell optimization changed based on environmental conditions and agent rules and the Pareto Fronts representing this behavior in these contexts is visualized in Fig. S1.

### Model validation

In order to test the predictive accuracy of the model, we predicted growth rate for a variety of different light intensities (Fig. 3A) and compared to other published models for T. erythraeum46,47 as well as other experimentally measured growth rates1,2,4,42,46,48 exhibiting light saturation at higher light intensities. Ultimately, our model is a metabolic model, so it is important that it can also capture the metabolic changes that occur in response to changes in the environment. Therefore, we compared predictions of metabolite accumulation, representing part of the major changes to data collected in our laboratory for growth in different light intensities (see Fig. 3B). The model was trained on data collected in 100 μE light and was validated with data collect in 50 μE light over a twelve-hour light period.

### Cells alter their microenvironment

An advantage of the modeling approach we have developed is that we can track nutrients in the environment. Carbon dioxide (CO2) is typically the limiting substrate in aquatic photosynthetic growth due to low ambient concentrations and low solubility; for ambient CO2, Henry’s law defines an equilibrium concentration of 2.3 μM in the ocean. It is well known that photosynthetic microorganisms use carbon concentrating mechanisms (CCM) to concentrate CO2 near the carbon fixing enzyme, ribulose-1,5- bisphosphate carboxlyase/oxygenase (RuBisCO) to overcome low selectivity49; our simulations imply that cells also increase the local concentration of CO2 immediately surrounding the cell (Fig. S2A) and the release of nitrogen to the media including at more frequent time steps (Fig. S2B). The simulation covers 150 cells and 10 filaments in a model 0.625 mm3 environment, corresponding to a filament density of 16 × 106 trichomes m−3, well within the in situ ranges of free trichome density50. This illustrates that the simulation corresponds well quantitatively to realistic local environments. At the end of our simulation, the cells on average can create a microenvironment that is roughly 2 fold higher in CO2 than the surrounding ocean. By looking at flux through major pathways, it appears that the CO2 is derived from high fluxes through the oxidative PPP and TCA Cycle in diazotrophic cells (Fig. 4).

### Modeling a heterogeneous cell population

One of the main advantages of this new modeling approach is that individual cells can be tracked in space and time so the heterogeneity of the population can be quantified (in terms of metabolic flux distributions). As an example, we tracked 150 cells over a 12-hour time period with time steps of 6 seconds which results in a total of 18,000 metabolic flux maps. Since these are an overwhelming amount of data to visualize, we have chosen to focus on a few representative flux maps (see Fig. 4). In the left column, we track how the ammonium composition of the environment surrounding the cells changes with time from the initial seeding of cells at 0 hours to the middle of the daytime period (6 hours) to right before the onset of night (12 hours). These panels depict the release of ammonium into the environment as time progresses, and it is higher in areas where the cell density is highest. This agrees well with in situ data which report that T. erythraeum leaks 30–50% of the nitrogen it fixes41; our simulations predict that approximately 20% of the nitrogen fixed by the community is excreted into the medium. It is also important to note that the majority of ammonium is released by the cells in the second half of the day; during the first 6 hours, the cells release ammonium to a total of 2.56 μM midday compared to 9.22 μM concentration of ammonium at the end of the day. Again, this agrees with previous literature reports that the rate of nitrogen fixation peaks at midday51, therefore we would expect more secretion of ammonium after peak nitrogenase activity. Select flux maps of cells growing in areas of low ammonium (top), medium ammonium (middle) and high ammonium (bottom) are depicted in the middle column of Fig. 4. At the beginning of the simulations, cells are seeded in an environment that is identical to the defined marine medium YBC-II and because of this, they have identical flux maps as shown by the distribution graph in the right column. At time 0, we have a bimodal distribution because there are two cell types: photoautotrophic and diazotrophic. Photoautotrophic cells have high flux through the Calvin Cycle and the diazotrophic cells are operating in a more respiratory mode, with high flux through both the oxidative pentose phosphate pathway (PPP) and tricarboxylic acid (TCA) Cycle. As the cells grow and start to experience more heterogeneity in their environment, they respond by differentiating their metabolism within the filament (Fig. S3). First, this is evident in the frequency distribution plot, where they are both diverging in terms of total metabolic flux distributions and moving toward achieving optimal flux in terms of the objective function for both t = 6 hours and t = 12 hours. By comparing the changes that occurs in metabolic flux between areas of low, medium, and high ammonium, we can learn a few things about cellular physiology. In all cases, photoautotrophic cells have high flux through the Calvin cycle and an incomplete TCA Cycle, which has been widely reported in cyanobacteria grown phototrophically52. In the case of T. erythraeum, succinic semialdehyde is derived from the nitrogen storage compound cyanophycin and is fed into the TCA Cycle to support the production of biomass precursors and glycogen (through gluconeogenesis). When external ammonium is high, photoautotrophic cells have less flux to glycogen, presumably because they do not need to provide as much to the diazotrophic cells to obtain fixed nitrogen in return. Investigations into imbalances in both metabolites and relative cell quantity display mechanisms of ammonium loss to the environment. Figure S4A illustrates how a lack of glycogen flux results in a higher loss of ammonium (with the exceptions of recently divided cells which metabolize glycogen with high ammonium loss) while Fig. S4B visualizes a clear minimum ammonium release in the recorded range of percent diazotrophs per filament (between 15 and 30%). Diazotrophic cells have high flux through both the oxidative PPP and the TCA cycle which still utilizing carbon fixation reactions such as Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) and phosphoenolpyruvate (PEP) carboxylase and carbon conserving reactions like the glyoxylate shunt. Flux through the glyoxylate shunt increases as the availability of ammonium increases outside the cell, which is likely in response to the lower glycogen transfer from the photoautotrophs.

### Elucidating rules of cell physiology

A key feature of agent-based modeling is the ability to model emergent behaviors in populations. We do not know all the rules of behaviors that define T. erythraeum a priori but by comparing simulations to observed in situ data and iterative improvement of the model, some rules can be elucidated. One trait that is widely variable in nature is filament length. It has been widely accepted that the average filament length is 100 cells53 but more recent studies have suggested that they are typically much shorter, with a geometric mean of 13.2 ± 2.3 cells per filament, but with a mean range of 1.2 to 685 cells per filament in situ54. Conditions for in situ sampling are widely variable so we hypothesized that filament length plays a role in maintaining growth in different environments: low light, low CO2 and low N2. We used the model to investigate which conditions might favor shorter or longer filaments (Fig. 5). For each simulation, 150 total cells were seeded but in different trichome lengths (10, 30, 75, and 150 cells/filament) with and a ratio of diazotrophs to photoautotrophs of 3:7. In terms of growth rate, across all conditions we tested the shorter filaments had faster growth. This implies that diffusional limitations of nutrients into the cell and metabolites within the filament between different cell types start to hamper growth rate at longer filament lengths. The relative decline in growth rate is less dramatic for 25 μE when comparing across filament length, but when compared to other light conditions, there is a dramatic drop in growth rate for shorter filaments at low light. This indicates that longer filaments are capable of compensating for less light better than shorter filaments, perhaps due to increased surface area. Next, we examined the effect of filament length on cyanophycin composition for the same growth conditions as above. In every condition except low nitrogen, filaments with 75 cells appear to have more cells with above average cyanophycin content than other filaments lengths. Smaller nitrogen compounds (NH4+, amino acids, urea, etc.) can theoretically be used to support growth, permitting cyanophycin to be a longer-term storage compound. This is a possible explanation for the increase of cyanophycin in longer filaments. As filaments are longer, diffusive limitations become more pronounced, meaning that nitrogen gradients will remain in nitrogen replete cells longer and will be remade into cyanophycin as opposed to being metabolized for growth. This makes intuitive sense: not only is there a final drop-off at 150 cells, the distribution of cyanophycin content within the cells becomes larger, suggesting that some cells are starved for nitrogen and some are nitrogen replete. It is probable that filaments have adapted to leverage diffusion to both sequester nitrogen and to mitigate futile cycling of carbon and nitrogen compounds when diatomic nitrogen is available. The pattern of cyanophycin content diverges for cells in nitrogen limited environments due to overall shortages of nitrogen within the filament.

Finally, we investigated how glycogen content of cells changes due to filament length. The first pattern to note is that as length increases, the heterogeneity of the filament in terms of glycogen content also increases. This illustrates the importance of tracking individual cells because they are experiencing different environments and responding in different ways. Longer filaments also appear to be able to maintain glycogen content more readily than shorter filaments in all stress conditions we tested. Finally, nitrogen limited growth results in increased glycogen content as seen in other cyanobacteria55. It appears that longer filaments in N limited growth can accumulate more carbon, perhaps again due to higher surface area and hence more energy from light harvesting. Our simulations agree well with published studies; it has been reported that growth rate and light intensity are both inversely correlated to filament length56. These data indicate that filament length is largely determined by external cues rather than genetically.

## Discussion

MiMoSA enables the most detailed and accurate metabolic modeling of complex systems to date by allowing coupling of several different mathematical formalisms describing natural phenomena, behavioral rules, and metabolism into a multi-scale multi-paradigm model. In constructing MiMoSA, we have added several features to enable us to more accurately predict phenotypes. A key feature of MiMoSA is the use of a multi-objective optimization approach. Unlike fast growing bacteria, which have successfully been modeled using a single objective function of maximum biomass15, slow growing organisms have more complex objectives. In our simulations, T. erythraeum cells must achieve a delicate balance between biomass formation and the production of either glycogen or cyanophycin due to the symbiotic relationship between two cell types in the same filament. Photoautotrophs cannot function optimally without a biologically available form of nitrogen from the diazotrophs and the diazotrophs cannot support their metabolism without reduced carbon from the photoautotrophic cells. The use of multi-objective optimization allows us to describe this trade-off more accurately and by calculating the Pareto Front a priori we can also reduce computational effort. We have also accounted for changes in biomass composition that occur in response to changes in the environment or as a result of building carbon and nitrogen reserves during the day by decoupling the biomass equation. This allows the model to respond more fluidly to changes in the environment, which more closely mimics what cells experience in nature; for example, macro- and micro-nutrient stresses have been well known to cause changes in metabolism such as lipid and carbon accumulation57,58,59,60,61,62. As such, the inclusion of metabolite and nutrient diffusion to augment metabolic optimization is a critical aspect of the model.

The influences of nutrient and energy availability in conjunction with population characteristics were studied to determine community and cellular adaptations to environmental perturbations. The model allows us to quantify the changes in the microenvironment around the cell compared to the bulk properties of the environment (Fig. S2A) as well as to see how these changes affect the distribution of carbon and nitrogen inside the cell (Fig. 4). These can be supplemented with “zooming in” on specific time steps to enhance investigation to rapidly occurring phenomena (Fig. S2B). Not only did our predicted growth rates quantitatively match the experimental data, it was better able to capture effect of light saturation on growth rate; light intensities above 100 μE have little to no effect on growth rate47,56,63,64,65. Our simulations agree well with the experimental data, however, there are differences that can be explained by the differences between our experimental conditions and our simulations. The main difference being the effect of diurnal light; T. erythraeum will not grow without diurnal day/night patterns, therefore the experimental data were collected from cells that were grown in 12 h:12 h day/night cycles but the model is for a single 12-hour day time period. The addition of diurnal light patterns in future iterations of this model will help to improve the light dependent growth phenotype. Even so, the model is able to visualize community coordination and development during the 12 hour light period, exhibiting the increased release of ammonium to the media in the afternoon, consistent with the observation that nitrogenase activity peaks midday51. Moreover, the individualized resolution of metabolic optimization can probe the nuances of intercellular, intracellular, and cell-environment interactions. Analysis of metabolic flux reveals a spontaneous partial/linear TCA Cycle in photoautotrophic cells consistent with previous reports52. Cells also naturally coordinate to provide glycogen and cyanophycin transfer between cells, yielding oxidative behavior in diazotrophic cells through glycolysis with the possible side effect of oxygen consumption as a mechanism to protect nitrogenase as suggested in experimentation64. Meanwhile, photoautotrophs naturally perform reductive carbon fixation coupled with utilization of the lower TCA Cycle to degrade arginine. These metabolic functions are affected by extracellular forces which are integrated into this model. For example, high ammonium environments result in declining gluconeogenesis in photoautotrophs (12 hours in Fig. 4), likely since these cells are energetically limited and use cyanophycin as an energy source instead of light. Diazotrophs are prone to these environmental cues as well as low ammonium environments enhance light TCA cycle to enhance recycling of amino acid byproducts from a lack of nitrogen. By integrating modeling of other phenomenon with constraints based metabolic models, we were able to simulate T. erythraeum cultures that more accurately represent both in situ and laboratory data.

One of the many advantages of using this multi-paradigm framework is that we can simulate emergent behavior of a population. In situ data report a wide mean range of trichome length from 1.2 to 685 cells54; we used the model to investigate possible causes because this is a difficult phenotype to investigate experimentally. Our simulations suggest that even though longer filaments suffer from diffusional effects that limit growth, they are better able to handle stress (Fig. 5) consistent with literature. Increased surface area in longer filaments minimizes the effect of lower light because the filament can harvest more light per volume. Also, the larger filaments are better able to maintain the average composition of storage compounds despite low carbon or low nitrogen conditions. Therefore, we would expect in areas of nutrient or light stress, the filament length would be longer.

One of the other unusual phenotypes of Trichodesmium that we were able to investigate using MiMoSA was leaking 30–50% of the nitrogen it fixes. Nitrogen fixation is an incredibly energy intensive process, costing the cell 8 ATP per ammonium, so it is not clear why T. erythraeum would excrete 30–50%. Despite using optimization to solve for fluxes, which should minimize energy losses, our simulations predict approximately 20% of the fixed nitrogen is excreted into the medium (Fig. S5) which implies that this is a metabolically driven phenomenon. Further investigation has led us to develop three hypotheses on why this occurs: carbon limitation in diazotrophs, energy limitation in photoautotrophs, and imbalances between photoautotroph: diazotroph ratios. In the first case, photoautotrophs are unable to create glycogen chains and instead must start from a higher energy substrate than carbon dioxide (like succinic-semialdehyde) or must perform glycolysis on arginine derivatives to achieve energetic viability (Fig. S3A). Second, population imbalances cause nitrogen to be produced faster than it can be anabolized into β-aspartyl-arginine chains and is released into the media, meaning there is an optimal ratio of cell types (Fig. S3B). It is also possible that carbon limited diazotrophs are unable to manufacture full β-aspartyl-arginine chains and proton imbalances require ammonium release to the medium instead of passage to surrounding photoautotrophs.

MiMoSA enables the tracking of cellular-level environmental changes and the impact that they have on a metabolic model, opening the door to more accurate modeling of multi-cellular systems and the in silico investigation of the complex interactions between different cell types within an organism, and different species in a community. This is the first report of a metabolic model that integrates nutrient and light diffusion, cell/cell interactions and cell/environment interactions and we have used it to accurately predict growth, cellular composition and to investigate the unique physiology of T. erythraeum, which has filaments of both diazotrophs and photoautotrophs in close proximity. It establishes that this organism can effectively adapt to different conditions at three levels: the genetic level through division of labor in separate cell types, the metabolic level through relatively open-ended metabolic capabilities as well as further division within types, and at the population level to harness diffusional and physical interactions with the environment. MiMoSA is also a readily adaptable modeling framework – the addition of additional species to the model only requires the availability of a genome-scale metabolic and a few rules of behavior to be added. While we focused the proof-of-concept study of T. erythraeum, MiMoSA is a modeling framework that can be used to model a variety of more complex systems including applications in ecology, human health and metabolic engineering.

## Materials/Methods

### Cell culture conditions

Cells were grown as described previously42. Trichodesmium erythraeum IMS101 cells were acquired from the Bigelow Laboratory for Ocean Sciences (East Boothbay, ME, USA). Cells were grown in a New Brunswick (Hamburg, Germany) with 100 and 50 μE in 12 h light/12 h dark cycles. Cells were grown in artificial seawater YBC-II medium66 at pH 8.15–8.20. CO2 was maintained at atmospheric concentration. All chemicals were obtained from Sigma-Aldrich (St. Louis, MO). Growth rate was monitored by measuring chlorophyll absorbance67 from 50 mL of culture every two days. Cyanophycin and glycogen were measured every four hours from the beginning of the light cycle (9 AM) to its end (9 PM). Total biomass mass was determined by dry weight analysis, cells were filtered with a Whatman 0.22 μm cellulose-nitrate filter and dried overnight at 100 °C.

### Biomass quantification

Carbohydrates were measured colorimetrically using the anthrone method68 against glycogen as a standard. Cyanophycin was extracted by disrupting 740 μL of 250 mL cells concentrated to 2 mL via filtration and rinsing with TE buffer with 2.70 mg/mL lysozyme overnight at 37 °C, centrifuging at 16,100 × G for 5 min, and resuspending the pellet in 1 mL of 0.1 M HCl (in which cyanophycin is soluble) for 2 h. The extraction was repeated on the pellet, the supernatant fractions were combined, and cyanophycin was quantified colorimetrically using the Sakaguchi reaction69.

### Mass balance constraints

Constraints based metabolic models are based on mass balances, therefore it is imperative that we develop accurate accounting of each element. Therefore, we used training data (Table S2) to estimate normal cellular consumption (Table S3). Average objective fluxes were estimated using mass balances around biomass and metabolite production with the formulation given in SI: I.C Estimating Mass Balance Constraints. Note that these formed training objectives: uptakes and consumption scale with nutrient molar contents around and within cells.

### Development of agent based model

Repast Simphony70 in Java was used as the agent-based modeling framework in which differentiated multi-objective metabolic models of Trichodesmium erythraeum are contained. It contains three agent types – Ocean, Cells, and Filaments. The Cells agent contains two sub-agents representing each cell type: photoautotrophs and diazotrophs and is responsible for intracellular processes and decisions. The Ocean agent defines and calculates the extracellular environment and the Filaments agent organizes the Cells and modulates their transactions.

#### Cells

Cell agents (cells) are generated for each individual cell in the model. These contain two subtypes, photoautotrophs and diazotrophs, but contain several consistent elements between the two. Simulation variables are summarized in Table S4. All cells reproduce according to the same rules: cells divide according to sampling from the weighting distribution described above if that sample is bigger than the cell mean cell, cells only extend from the ends, and cells can only divide into diazotrophs if there is a diazocyte under development (decided at the filament level if the filament is nitrogen limited). When a cell is large enough, it converts to fully stationary growth, producing only metabolites and creating a larger and larger metabolic gradient between cells without de novo biomass synthesis. This prevents a cell from becoming excessively large in the center of the filament. Cells will die if they cannot produce the requisite maintenance ATP through metabolism or catabolism.

Cells allow metabolites to diffuse through the lipid bilayer using permeabilities reported in the literature (Table S5). This mechanism represents a non-zero leakage scenario that was nevertheless much slower than intrafilamental diffusion (Table S6). Scavenging from the environment for compounds which carried no evidence of active transport followed these same rules and was therefore prone to concentration gradients. Active transporters, on the other hand, allowed the cell to uptake whatever concentration of compound was necessary subject to its presence in the local ocean grid. Allowable exchange of metabolites between cells is illustrated in Fig. 6. If several cells compete in that grid space, access to the available molecule was divided equally among those cells.

#### Subclasses: photoautotrophs and diazotrophs

Both subclasses define the uptake constraints and send to a Python file that decides whether the cell metabolizes or catabolizes based on those constraints using the multi-objective metabolic model previously described (see Supplemental Information: Routine Metabolic Optimizations). The cell then updates its internal metabolites based on the optimization results, diffuses metabolites, divides if possible, and uptakes from its local environment. Model bounds are calculated using local concentrations to calculate maximum flux bounds excepting β-aspartyl arginine which is further limited to 8% of available nutrients (See Fig. S6). These methods are handled by three ScheduledMethods that Repast Simphony schedules in specific progression. Together with the Ocean Agent’s updates, the individual cell actions (as dictated by the metabolic model) form the core of the simulation. A more detailed flow chart for cell decision making can be found in Fig. S7. Progression through these steps is identical for cell types, but the metabolic static variables (objectives, gas uptake, etc.) are different between the two subclasses, necessitating separate methods.

#### Ocean

The ocean agent is responsible for tracking cells and modeling external nutrients. Its main task is facilitating diffusion between cells and locations as well as approximating an uptake radius for cells. Each ocean represents a uniform, static, abstract area of the overall grid space with a uniform dimension space of δ × δ where δ is a user defined parameter. This set of simulations was conducted with time steps of 0.1 hours as a moderate value between diffusion phenomena (on the order of seconds along the length of a filament) and doubling time (on the order of 50 hours). Metabolites are assumed to freely diffuse in a dilute seawater environment between cell filaments (Table S6) and assumed to be uniform within the grid, given the relatively long time step compared to the rate of diffusion over such small dimensions. If the impacts of metabolic diffusion limitations were of interest, the time step within the framework could be made appropriately small to more accurately track metabolites, at cost of increased computational burden. Each ocean gridcell diffuses molecules into its adjacent ocean gridcells assuming discretized slab diffusion in two dimensions. This is done using a previously developed discrete algorithm for diffusion in a grid71:

$$\Delta \bar{f}({x}_{i})=A\mathop{\sum }\limits_{j=1}^{n}\,(f({x}_{i}^{j})-f({x}_{i})){e}^{-{d}_{j}^{2}/\eta }$$
(1)
$${d}_{j}=|{x}_{i}-{x}_{i}^{j}|$$
(2)
$$A\mathop{\sum }\limits_{k=1}^{n}\,{e}^{-{d}_{j}^{2}/\eta }=1$$
(3)

where $$\Delta \bar{f}({x}_{i})$$ is the change in concentration of metabolite in grid space xi over a time step, A is the normalization constant to be calculated by solving the third equation within the entire neighborhood to ensure conservation of mass within the neighborhood, dj is the distance between grid space xi and its grid neighbor xij and η is the diffusivity control of the system over the time step. As η → 0 diffusion halts and as η → ∞ diffusion becomes instantaneous. In this study, $$\eta =4 {\cal{D}} \Delta t$$ as in the original Fick’s Law.

Diffusion is calculated using two steps, one forward and one reversing the order of gridspace calculation, to mitigate the effect of order on estimating the concentration gradient (Fig. S8). Excess ammonium is secreted into the environment using a membrane diffusion coefficient. Cells are allowed to uptake any metabolite/nutrient in YBC-II medium; the only extracellular products allowed in simulations are small molecules, such as CO2 and NH4+ which diffuse through the membrane, as well as compounds that have experimental evidence of transporters from proteomic analysis or transcriptomic analysis (estimated using membrane diffusion outwardly and free diffusion for gases or active transporters for ions/molecules inwardly) in Table S772,73.

The Ocean Agents also manage diffusion of metabolites from marine sinks and through the gas-liquid surface interface with the atmosphere. This is done assuming equilibrium concentrations of dissolved gases defined by Henry’s Law and mono-directional slab diffusion for CO2, O2, and N2. Table S6 lists the free diffusivities of compounds and Table S8 lists the Henry’s Constants for atmospheric compounds, and Fig. S8 demonstrates the movement of diffusive molecules through the simulation.

Furthermore, light diffusion to cells is defined as a function of their y coordinate according to the equation:

$$I={I}_{0}{e}^{-ky}$$
(4)

where I is light intensity, k is the extinction coefficient of light in seawater, and y is the depth below the surface of the individual cell.

#### Filaments

Filament Agents are responsible for organizing cells, managing movement, splitting to promote diazotroph development, and defining cell type after division. Random walk movement (to simulate the lack of control cells have over lateral motion) is simulated by generating a random direction that has an empty grid space for every cell in the filament. Cells move within a user defined interval of time or if their growth is impeded by another filament, in which case growth is halted until the cells move away from each other. The filament forces splitting into two separate filaments when nitrogen is limiting growth and neither filament end is undergoing diazotroph development (meaning that another diazocyte is required). Filament Agents decide the next cell type using this inequality:

$$\frac{{\sum }_{i}\,{\varepsilon }_{i}^{DZ}}{{n}_{DZ}} > \frac{{\sum }_{i}\,{\varepsilon }_{i}^{PA}}{{n}_{PA}}$$
(5)

where ɛ is the Pareto Efficiency of the given cell type and n is the quantity of that cell type in the filament. The Pareto Efficiency is quantified as the sum of the objective fluxes divided by their Pareto Optimum (from experimental results) divided by the number of objectives.

$${\varepsilon }_{i}^{c}=\sum _{j}\,\frac{{\nu }_{j}}{{\nu }_{j}^{exp}}$$
(6)

If inequality (20) is satisfied, the cell prioritizes diazotroph development, otherwise it prioritizes photoautotroph development. If a diazotroph region is currently under development, the filament adds another cell to that region. If there is no diazotroph under development, or if the C:N ratio becomes higher than physiological bounds, the filament splits to expose a region where diazotroph development may begin. A photoautotroph can be placed at any open site. Since there are two ends on every filament, up to two of these decisions are being made during each simulation time step. After filament splitting, if the split results in a homogenous region of either diazotrophs or photoautotrophs, the missing cell type is preferred. Filaments split in the middle of the longest region of homogenous cells and are prevented from splitting to result in a single cell, meaning that the shortest possible resulting splits are two cells in length. Cell division completes within one time step when metabolites and biomass are equally divided between parent and daughter cell and the filament updates to contain the cell at its end. This decision is a memoryless process conducted each time step. This means that cell division is completely metabolically motivated (which is affected, in turn, by diffusion and physiological processes).

### Parameter estimation

As described previously, to improve the accuracy of simulations, the model was fit to experimental data for cells grown in 100 μE light in YBC-II medium. Maintenance energy, in the form of the ATP hydrolysis reaction, is the main parameter that is adjusted in FBA formulations18,42,74,75,76 to match simulations with growth rate. Since maintenance energy at 100 μE was higher than the energetic capacity of the model for growth at 50 μE, a linear correlation was interpolated from experiments at 100 μE and 80 μE with ATP maintenance flux fit to multiple light intensities without specifically training on 50 μE:

$${\nu }_{ATP}(I)=mI+{\nu }_{0}$$
(7)

where m and ν0 are calculated using the point-slope equation for a linear equation:

$$m=\frac{{\nu }_{ATP,{I}_{1}}-{\nu }_{ATP,{I}_{2}}}{{I}_{1}-{I}_{2}}$$
(8)
$${\nu }_{0}=-\,{I}_{1}m+{\nu }_{ATP,{I}_{1}}$$
(9)

where I1 is 100 μE and I2 is 80 μE. The estimated values of the linear equation are recorded below in Table 1. If the model is unable to satisfy its maintenance demand (through any metabolic process, including catabolizing its own biomass), the cell dies. L0 is the energy required in zero light to maintain the cell without active metabolism.

### Multiobjective optimization

Unlike typical formulations of flux balance analysis13,75,76,77,78,79,80, which use a single objective function to predict fluxes, our model uses multi-objective optimization to more accurately approximate the true objectives of the cell: to optimize biomass while also producing the metabolite they exchange between cell types. The multiobjective scalar function is of the form:

$${r}_{obj}=aX+(1-a)m$$
(10)

where a is a scaling parameter between 0 and 1, X is non-metabolite biomass (proteins, lipids, carbohydrates, nucleic acids, and pigments) and m is the transactional metabolite (β-aspartyl arginine for diazotrophs and maltose for photoautotrophs). The value for a was varied between 0 and 1 over 1000 steps to generate a Pareto Front, optimizing the new objective function subject to the training constraints and ATP maintenance flux. Each point on that front corresponds to an arrangement of scalarized variables. Flux through robj can be then reinterpreted into direct biomass and metabolite fluxes via multiplication of their coefficient by the objective solution:

$$\mu ={\nu }_{X}=a{\nu }_{obj}$$
(11)
$${\nu }_{m}=(1-a){\nu }_{obj}$$
(12)

Visualization of this approach can be seen in Figs 2 and S9. For each point along the Pareto Front, Euclidean distance was used to determine the relative weight of each objective function, which was then used to generate a single, scalarized reaction. Each cell in the simulation calculates its scalar objective function separately during each time step based on its internal constitution and requirements.

#### Shifting scalar objectives

To investigate the mutability of the scalar objective equation, and the effects of intra- and inter-cellular conditions on cell objectives, an algorithm to shift the objective function along the Pareto Curve is implemented based on the biomass of the cells (assuming that structural biomass is relatively intransigent in these conditions). That means that, as biomass accumulates, a cell will behave more stationarily, i.e., it will prioritize secondary metabolites. This is adjusted assuming a normal distribution in cell sizes:

$$f(X) \sim N(\mu ,\sigma )$$
(13)

where μ is the average biomass (assuming cubic shape and density near water) or metabolite concentration (from experimental data) and σ is the standard deviation. As the standard deviation decreases, the model is less tolerant of deviations from mean values and will more readily adjust the scalar weights. As it increases, it will be more tolerant of deviations from mean values and will cause smaller perturbations to scalar weights. For this study, the standard deviation was assumed to be 0.433 so that 95% of cells would fall within a factor of seven from minimum to maximum and a mean of 1.029 × 10−9 g calculated using an assumed cubic shape with dimensions of 10 μm per side and a density of seawater81.

The correction occurs by using the cumulative distribution function for the normal distribution. Given a Pareto matching algorithm resulting set of weights, the weights are shifted (inversely) as:

$${w}_{1}=a{\hat{w}}_{1}$$
(14)
$${\hat{w}}_{1}={\bar{w}}_{1}[1-F({X}_{1}={x}_{1})]$$
(15)
$${w}_{j\ne 1}={w}_{j\ne 1}F({X}_{j\ne 1}={x}_{j\ne 1})$$
(16)
$$a\sum _{i}\,{w}_{i}=1$$
(17)

where w1 is the new, shifted weight, $${\hat{w}}_{1}\,\,$$is the non-normalized and shifted weight, and $${\bar{w}}_{1}$$ is the Pareto matched weight. X is the variable and x is the cell’s quantity. Meaning, in each case, that as the objective increases, it responds by producing the corresponding objective: as biomass increases, it expends more resources on biomass.

### Implementation of mutable objective functions

Previous studies have used static objective functions, where production is consistent during every phase of growth. However, organisms accumulate and digest metabolites during growth and development. To reflect this, we inserted a “mutable” objective function where relative preferences of storage compounds and biomass production can be tailored by the agent based on cell biomass. The scalarized objective equation was thus broken into two main components: storage compounds (cyanophycin modeled as β-aspartyl arginine and glycogen modeled as maltose) and biomass (lipids, proteins, DNA, RNA, chlorophyll, phycoerythrin, etc.). We assumed that biomass remained relatively stable throughout the day while the amount of storage compound was allowed to vary. The scalar weights, or production priorities, were manipulated assuming cells do not grow beyond twice their average cell without dividing: lower biomass prioritizes growth and higher biomass prioritizes vegetative storage compound production. Mathematically, this is modeled such that the scalar objective equation’s biomass coefficient was inversely adjusted by cumulative probability of a cell’s biomass in the distribution. The normal distribution was formulated assuming cubic 10 μm cells with density of water81 as the average mass and a narrow distribution with a standard deviation of 0.433 times the mean size. This value was chosen to promote switch-like bistable behavior between cell phenotypes: either cells are biomass driven (exponential) or they are metabolite driven with combinations of probabilities in between. This is because a single sample of a cell from a distribution of cells would have a probability of 99% to fall between 0 and twice the mean size. The final distribution is:

$$f(X) \sim N(1.029\,ng,\,0.433\,\cdot \,1.029\,ng)$$
(18)

Calculation of new objective coefficients was done by first finding the cumulative probability (z) of another randomly selected cell’s non-metabolite biomass being less than or equal to the objective cell’s biomass at each time point for each cell:

$$z=F(X\le {x}_{i})$$
(19)

This is used to adjust the average, experimentally matched objective coefficient ($${\bar{w}}_{b}$$) for biomass by multiplying that coefficient by the probability of the cell being larger than that size, a value that represents the probabilistic expansion space ($${\epsilon }$$) of the cell:

$$\epsilon =1-z$$
(20)
$${\hat{w}}_{b}=\epsilon \,\cdot \,{\bar{w}}_{b}$$
(21)

Major metabolite coefficients for the scalarized objective equation were also adjusted using this probability, increasing as the cell’s size increased:

$${\hat{w}}_{m}=z\cdot {\bar{w}}_{m}$$
(22)

Finally, the coefficients are normalized such that:

$$a\sum _{i}\,{\hat{w}}_{i}=1$$
(23)

Or:

$$a=\frac{1}{{\sum }_{i}\,{\hat{w}}_{i}}$$
(24)

Which yields final objective coefficients of:

$${w}_{k}=\frac{{\hat{w}}_{k}}{{\sum }_{i}\,{\hat{w}}_{i}}\,\forall \,k\in {\mathscr{O}}$$
(25)

where $${\mathscr{O}}$$ is the set of all objective metabolites in the original scalar equation.

Performance evaluation of the mutable objective function, validation of O the mutable objective function versus the static version, and justification of non-metabolite biomass as the independent objective are provided in SI Methods and Fig. S10.