A Combinatorial Algorithm for Microbial Consortia Synthetic Design

Synthetic biology has boomed since the early 2000s, when it was first shown that compounds of interest could be synthesised more rapidly and effectively by using organisms other than those that naturally produce them. However, engineering a single organism, often a microbe, to optimise one or several metabolic tasks may make it difficult to obtain an efficient production system or to avoid toxic effects for the recruited microorganism. The idea of using a microbial consortium instead has therefore been developed over the last decade. It was motivated by the fact that such consortia may perform more complicated functions than single populations can and may be more robust to environmental fluctuations. Success is, however, not guaranteed. In particular, establishing which consortium is best for the production of a given compound or set of compounds remains a great challenge. This is the problem we address in this paper. We introduce an initial model and a method that make it possible to propose a consortium to synthetically produce compounds that are either exogenous to it, or endogenous but for which interaction among the species in the consortium could improve the production line.

If. Consider a $k$-clique $K$ of $G$. Write $E'$ for the set of $\binom{k}{2}$ edges used in $K$. It is easy to verify that $A' = \{a_v \mid v \in K\} \cup \{a_e \mid e \in E'\}$ is a valid solution: starting from the source $s$, each $x_v$, $v \in K$, can be reached using the corresponding arc $a_v \in A'$. Then, for each $1 \le i < j \le k$, pick the edge $e = \{u, v\}$ having class $\{i, j\}$ in $E'$: both vertices $x_u$ and $x_v$ have already been reached, hence the hyperarc $a_e$ allows us to reach $t_{i,j}$. Overall, all targets have been reached with exactly $k + \binom{k}{2}$ hyperarcs.
Only if. Consider a solution $A'$ of DSH for $(H, S, T)$. Write $K = \{u \in V \mid a_u \in A'\}$ and $E' = \{e \in E \mid a_e \in A'\}$, that is, the sets of vertices and edges of the original graph for which the corresponding hyperarc of $H$ is used in $A'$. We will show that $(K, E')$ forms a clique. The first observation is that $|K| + |E'| \le k + \binom{k}{2}$, since this is the maximum total weight of the solution. For every $1 \le i < j \le k$, since $t_{i,j} \in T$, $A'$ contains a hyperarc ending in $t_{i,j}$, so $E'$ must contain some edge $e$ having class $\{i, j\}$. This already shows that $|E'| \ge \binom{k}{2}$, which in turn yields $|K| \le k$. Write $\{u, v\}$ for the endpoints of any $e \in E'$. Since $x_u$ and $x_v$ have in-degree 1 in $H$, the arcs $a_u$ and $a_v$ must also belong to $A'$, and $u, v \in K$. Hence all the endpoints of edges in $E'$ are in $K$.
To sum up: $E'$ is a set of at least $\binom{k}{2}$ edges with a total of at most $k$ endpoints, so these are the edges of a clique of $G$.
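The counting step that closes the proof can be checked mechanically on small graphs. The following is a minimal sketch, not part of the original construction: the function names `is_clique_by_counting` and `is_clique` and the edge-list representation are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def is_clique_by_counting(edges, k):
    """True if `edges` holds at least C(k, 2) distinct edges on at most k endpoints."""
    # Normalise each undirected edge so that (u, v) and (v, u) coincide.
    edge_set = {frozenset(e) for e in edges if len(set(e)) == 2}
    endpoints = set().union(*edge_set)
    return len(edge_set) >= comb(k, 2) and len(endpoints) <= k


def is_clique(edges, vertices):
    """Direct definition: every pair of distinct vertices is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) in edge_set for p in combinations(vertices, 2))
```

On at most $k$ vertices there are at most $\binom{k}{2}$ possible edges, so whenever the counting test succeeds every pair of endpoints must be joined, which is what the direct test confirms.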
Proposition 2. The problem is NP-hard even when $|T| = 1$ and $A$ contains only one tentacular hyperarc.
Proof. The reduction is from the Directed Steiner Tree problem. Consider an instance of this problem, i.e. a directed graph

Metabolites removal
When creating the hypergraphs from the reconstructed metabolic networks, common cofactors and coenzymes were removed. They were identified using the BRITE functional hierarchies of KEGG. The list of all filtered metabolites is given below. This list was compiled for general applications; hence, some of the metabolites may not be present in the networks used in the paper.

Graph transformation
The initial networks are obtained from the SBML files of the organisms. Each is modelled as a directed hypergraph (as described in the main text), $H = (V, A)$.
Since we only want to take into account the hyperarcs of spreadness $> 0$ (i.e. such that $|\mathrm{src}(a)| > 1$), all the hyperarcs of spreadness 0 (i.e. such that $|\mathrm{src}(a)| = 1$ and $|\mathrm{tgt}(a)| > 1$) are replaced by simple arcs in the network.
For each $a \in A$ such that $|\mathrm{src}(a)| = 1$ and $|\mathrm{tgt}(a)| > 1$, a vertex $u_a$ that plays the role of a pseudo-metabolite is added to the network. The selected hyperarc is then removed and simple arcs are added: one going from the tail vertex (substrate) to the pseudo-metabolite, and the others from the pseudo-metabolite to the head vertices (products). To summarise, one hyperarc (the original one) is removed and $1 + |\mathrm{tgt}(a)|$ simple arcs are added: the arc $(\mathrm{src}(a), u_a)$ and, for each $v \in \mathrm{tgt}(a)$, an arc $(u_a, v)$.
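The replacement of single-substrate hyperarcs can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the dictionary representation, the function name `split_single_source_hyperarcs` and the naming scheme for pseudo-metabolites and new arcs are all hypothetical.

```python
def split_single_source_hyperarcs(hyperarcs):
    """Replace each hyperarc with one substrate and several products by simple arcs.

    `hyperarcs` maps a reaction name to a (substrates, products) pair of sets.
    Returns a new dictionary in the same format.
    """
    result = {}
    for name, (src, tgt) in hyperarcs.items():
        if len(src) == 1 and len(tgt) > 1:
            pseudo = f"u_{name}"  # pseudo-metabolite u_a (hypothetical naming)
            # One arc from the substrate to the pseudo-metabolite ...
            result[f"{name}_in"] = (set(src), {pseudo})
            # ... and one arc from the pseudo-metabolite to each product.
            for v in sorted(tgt):
                result[f"{name}_out_{v}"] = ({pseudo}, {v})
        else:
            # Hyperarcs with several substrates, and simple arcs, are kept as is.
            result[name] = (set(src), set(tgt))
    return result
```

For a reaction S → G + H this yields the three arcs (S, u), (u, G) and (u, H), that is, $1 + |\mathrm{tgt}(a)|$ arcs in place of the original hyperarc.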
This step is not required when using the ASP solver.

Graph filtering
We call a reaction $a$ a sink reaction if it has no product, that is, if $|\mathrm{tgt}(a)| = 0$. Similarly, an import reaction $a$ has no defined substrate ($|\mathrm{src}(a)| = 0$). The filtering rules for the merged network are twofold: 1. Filtration of the sink and import reactions; 2. Filtration of the source and target metabolites that are not part of $S$ and $T$.
The first step removes any arc (reaction) $a \in A$ such that $|\mathrm{tgt}(a)| = 0$ or $|\mathrm{src}(a)| = 0$. The second step removes any vertex (metabolite) $v \in V$ outside $S$ and $T$ such that $\deg^+(v) = 0$ or $\deg^-(v) = 0$. Here $\deg^+(v)$ is read as the number of reactions producing $v$ and $\deg^-(v)$ as the number of reactions consuming it, consistent with the removal rules that follow. If $\deg^+(v) = 0$, then all the outgoing reactions are removed. If $\deg^-(v) = 0$, then each entering reaction $a$ such that $v \in \mathrm{tgt}(a)$ (that is, which has $v$ as a product) is removed if $|\mathrm{src}(a)| = |\mathrm{tgt}(a)| = 1$. Otherwise the reaction is simplified by removing $v$ from its products ($\mathrm{tgt}(a) \leftarrow \mathrm{tgt}(a) \setminus \{v\}$).
These two steps are repeated until the network is stable (no vertex or arc matches the filter any longer). In Supplementary Figure S3a, the hyperarcs of the type $S \to G + H$ are divided into 3 arcs with the introduction of a node $u'$. The three created arcs are $(S, u')$, $(u', G)$ and $(u', H)$. The filtering step is applied twice. The first time, using rule 2, the vertices A, C and F are removed, as are the reactions $A + S \to B$ and $E \to F$. The reaction $B \to C + D$ is simplified into $B \to D$. Using rule 1, the sink reaction of D ($D \to$) is removed. The second time, according to rule 2, vertex E is deleted, as is the reaction $S \to E$. The resulting network can be seen in Supplementary Figure S3b.
In Supplementary Figure S4, vertex I would be removed along with the two reactions $I \to$ and $F \to I$.
Figure S3. Toy example, before and after filtering. Here only one network is considered. All arcs have a weight of $w_{\mathrm{worker}}$ in S3a. (b) Network after filtering: all arcs kept their weight, so all have a weight of $w_{\mathrm{worker}}$ (including $S \to u'$) except for $u' \to G$ and $u' \to H$, which each have a weight of 0.
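The iteration of the two filtering rules can be sketched as below. This is a minimal illustration, assuming that reactions are stored as (substrates, products) pairs of sets and that a metabolite is removed implicitly once no reaction mentions it; the function name `filter_network` is hypothetical and the code is not the implementation used in the paper.

```python
def filter_network(reactions, sources, targets):
    """Apply the two filtering rules repeatedly until the network is stable.

    `reactions` maps a name to a (substrates, products) pair of sets.
    """
    rxns = {n: (set(s), set(t)) for n, (s, t) in reactions.items()}
    keep = set(sources) | set(targets)
    changed = True
    while changed:
        changed = False
        # Rule 1: drop reactions lacking substrates or lacking products.
        for name in [n for n, (s, t) in rxns.items() if not s or not t]:
            del rxns[name]
            changed = True
        # Rule 2: metabolites outside S and T never produced or never consumed.
        produced = set().union(*(t for _, t in rxns.values()))
        consumed = set().union(*(s for s, _ in rxns.values()))
        for v in (produced | consumed) - keep:
            if v not in produced:
                # Never produced: remove every reaction consuming v.
                for name in [n for n, (s, _) in rxns.items() if v in s]:
                    del rxns[name]
                    changed = True
            elif v not in consumed:
                # Never consumed: remove or simplify every reaction producing v.
                for name in [n for n, (_, t) in rxns.items() if v in t]:
                    s, t = rxns[name]
                    if len(s) == 1 and len(t) == 1:
                        del rxns[name]
                    else:
                        t.discard(v)
                    changed = True
    return rxns
```

On a hypothetical network (not the one of Supplementary Figure S3) with source S, target P and the reactions S → B, B → C + D, D → P, A + S → X, S → E, E → F and D →, the sketch keeps only S → B, B → D and D → P; the reaction S → E disappears in the second pass, once E → F has been removed.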

Graph insertion
As introduced in the main text, once the networks of the members of the consortium (the set $O_w$ of workers to be used to synthetically produce the compounds in $T$) are obtained, we add to each network the reactions taking place in the other organisms, called reference organisms (set $O_o$). A hyperarc is introduced only if it is not already present in the original network.
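The insertion step amounts to a set difference between the reference reactions and those of the worker. The sketch below is illustrative only: it assumes that two reactions are the same when they have identical substrate and product sets, and the helper names `reaction` and `insert_reference_reactions` as well as the labels `"endogenous"` and `"inserted"` are hypothetical.

```python
def reaction(substrates, products):
    """Hashable representation of a hyperarc as (substrates, products)."""
    return (frozenset(substrates), frozenset(products))


def insert_reference_reactions(worker, references):
    """Return the reactions of a worker network, each labelled by origin.

    `worker` is a set of reactions; `references` is an iterable of such sets,
    one per reference organism.
    """
    labelled = {rxn: "endogenous" for rxn in worker}
    for ref in references:
        for rxn in ref:
            # setdefault leaves reactions already present in the worker untouched.
            labelled.setdefault(rxn, "inserted")
    return labelled
```

Keeping the origin of each reaction makes it possible to assign different weights to endogenous and inserted hyperarcs afterwards.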

Transition
Transitions with a weight of $w_t$ are added between all pairs of vertices that represent the same metabolite in different species.
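A possible way of generating the transition arcs is sketched below. It assumes that each vertex is a (metabolite, organism) pair and that a transition is added in both directions between every pair of copies; both choices, like the function name `add_transitions`, are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations


def add_transitions(vertices, w_t):
    """Return weighted arcs linking copies of a metabolite across organisms.

    `vertices` is an iterable of (metabolite, organism) pairs.
    """
    by_metabolite = defaultdict(set)
    for metabolite, organism in vertices:
        by_metabolite[metabolite].add(organism)
    arcs = {}
    for metabolite, organisms in by_metabolite.items():
        # Ordered pairs of distinct organisms give one arc in each direction.
        for src, dst in permutations(sorted(organisms), 2):
            arcs[((metabolite, src), (metabolite, dst))] = w_t
    return arcs
```

A metabolite present in a single organism produces no transition, whereas one present in $n$ organisms produces $n(n-1)$ directed arcs.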

Pseudo-sources and targets
We do not force the production of $T$ in all organisms; a single producer is sufficient. Hence we create pseudo-sources and pseudo-targets. We connect every target $t_{i,j}$, corresponding to the metabolite $i$ of an organism/network $j$, to a pseudo-target $t'_i$ by an arc $(t_{i,j}, t'_i)$ with a weight that is negligible compared to the other (regular) weights used (e.g. $10^{-6}$ was applied in the biological examples described in the main text). Reaching $t'_i$ guarantees that at least one $t_{i,j}$ is reached. The same procedure is applied to the sources.
An example of the steps described above is depicted in Supplementary Figure S4.
Figure S4. Toy example for the transition reactions and addition of pseudo-sources and pseudo-targets.
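The wiring of pseudo-sources and pseudo-targets can be sketched as follows. The sketch assumes (metabolite, organism) vertices, arcs directed from each pseudo-source to the corresponding copies of the source, and the placeholder organism names `"pseudo-source"` and `"pseudo-target"`; these, like the function name `add_pseudo_terminals`, are assumptions and not taken from the paper.

```python
def add_pseudo_terminals(vertices, sources, targets, eps=1e-6):
    """Return the negligible-weight arcs attached to pseudo-sources and pseudo-targets.

    `vertices` is an iterable of (metabolite, organism) pairs; `sources` and
    `targets` are sets of metabolite names.
    """
    arcs = {}
    for metabolite, organism in vertices:
        if metabolite in targets:
            # Each copy t_{i,j} of a target points to the shared pseudo-target t'_i.
            arcs[((metabolite, organism), (metabolite, "pseudo-target"))] = eps
        if metabolite in sources:
            # Assumed direction: the pseudo-source points to each copy of the source.
            arcs[((metabolite, "pseudo-source"), (metabolite, organism))] = eps
    return arcs
```

Because every added arc has weight `eps`, reaching a pseudo-target costs essentially the same as reaching the cheapest of its copies, so the choice of producer is left to the optimisation.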

3 Application: Alternative Results
We present here the minimum-weight solutions that were not shown in the figures of the main text.

Antibiotics production
For the antibiotics production, two alternative solutions are represented in Supplementary Figure S5.
Supplementary Figure S5. Representation of two solutions of minimum weight. The circles are compounds. Black hyperarcs are endogenous reactions, that is, reactions already present in the organisms forming the consortium, while purple dashed hyperarcs are the reactions that were inserted. Green arcs represent the transport of pyruvate from Streptomyces cattleya to Methanosarcina barkeri and of L-2-aminoadipate from M. barkeri to S. cattleya. The widths of the arcs are proportional to the assigned weights. Grey dashed arcs represent an alternative path of endogenous reactions in the upper part of glycolysis: the second solution uses this path instead of the one just below it to link β-D-glucose to D-glyceraldehyde 3-phosphate.