Systematic analysis of putative phage-phage interactions on minimum-sized phage cocktails

The application of bacteriophages as antibacterial agents has many benefits in the “post-antibiotic age”. To increase the number of successfully targeted bacterial strains, phage cocktails, instead of a single phage, are commonly formulated. Nevertheless, there is currently no consensus pipeline for phage cocktail development. Thus, although large cocktails increase the spectrum of activity, they could produce side effects such as the mobilization of virulence or antibiotic resistance genes. On the other hand, coinfection (simultaneous infection of one host cell by several phages) might reduce the potential for bacteria to evolve phage resistance, but some antagonistic interactions amongst phages might be detrimental for the outcome of phage cocktail application. With this in mind, we introduce here a new method, which considers the host range and each individual phage-host interaction, to design the phage mixtures that best suppress the target bacteria while minimizing the number of phages to restrict manufacturing costs. Additionally, putative phage-phage interactions in cocktails and phage-bacteria networks are compared as the understanding of the complex interactions amongst bacteriophages could be critical in the development of realistic phage therapy models in the future.

Bacteriophages, or phages, are the viruses of bacteria. They infect and replicate within their host and, subsequently, lyse and kill the bacterial cell to release new phage offspring. This "life" cycle confers phages with an intrinsic ability to act as antimicrobials, making them an interesting approach to overcome the current global antibiotic resistance crisis brought about by widespread antibiotic misuse. Indeed, the application of bacteriophages as therapeutics and biocontrol agents is increasingly gaining popularity, especially as a tool to fight against multiresistant pathogenic bacteria in several fields such as human health, agriculture, and the food industry 1,2 .
Besides their lytic ability, another distinctive feature of phages is their high host specificity. This property would prevent disturbance of the surrounding microbiota when phages are used to remove pathogenic bacteria, which is a significant advantage over antibiotic therapy 3 . Nevertheless, in terms of efficiency, very specific phages (those infecting only a small number of strains) represent a challenge, making it necessary to use mixtures containing different phages with complementing host ranges, the so-called phage cocktails, to increase the number of target strains and become useful for a wider range of applications/patients 4 . Alternatively, a single phage with a broad host range or a polyvalent phage 5 can be used to target multiple strains from a species or even multiple species in a community. However, polyvalent phages frequently show differences in infection efficiency (EOP) on different hosts due to, for instance, differential receptor-phage binding affinity.
From a logistical standpoint, therapeutic regimes involving only one phage would be simpler and potentially less costly. Indeed, phages that are part of a cocktail must be propagated individually, purified, and mixed prior to use, which might result in higher manufacturing costs. On the other hand, cocktails improve the commercial applicability of phage formulations partly due to their wider range of bacterial targets. Thanks to this ability, cocktails have the potential to be used without prior identification of the pathogens involved. Furthermore, phage cocktails may reduce or almost eliminate the chance of selecting phage-resistant bacteria 6 . To achieve this goal, phages in the cocktail should preferably belong to different families, so that strains becoming resistant to one phage can be potentially targeted by another phage in the mixture, as superinfection immunity is not possible between unrelated phages. www.nature.com/scientificreports/ To date, there are no validated guidelines for the design of individual or mixed phage preparations targeting a given pathogen 7 . Therefore, in order to assemble a phage cocktail with optimal therapeutic potential, it is first necessary to identify candidate phages, and then determine their infection profiles against a diverse set of strains spanning the major lineages of the target bacterium. Nonetheless, bacteriophage cocktails should not be random mixtures of different bacterial viruses. For instance, Merabishvili et al. 8 recommended some features to be considered for the selection of phages, which include being strictly virulent, not carrying virulence genes, and exhibiting high killing of the target pathogen, good stability and no transducing ability. In some cases, cocktail design would also involve the choice of bacteriophages with different bacterial cell wall receptor recognition sites 9,10 , thereby minimizing the emergence of resistance mutants. Additionally, phage candidates should exhibit the broadest host range possible against the target bacterium. When necessary, the host range might be broadened by serial passages on different bacterial strains, thus allowing adaptation of the bacteriophages by coevolution with their host 11 . Also, there are specific methods available to isolate phages with broad host range 12 or to broaden the host range through site-directed mutagenesis of the host-range-determining regions (HRDRs) in the phage tail fiber protein 13 . Likewise, synthetic biology can be used to modify the phage host range by engineering phage genomes through modular swapping of phage tail components 14 .
In some cases, it is necessary to target several bacterial species with one formulation. This application would entail the design of even more complex cocktails, containing mixtures of bacteriophages active against each bacterial species. In this case, cocktail design is mainly based on the microbial composition and the frequency of occurrence of different bacterial species in the target infection site. Examples of complex cocktails include the commercial products Pyo-and Intestibacteriophage 15 .
Phage cocktail design must take into account not only interactions between phages and their hosts but also phage-phage interactions. The most well-known example of a phage-phage antagonistic relationship is the superinfection immunity mediated by temperate phages. Indeed, bacteria carrying a prophage are typically resistant to infection by related phages 16 . Moreover, phage satellites or pathogenicity islands can limit propagation of their helper phage, which confers some level of phage resistance to host bacteria 17 . Because of these and other interactions, the host range of a phage cocktail could be different from that predicted theoretically by combining the host range of each individual phage. Indeed, some phages might interact upon coinfection 18 leading to synergetic or antagonistic effects 19 . For example, some cocktails show a narrower host range than the sum of the host ranges of all individual phages independently 20 . This could be due to host infection competition between phages, i.e., coinfection of the same host cell by multiple phages may lead to increased competition for limited cellular resources, thus reducing the fitness of an individual phage 21 . Antagonistic interactions might also occur when phages share the same receptor sites, or due to abortive infection mechanisms 22 . In turn, synergistic interactions between phages may be related to the rate of infection, the production of progeny, or the time between infection and progeny release 23 . More recently, a phage-phage communication system, arbitrium, that allows cross talk among close phages to regulate the lytic/lysogenic life cycles was described by Stokar-Avihail et al. 24 . Interestingly, this system may allow interactions with more genetically distant phages.
Overall, it seems clear that optimizing the number and combination of phages will be critical for designing effective therapeutic and biocontrol strategies. In a previous work 25 , the information provided by host range matrices was analyzed by building phage-bacteria infection networks (PBINs) and calculating the nestedness temperature (host range hierarchy of the phages). Subsequently, an estimator of phage cocktail size (Φ) was proposed considering some global properties of the host range matrices such as fill (fraction of successful infections), nestedness temperature, and number of bacteria. Here, we explore a new method, which considers the host range and each individual phage-host interaction, to design the phage mixtures that best suppress the target pathogens while minimizing the required number of phages.

Results and discussion
Designing phage cocktails from phage-bacteria infection networks (PBINs). In a recent work 25 , a pipeline for designing phage cocktails based on global properties (fill, temperature, and number of bacteria) of the PBINs was proposed. However, host range matrices needed to be processed by two different sorting algorithms (BinMatNest and Ward) and the phage cocktail was not automatically generated (Fig. 1a). Here, we have developed a new approach (see Methods) that considers individually the host range of every phage, designs cocktails using only one sorting algorithm and automatically provides the Minimum Cocktail Size (MCS) (Fig. 1b).
A total of 50 datasets that include bacteria and phages from different sources, such as seafood, plants, livestock, dairy, sewage, clinical isolates and laboratory collection strains (Table 1) were used to build the PBINs and calculate the Expected Importance (EI) of every bacterium and phage (see Methods) in order to design the corresponding phage cocktails. Whereas most datasets (42 out of 50) comprised a single bacterial species, matrices harboring 2, 5 and 15 species were also surveyed ( Table 1). The most frequent species was Escherichia coli, which appeared on 18 matrices, followed by Pseudomonas aeruginosa and Staphylococcus aureus, which appeared on four and three datasets, respectively. However, most bacterial species (43 out of 52) were present in a single dataset. In total, 2,877 bacterial strains, 899 phages and 52,688 phage-host interactions were evaluated. Subsequent processing of the PBINs led to the construction of subnetworks named Phage Cocktail Networks (PCNs), harboring all bacterial strains susceptible to phage infection as well as the phages contained in a hypothetical cocktail (Fig. 1b). Therefore, a processed network displays two types of source nodes (unselected vs. cocktail phages), two types of target nodes (resistant vs. susceptible bacteria) and two types of edges (unselected phages-bacteria vs. cocktail phages-bacteria interactions). Selection of the phages constituting the candidate cocktails was performed by applying both heuristic (Network Metrics) and exhaustive (Exhaustive Search) algorithms from the Cytoscape application PhageCocktail 26 (freely available at https:// apps. cytos cape. org/ apps/ phage cockt ail), seeking the PCN that maximized the expected number of lysed bacteria while keeping a MCS. www.nature.com/scientificreports/ The heuristic algorithm employs EI values and considers phages with a broad host range (high EIp values) as better candidates for the cocktail, and bacteria susceptible to be infected by fewer phages (low EI b values) as more problematic. Contrastingly, the exhaustive algorithm evaluates all the possible phage combinations, which warrants the identification of the smallest theoretical cocktail. However, this algorithm imposes a computational runtime penalty and may take several hours to execute 26 when the number of phages is large. For instance, one matrix (named Korf et al. in Table 1) needed to be processed overnight. It must be highlighted that, even though alternative algorithms, such as agglomerative hierarchical clustering of phages by their host range, are faster, the exhaustive algorithm employed in this work warrants the best results (Díaz-Galián et al., submitted), i.e., the MCS infecting the highest number of bacteria. Since Exhaustive Search outperforms Network Metrics, yielding smaller cocktails for the same number of infected bacteria (data not shown), we selected the PCNs obtained with the latter algorithm for further evaluation (Fig. 2). To compare both algorithms the expected importance (EI) of each node, determined by the Network Metrics algorithm (see Methods), is also shown. It is noteworthy that the Exhaustive Search preferably selected generalist phages displaying high EI values (Fig. 2). Although the current implementation of Exhaustive Search in PhageCocktail 1.1 is limited to combinations comprising up to 12 phages, this did not represent a problem in our search given that the size of the largest candidate phage cocktail predicted by the algorithm was 11.
The complexity of a bipartite network depends on its size, symmetry, and fill (fraction of the maximum number of edges). For instance, square matrices result in more complex networks than similarly sized asymmetric networks because the maximum number of edges (possible phage-host interactions) increases. Additionally, the maximum algorithmic complexity 27 of a PBIN is found for intermediate fill values because broad-host-range phages simplify cocktail design and any phage with an EI p of 100 might be able to infect every host in a dataset. With this in mind, we sought to further characterize the PBINs by comparing the number of bacteria, phages and fill of each matrix (Fig. 3a). Most of the PBINs (80%) comprised more bacteria than phages, and the number of nodes (phages plus bacteria) ranged between 10 and 906. Large matrices were lightly filled suggesting a weak negative correlation between matrix size and fill (see below). Two outgroups harbored a significantly higher than average number of phages (166 vs. 18) and bacteria (896 vs. 58). These results showed the lack of consensus regarding the number of bacteria needed for host range determination, which coincides with the data obtained from a recent survey 28 that found very little agreement in this regard between researchers (most answers stretched between 20 and 100 but one reached 800). Similarly, bias in data gathering, such as not considering narrow-hostrange phages, might skew the fill of the PBINs.   (Fig. 3b) shows that most of the PCNs were indeed single phage formulations, followed by 2-and 3-phage cocktails, whereas larger (4, 5, 7, 8 and 11) cocktail sizes were less common. On the other hand, the number of bacteria in PCNs with a low MCS tended to be smaller. Thus, the minimum number of bacteria in large cocktails (containing four or more phages) was 10, whereas 19% of the PCNs corresponding to smaller cocktails contained nine or fewer bacterial strains. In a previous work 25 , the size of candidate phage cocktails was estimated using a metric (Φ) that considers properties of PBINs such as nestedness temperature, fill and the number of target bacteria. However, Φ does neither evaluate individual phage-host interactions nor provide the minimum possible cocktail size. Comparison of both estimators, MCS and Φ, revealed (Fig. 3c) that Φ was larger than MCS for half (49%) of the PBINs, while being identical for most of the other half (37% of the total). Strikingly, Φ was smaller than MCS for the remaining PBINs (14%) most of which were fairly large networks. Remarkably, Φ takes into account the ecological and evolutionary information provided by the structure of PBINs, but it does not intend to achieve the minimum possible cocktail size. Also, Φ can be even larger than the number of phages assayed 25 suggesting that additional phages should be isolated to perform effective biocontrol of the bacterial population. Conversely, the Exhaustive Search algorithm used to determine MCS examines individual phage-host interactions, thereby providing the optimal theoretical cocktail, rather than global properties of the network, and the expected success of the cocktails can be gauged by comparing PBINs and PCNs.
For fully successful cocktails, all the bacteria in a PBIN should also be part of the final PCN. With this in mind, the expected cocktail efficacy was estimated as the fraction (%) of bacteria susceptible to at least one phage in the cocktail by calculating the ratio of bacteria in every PCN/PBIN pair (Fig. 3d). These calculations revealed that 26% of the PCNs comprised all the bacterial strains from the original PBINs, which corresponds to the fraction (13 out of 50) of fully successful phage cocktails. While the fraction of bacteria lysed is generally high for all cocktail sizes (the median of all datasets is 74%), the variability of the expected efficacy decreases as MCS increases. This, however, is likely due to the asymmetric distribution of cocktail sizes (Fig. 3b). Interestingly, large phage cocktails did not exhibit low efficacy values (Fig. 3d) evincing that the algorithms used to design the cocktails do not spuriously increase the MCS. In the case of the heuristic algorithm, MCS is determined by favoring generalist phages with broad host range over specialists with narrow host range. In this regard, several experiments have shown that host range is a highly evolvable trait that relies on diversity, density and quality of hosts 29 . However, although host range expansion seems advantageous for phages, generalism may come with some tradeoffs such as a lower propagation rate 30 . Taking this into consideration, it would be interesting to develop alternative strategies for designing phage cocktails that maximize phage productivity, for example by selecting phages with large burst sizes and/or short latent periods, rather than those that minimize cocktail size.

Phage-phage interactions: PBINs vs. PCNs.
Although all the datasets evaluated in this work comprise experimental data, the actual performance "in vivo"of the phage cocktails designed here might be affected by phage-phage interactions that could hamper or facilitate coinfection (simultaneous infection of one host cell by several phages). To evaluate these putative interactions, PCNs were grouped by their MCS and then used to calculate the percentage of bacteria lysed by a given number of phages in all cocktails of the same size (Fig. 4a). This analysis revealed the intensity and frequency of phage-phage interactions for different cocktail sizes. Notably, the bulk of the bacterial hosts were lysed by a single phage, even in large cocktail networks, and only in the group comprising 3-phage cocktails there were more bacteria lysed by two phages than by a single one. Additionally, no bacterial host was lysed by 7, 9, 10 or 11 phages in the PCNs containing 7, 8 or 11 phages. These results suggest that, even though search algorithms preferably select generalist phages, the nestedness of the PCNs might be lower than that of the original PBINs 25 . Consequently, despite the fact that phages in nature span the continuum from specialist to generalist 31 , cocktail design might alter this equilibrium by favoring the selection of broadhost-range phages.
To further compare the structure of PBINs and PCNs, we studied the redundancy variation (rv) of phage coinfections. The redundancy of phage-host interactions (r), defined as the fraction of phages lysing a given bacterial strain in a network (see Methods), can increase, decrease or remain constant when comparing PBINs and PCNs (see examples in Fig. 4b). In turn, rv allows comparison of the redundancy found in a PCN compared www.nature.com/scientificreports/ to that of the original PBIN. This parameter ranges between a minimum (rv min ), when only one phage in the cocktail infects the host, and a maximum (rv max ), when all the phages lysing the host are considered. However, for single phage formulations rv = rv min = rv max . Consequently, rv can never decrease (Fig. 4b) and its value depends on how many phages infect the host in the PBIN (see Methods). After grouping the PCNs by MCS the redundancy change (rv) of the different bacterial strains was determined as log 10 (rPCN/rPBIN) (Fig. 4c). It can be perceived that the median values of rv increased (up to 2.5-fold) for all but single phage cocktails. This result implies that most bacterial strains were lysed by all the phages in the PBINs corresponding to single phage cocktails. For the rest of the networks there was a fraction of bacteria showing negative rv values while others exhibited an increase of up to 35 times (1.5 log). However, rv did not reach 0.8 in large (7, 8 and 11-phage) cocktails, i.e., redundancy increased less than 6.3 times. Concurrently, a paired samples t-test revealed (data not shown) a statistically significant 0.17 log difference between the means of rv and rv min and a 0.27 log difference between rv and rv max , evincing the asymmetric effect of phage cocktails on rv. Remarkably, when rv values were grouped by the number of lysing phages instead of the MCS (Fig. 4d), the variability observed for rv decreased as the number of lysing phages increased. This result is partially explained by the low number of bacteria susceptible to a high number of phages in the cocktails, but it also reflects that these host strains are susceptible to be infected by a high number of phages both in the PBINs and PCNs. Conversely, bacteria lysed by a single phage in the cocktails were heterogeneous regarding their susceptibility to the other phages in their respective PBINs. Thus, some host strains susceptible to a single phage in the cocktails can be infected by multiple phages in the corresponding PBIN. www.nature.com/scientificreports/ Recently, Niu et al. 19 found different combinations of phages in cocktails displaying neutral, synergistic or antagonistic effects. Phages coinfecting the same host gain access to a common pool of proteins, which can be considered as intracellular public goods. Subsequently, a plethora of social behaviors may arise including cheaters, which do not contribute to the production of public goods, and cooperators, such as phages producing depolymerases that facilitate the adsorption of other phages to the host 29 . On the other hand, the phagephage arms race might entail various strategies to prevent superinfection, such as triggering the premature lysis of bacterial cells infected by competitors 32 or phage-encoded CRISPR-Cas systems 33 . Therefore, phage-phage interactions are worth examining carefully, as they might be detrimental or beneficial to the outcome of phage cocktail application. With this in mind, redundancy is a useful parameter to be considered in order to formulate cocktails with superior efficacy.
Nonetheless, the intensity of phage-phage interactions also depends, to some extent, on phage-host dynamics. For instance, the bacteriophage multiplicity of infection (MOI), ratio of phages to bacteria, is frequently employed to calculate the amount of phages that should be applied during dosing. Additionally, bacterial densities can change, and target bacteria may not be homogeneous regarding phage access 34 . As a result, redundancy variation www.nature.com/scientificreports/ analysis could be customized by incorporating additional information such as the MOI, the ratio of phageome to microbiome in the original environment, the burst size of different phages or their virulence index score 35 to further improve its value during cocktail design.
Size is not a determining factor for cocktail efficacy. Next, we assessed the potential correlations among the different parameters that characterize the networks by building a heatmap (Fig. 5). Interestingly, the expected success of the cocktails (% of bacteria lysed) displays a moderately positive correlation with the fill of the network, whereas its correlation with the number of phages, bacteria or cocktail size (MCS) is weak (< 0.4).
A previous comparison of 31 datasets (Molina et al. 25 , data extracted from Chan et al. 36 ) had already shown that the observed bacterial load reduction does not increase significantly with the size of the phage cocktail. As shown in Fig. 3C, there is a strong positive correlation (Spearman's r = 0.84) between MCS and Φ. Also, both parameters show a negative correlation with the fill of the PBIN, being particularly strong (> 0.8) in the case of Φ (Fig. 5). Additionally, MCS has a positive correlation with the number of target bacteria and rv max . In contrast, cocktail size is not correlated at all with the experimental rv (Fig. 4c), probably due to its negative correlation with redundancy in both PBINs and PCNs. These results indicate that datasets comprising more bacteria correspond to low filled networks, which leads to larger cocktail sizes and low redundancy values. To fully distinguish whether this reflects the different structure of small and large networks or some bias in data gathering by researchers, untrimmed datasets would be required.

Concluding remarks.
This study provides a simple and straightforward method to tackle cocktail design commencing with host range matrices. However, it must be noted that the use of binary networks, which only consider host range, leads to the loss of valuable information regarding infection efficacy 37 . It has been well established that a broad host range is not necessarily a synonym of success for bacteriophages 38 , especially because niche expansion may involve facing some antagonistic pleiotropic costs 39 , such as the loss of virulence 40 and/ or infection efficacy 41 . Moreover, the co-evolution of phages and their hosts often entails changes in bacterial community composition and structure 42 or a decrease in the phage growth rate 43 . These strategies would ensure preservation of the host bacterial population for long-term exploitation, as well as increase extracellular stability of the newly-formed virions 44 . Likewise, the existence of synergy and/or antagonism among phages co-infecting the same host 19 suggests that the development of new algorithms for phage cocktail design should involve the analysis of quantitative matrices and co-infection values. This method would allow the differentiation between simple short-term and more complex long-term biocontrol of host populations.
Immunity networks, built from CRISPR-Cas sequences, have been shown to influence the structure of complex PBINs due to their effect on virus diversification and host control 45 . Moreover, network proximity analyses have been used for rapid identification of potential drug combinations targeting 2019-nCoV/SARS-CoV-2 46 . Therefore, network-based methodologies can be applied not only to the control of microbes in foods, industrial settings and phage therapy applications, but also to study the dynamics of the microbiome, the evolution of host and viral populations 37 , and even to fight the SARS-CoV-2 pandemic 47 . Notwithstanding these examples of implementation of biological networks comprising viruses, the systematic analysis of PCNs is still far from being widespread. The present work attempts to tackle the potential offered by these strategies towards building a well-defined pipeline for phage cocktail design. However, further work will still be necessary to incorporate additional information regarding phage-phage interactions so that phage-bacteria networks accurately reflect the complexity of synergies and antagonisms amongst phages.   48 , and 2 were randomly selected after searching MEDLINE using the term "phage cocktail" and restricting the date from January 2019 onwards.

Generation of phage-bacteria infection networks (PBINs).
The host range matrices (Table 1) were processed in the form of bipartite phage-host interaction matrices, reducing the lytic spectrum to either lytic or non-lytic interactions. These matrices were later transformed into directed networks (PBINs) by importing them into the Cytoscape platform 49 using the plugin aMatReader. The Expected Importance (EI) of each phage (EI p ) and bacterium (EI b ) within the generated networks was calculated according to Eqs. 1 and 2, respectively: where | · | is the cardinality of the bacteria (B) and phages (P), and out d and in d refer to the outdegree and indegree of the nodes, that is, the number of arrows leaving source nodes (phages) or entering target nodes (bacteria), respectively. These equations consider that phages with a broad host range (higher outdegree values) are initially better candidates to constitute the cocktail, and that bacteria susceptible to be infected by fewer phages (lower indegree values) are more problematic for cocktail design. Additionally, it must be noted that the values generated by these two equations have opposite signs, which simplifies color labelling of source (phages) and target (bacteria) nodes in the networks (Fig. 1). An exception to this are phages that do not infect any bacterium or bacteria infected by every single phage in the dataset because both receive an EI of zero.
Heuristic search of minimum sized phage cocktails. First, bacteria lysed by a single phage (EI b = 1/ |P|) were identified for every network, and the corresponding phages were incorporated into the cocktail. Subsequently, the phage lysing the most bacteria (i.e., with the highest EI p value) was included in the cocktail, unless no additional bacterial strains were lysed by the newly added phage. This process was iterated until all susceptible bacteria were predicted to be lysed by the cocktail, using the Network Metrics algorithm implemented in the Cytoscape app PhageCocktail 1.1. 26 .
Exhaustive search of minimum sized phage cocktails. The Exhaustive Search option of the Cytoscape app PhageCocktail 1.1 was used to perform a combinatorial calculation of all the possible phage combinations from size 1up to a 12-phage cocktail size. Then, it returns the best phage combination (the one that lyses the most bacteria) for each size until reaching the maximum number of lysed bacteria. The last cocktail shown in the generated output file corresponds to the highest percentage of lysed bacteria, and, hence, was selected as the best candidate. The number of phages in this cocktail was then considered to be the Minimum Cocktail Size (MCS).