Robust bacterial co-occurence community structures are independent of r- and K-selection history

Selection for bacteria which are K-strategists instead of r-strategists has been shown to improve fish health and survival in aquaculture. We considered an experiment where microcosms were inoculated with natural seawater and the selection regime was switched from K-selection (by continuous feeding) to r-selection (by pulse feeding) and vice versa. We found the networks of significant co-occurrences to contain clusters of taxonomically related bacteria having positive associations. Comparing this with the time dynamics, we found that the clusters most likely were results of similar niche preferences of the involved bacteria. In particular, the distinction between r- or K-strategists was evident. Each selection regime seemed to give rise to a specific pattern, to which the community converges regardless of its prehistory. Furthermore, the results proved robust to parameter choices in the analysis, such as the filtering threshold, level of random noise, replacing absolute abundances with relative abundances, and the choice of similarity measure. Even though our data and approaches cannot directly predict ecological interactions, our approach provides insights on how the selection regime affects the composition of the microbial community, providing a basis for aquaculture experiments targeted at eliminating opportunistic fish pathogens.


Results
To investigate the bacterial community structure and dynamics in r-and K-selected communities, we assessed the co-occurrence patterns of 1,537 operational taxonomic units 17 (OTUs) observed in the microcosm experiment. This 16S-rRNA gene dataset consisted of 202 samples from 12 microcosms cultivated over 50 days. Note that, 6 of the microcosm were r-selected, and 6 were K-selected. Between day 28 and 29, the r/K-selection regime was switched such that r-selected communities now were K-selected (the RK-group) and vice versa (the KR group). Furthermore, the microcosms varied by the amount of resources supplied, high (H) and low (L). However, exploratory analysis of the dataset did not indicate any relevant effect of the resource supply, and hereafter we will only focus on the r-and K-selection regimes.

Similarity measurements and network inference.
We assessed the co-occurrence patters between the OTUs using two similarity measurements and varying levels of random noise, OTU filtering, and type of abundance (relative/absolute). In contrast to many other 16S-rRNA microbiome datasets, we estimated the bacterial community's absolute abundances using flow cytometry.
Here, we present the results for the Spearman correlation measure with a low level of random noise, low OTU filtering threshold and absolute abundances (see the Methods' section for more details). We decided to focus on the rank-based Spearman correlation because it is widely applied for detecting associations 16,18,19 . We will later discuss the robustness of these results by contrasting and comparing with other similarity measures and parameter choices.
From an ecological perspective, an interaction between two microbes is an effect which one microorganism has on another. This includes cross-feeding, biofilm formation, and parasitism 9,20,21 . However, in further discussion, unless stated otherwise, we will use the term interaction in a network-theoretic perspective, where we apply a "guilt by association" heuristic. This means that, we define two OTUs to have a positive interaction if they co-occur in the same samples to a larger degree than expected by random chance. Conversely, we define two OTUs to have a negative interaction if they co-occur more rarely than expected by random chance 16,22,23 . Even if there cannot be any direct ecological interactions between the bacteria in different microcosms, the network concept of interactions still enables us to infer associations across samples collected from different microcosms.
We wanted to create a network of the pairwise associations between the OTUs and thus had to determine which edges to include. Selecting a hard threshold for the q-value for an interaction to be statistically significant (for instance at q ≤ 0.05 ), is not an easy choice 24 . We, therefore, illustrate the number of significant interactions over a range of threshold q-values up to 0.05 (Fig. 1). From this figure, we see that there is no obvious cutoff.
There were in total 3250 interactions having q ≤ 0.05 , of which 1679 were 0.05 ≥ q ≥ 10 −4 and 639 with q < 10 −10 . Therefore, we determined 500 edges to be a reasonable balance between selecting high-significance edges and a network with lower average node connections. With this setting, the effective q-value threshold became 5.0 · 10 −13 . The resulting network modules were labelled using the walktrap algorithm with 20 steps 25 (Fig. 2).
Phylogenetic clustering within the modules. The co-occurance network analysis clustered the OTUs into four distinct modules (Fig. 2). Also, two OTUs were not assigned any module (shown as colourless nodes inside module 3) as they were connected to the rest of the network with negative interactions only. Module 3 and 4 stood out as the most interesting modules for two reasons. First, they had the largest number of nodes. Second, there were negative links between the modules, suggesting mutual exclusion between modules. For these two main modules we observed a large number of positive links within the modules, but negative edges between them. Therefore we wondered whether the modules were phylogenetically clustered. Indeed, we observed a clear pattern between OTU module membership and phylogenetic classification ( Fig. 3 (Fig. 4). From this time trajectory plot, we compared the panels diagonally and observed that microcosms undergoing K-section converged towards the upper-left area in the plot, whereas microcosms under r-section converged the middle-right area. This effect seemed independent of the experimental period and of pre-existing experimental conditions. A PERMANOVA analysis showed that the current selection regime was most important for the community composition ( R 2 = 0.344 and p < 10 −6 ) compared to the minor effect of the overall selection group ( R 2 = 0.084 and p = 0.017 ), see Methods for details. Consequently, in this respect the communities did not seem to have any memory-effect that gave rise to resistance against changes in composition. As the r/Kselection regimes resulted in clustered communities, we aimed at investigating how the network arose from these dynamics.     Fig. 2. For this, we visualised the rank-based z-scores of the OTU abundances (see Methods for details) for selected days during the experiment for high resource supply (Fig. 5). For low resource supply, the results were very similar and is thus not discussed any further (see Supplementary Fig. S1 for further details). There were some obvious patterns that were apparent when investigating the temporal networks, especially with regards to module 3 and 4. The abundance of the OTUs in module 3 increased during K-selection, while the ones in module 4 had the opposite trend and had high abundances during r-selection. Hence, within each module, the OTUs had coordinated abundance patterns leading to positive inferred interactions. On the other hand, between module 3 and 4, the abundance patterns were anti-coordinated such that we obtained negative interactions.
We expected the dataset to display two time periods of instability: The first at the start of the experiment when the microbial community would adapt to lab-culture conditions, and the second disturbance instability after switching the selection regime, after day 28. During these unsteady periods, we expected more instability and less coordination between OTUs belonging to the same module. This in turn, would contribute to negative  www.nature.com/scientificreports/ interactions or weaken the positive ones. However, this expectation was only partially fulfilled because we observed negative edges within modules in Fig. 5 also outside the two predicted periods of instability. One potential factor contributing to instability at the beginning of the experiment, was the fact that the oligotrophic seawater was introduced to high amount of nutrients, favouring r-strategists to proliferate even if the feeding was continuous.
Network robustness. If our results were different when changing parameters, the conclusions would be less likely to give us any real insight into how the communities actually behave. Therefore, we checked the robustness of the chosen parameters, changing one at a time while keeping all other parameters constant. Increasing the levels of random noise from low to medium (see Methods section) did not give any substantial difference in terms of significant interactions. Some cosmetic changes were visible due to different color labeling of communities and orientations of plots (details in Supplementary Section S2). Exchanging estimated absolute abundances with relative ones gave higher proportion of negative interactions and different assignments of OTUs into modules (see Supplementary Section S3). However, the greater trends in the results stay the same, such as clustering based on phylogeny and the considerable change in the community behaviour after switching selection regime. Selecting a more stringent OTU filtering cutoff only has a minor consequence on the results, at the level of cosmetic changes in the plots (see Supplementary Section S4 for details). On the other hand, we notice a more pronounced effect when replacing the Spearman correlation by Pearson correlation. This is not surprising, since Spearman is non-parametric and Pearson measures degree of linear co-occurrence. In this case (Pearson), we got far fewer negative significant interactions for the same q-value, none of which are among the 500 most significant ones. Still, modules of phylogenetically related OTUs are present, and the selection regime still seems to explain the modules (Supplementary Section S5).

Discussion
In literature, challenges of microbial datasets such as sparsity, compositionality and habitat filtering have been addressed and solutions proposed for finding ecological interactions 22,[26][27][28][29] . Despite the fact that predictions from ecological interaction-inference tools have been successfully validated in some cases 30,31 , any universally accepted gold standard of finding ecological microbial interactions is not yet agreed upon. Furthermore, some reviews assessing existing methods for inferring ecological interactions have demonstrated that current methods have far too low predictive power, and more refined approaches specifically designed to cope with difficulties in microbial datasets have failed to perform better than the basic ones 19,32 . Hence, we believe that our choice   Figure 5. Dynamic visualisation of the network in Fig. 2, for (a) the RK selection group and (b) the KR selection group for high (H) resource supply. Nodes are coloured according to the corresponding OTUs' abundance compared to its overall mean for all sampling days, represented by its z-score. Orange, grey and black nodes mean higher, about the same or lower abundance than its mean, respectively. The edges are coloured by the product of the nodes' z-scores. This means that blue and red edges contribute to positive and negative association across the time series, respectively. The grey edges indicate that no major contribution to neither positive nor negative association is made. As we want to emphasize the orange and black nodes, the nodes with higher absolute z-scores are larger. www.nature.com/scientificreports/ of using the relatively simple ReBoot 22 procedure is reasonable, even though the approach in and of itself is somewhat coarse-grained. We observed that our correlation networks clustered the OTUs according to taxonomy and niche preferences as a result of selection for K-and r-strategists. The finding that taxonomy and niche preferences dominate cooccurrence patterns is in line with work by Chaffron et al. 33 who produced similar results from samples stored in a ribosomal RNA database. Along the same line, Bock et al 34 also noted that many of the interactions in a correlation network occur between closely related species when studying bacterial and protist communities in European lakes. Bacteria with similar niches are expected to be competitors and, hence, have negative interactions with each other. However, the effect of habitat filtering will create positive correlations between species with similar niches that are often stronger than those arising from ecological competition 10,35,36 . The same reasoning goes for taxonomical relatedness, as closely related organisms often belong to the same niche and have similar functions. This favours positive interactions within modules, whereas we, to a lesser extent got negative interactions between modules where the growth requirements are different.
Moreover, we have not undertaken any attempt to deal with indirect interactions. This means that two OTUs can appear with a strong (correlation) link even though they have no direct effect on each other, but instead interact with a third OTU. Consequently, it is challenging to determine causality when working with inferred interactions. Also, such indirect effects can be caused by environmental variables and biological entities not taken into account, such as protists and bacteriophages.
For reasons mentioned above, our results are not meant to directly represent real ecological interactions. Nevertheless, our results are interesting from a fish-health perspective, as they show that selection regime can control community composition. In terms of r-and K-selection, literature consider the orders Alteromonadales and Vibrionales represented in module 4 as r-strategists, whereas the Rhodobacteraceae in module 3 are considered K-strategists 37 . Additionally, Vibrio strains are known to cause disease in fish, whereas Rhodobacteraceae bacteria have been shown to protect against Vibrio infections through competition [38][39][40][41] . This agrees with and extends prior knowledge that K-selection is a potent tools for improving fish health and survival 1,7 .
The long-term behaviour of the community did not appear to depend on its prehistory. Potentially, this means that changing the microbiota from a detrimental to a healthy state in a running aquaculture facility requires the same measures as ensuring a healthy microbiota for a new facility. Furthermore, the trustworthiness of the results is strengthened by their robustness to changes in parameter settings, such as filtering cut-off, amount of random noise, type of abundance, and similarity measure.
This experiment was conducted in an artificial setting without any fish of which the health could be tracked. Furthermore, we do not know whether up-scaling and broader exposure could change the workings of the microbial community. Therefore, follow-up studies could be implemented in realistic aquaculture settings, perhaps such as a RAS facility, to investigate whether switching between K-selection and r-selection will yield the same community dynamics as described in this paper. Additionally, such an experiment would provide opportunity to investigate possible connections between the state of the microbial community and the health of the fish.
We acknowledge that there exist alternative approaches one could follow. For instance, treating the OTUs as discrete units is a bit misleading. As the results show, closely related OTUs often occur together, so it could make more sense to treat the bacteria as a taxonomical continuum. A novel approach based on amplicon sequence variants (ASVs) avoids the clustering of OTUs altogether by considering each individual unique read as an own entity 42 . The phylogenetic relatedness between ASVs could then be used as a constraint for finding co-occurrence patterns. In addition, incorporating environmental information, such as organic nutrient load, salinity, and temperature, would be useful because this allows us to better predict how the desired K-selection should be obtained. Joint Species Distribution Models (JSDMs) 43,44 might have this useful potential to account for both species interaction, environmental factors, and taxonomical relatedness. However, its use in microbial ecology is still in its early stages and time dynamics are not yet embedded into the framework 45,46 .

Methods
Selection-switch experiment. The dataset used for this article is previously published 14 , but we include a brief summary for completeness: Natural seawater was collected and used to inoculate microcosms in a 2 × 2 factorial crossover design with 3 replicates conducted for 50 days, which were sampled 18 times during the experiment. Half of the microcosms were given high (H) resource supply, whereas the other half were given low (L) resource supply. The factor of resource supply level was constant throughout the experiment. The other factor was the selection regime, which meant that the microcosms were either given continuous supply of nutrients (favouring K-selection, and hence the designation K) or being pulse-fed with nutrients after diluting the contents of the microcosms with growth medium (favouring r-selection, designated R). The active selection regime was switched at the experimental halfway point (between days 28 and 29), yielding two selection groups designated as RK and KR.
DNA was extracted from the collected samples, and the V3-V4 region of the bacterial 16S-rRNA gene was amplified with PCR using broad-coverage primers and the index sequences were ligated. The amplicon library was pooled and sequenced with two runs on an Illumina MiSeq machine. The reads are available at the European Nucleotide Archive with accession number ERS7182426-ERS7182513.
The USEARCH pipeline 47 (v11) was used to remove low-quality reads and cluster the reads into OTUs at 97% similarity level. Finally, the taxonomy of the OTUs was determined by the Sintax classifier using data from the RPD training set (v.16) where the confidence threshold was set to 80%.
Alignment and phylogentic tree. The selection-switch dataset was acquired directly from the authors 14 .
This dataset consists of a total of 206 samples. Two of these samples were taken from the communities from which the reactors were inoculated, whereas the other samples were taken from the microcosms with 17 time points x 4 regimes x 3 replicates. We discarded the inoculum samples for further analysis. The OTU reference sequences were aligned with SINA version 1.6.1 48 using the SILVA Release 138 NR 99 SSU dataset 49 . Using this aligment, the phylogentic tree was constructed by neighbour-joining using MEGA X 50 with default parameters.
Filtering and preprocessing. The mean number of reads per sample was 63,460 with standard deviation 31,411. For our analysis, we wanted to estimate the abundance of each OTU as accurately as possible and therefore skipped any correction for unequal sequencing depth. Read counts for each OTU in each sample were divided by the total number of reads for the sample, generating relative abundances. Thereafter, all OTUs having a maximum abundance (across all samples) below a certain threshold, were removed. Three levels of filtering thresholds (as count proportions) were applied: High level at 5 · 10 −3 , medium level at 1 · 10 −3 and low level at 5 · 10 −4 . The purpose of the filtering was to remove rare OTUs in order to avoid noise and spurious correlations 11 . For obtaining estimates of absolute abundances, the relative abundances were scaled by the estimate of total bacterial cell density for each sample. The phyloseq package (version 1.36.0) 51 and the R programming language (version 4.1.1) 52 facilitated this procedure. In addition, we wrote an R-package named micInt (version 0.18.0, available at https:// github. com/ Almaa sLab/ micInt) to facilitate and provide a pipeline for the analysis.
Similarity measures and addition of noise. For this study, we used two similarity measures, the Pearson correlation and the Spearman correlation. A similarity measure, as referred to in this article, can be thought of as a function f : In this regard, f x, y is the similarity of two abundance vectors x and y belonging to different OTUs, where f x, y = 1 indicates perfect correlation, f x, y = 0 indicates no correlation and f x, y = −1 indicates perfect negative correlation. Noise was added to distort patterns of double zeros, which otherwise could result in spurious correlations. Given two vectors x and y of abundances, normally distributed noise was added to each of the abundance vectors, and the similarity measure has invoked thereafter: Given a similarity measure f, the similarity between the abundance vectors after adding noise is given by: where ε x and ε y are random vector where all components are independent and normally distributed with mean zero and variance γ 2 . The level of noise γ was determined by the smallest non-zero relative abundance x min in the dataset and a fixed constant s called the magnitude factor, such that γ = s · x min . For no noise, s = 0 , for low noise s = 1 , for middle noise s = 10 and for high noise s = 100.
Network creation. Significance of the pairwise OTU associations were determined by the ReBoot procedure introduced by Faust et al. 22 and shares the underlying algorithm used in the CoNet Cytoscape package 53 . This approach accepts a dataset of microbial abundances and a similarity measure, and evaluates for each pair of OTUs in the dataset the null hypothesis H 0 : "The association between the OTUs is caused by chance". By bootstrapping over the samples, the similarity score of each pair of OTUs is estimated, forming a bootstrap distribution. By randomly permuting the pairwise abundances of OTUs and finding the pairwise similarity scores, a bootstrap distribution is formed. The bootstrap and permutation distribution are then compared with a two-sided Z-test (based on the normal distribution) to evaluate whether the difference is statistically significant. For this, the z-value, p-value and q-value (calculated by the Benjamini-Hochberg-Yekutieli procedure 54 ) are provided for each pair of OTUs in the dataset. Our ReBoot approach is based on the R-package ccrepe (version 1.28.0) 55 , but is integrated into the micInt package with the following major changes: • The original ReBoot uses renormalization of the permuted abundances to keep the sum-to-constant constraint. Whereas this is reasonable to do with relative abundances, our modified version enables turning this feature off when we analyse data with absolute abundances. • Optimizations have been made to memory use and CPU consumption to enable analyses of large datasets.
• In contrast to the usual ReBoot procedure, networks generated by the different similarity measures are not merged by p-value, but kept as they are.
For our analysis the number of bootstrap and permutation iterations was set to 1000. All OTUs being absent in more than n · 10 − 4 n samples, where n is the total number of samples, were excluded through the errthresh argument but still kept for renormalization (if turned on). The associations were made across all samples, even the ones belonging to a different selection group or resource supply.
(1) f * x, y = f x + ε x , y + ε y , Curtis distance metric between the samples was applied before creating the decomposition. After the ordination was computed, the samples were divided into four facets based on their combination of current selection regime and resource supply. Finally, all samples belonging to the same microcosm were connected by a line in chronological order and the line was given a separate style based on the resource supply and coloured to visually distinguish it from the two other replicate microcosm within the same facet.

Permutational multivariate analysis of variance. Sequential PERmutational Multivariate Analysis of
VAriance (PERMANOVA) of the samples was conducted on the absolute abundances, where only the samples from day 28 and 50 were included. These sample points correspond to time just before the experimental selection-regime crossover and a point at the end of the experiment. These days were selected because they were the most likely to capture the composition of stable communities in contrast to transient ones. The procedure was carried out by the function adonis from the R package vegan (version 2.5-7) with 10 6 permutations. The dependent data given to the function was the matrix of one minus the Spearman correlation of the samples (in order to resample dissimilarity), while the independent variables were the selection group (first variable) and the current selection regime (second variable).
Network visualization. The networks were plotted by the R package igraph (version 1.2.6) 56 . Network modules were found by the walktrap 25 algorithm implemented in igraph with the setting steps=20, including the positive edges only. Later, the negative edges were added and the networks plotted with the community labelling.
The time dynamics of the networks were visualised by taking the former network and adjusting the node colour and size, as well as the edge colour. For this, a certain combination of selection group (i.e RK) and resource supply (i.e H) was chosen. Further, let x i,j,k be the abundance of OTU k at sampling day i in microcosm j. As there are three replicates, we have that j = 1, 2, 3 . If the underlying network was created by Pearson correlation, we denote the day mean x i,.,k as the average over the replicates, this is: The time series mean of OTU k, x .,.,k is the mean of these daily means over all sampling days, where N denotes the number of sampling days. Furthermore, we have the associated standard deviation σ k as given by: The z-value of the abundance of OTU k at day i is then: This value is used in the mapping of the node sizes and colours. The node for OTU k at sampling day i has the size a + b · z i,k , where a and b are constants. Furthermore, the same node is coloured: • Black if z i,k < −1 . This indicates that the OTU that day had a lower abundance than the average. • Grey if −1 ≤ z i,k ≤ 1 . This indicates that the OTU that day had about the same abundance as the average. • Orange if z i,k > 1 . This indicates that the OTU that day had a higher abundance than the average. Furthermore, the edge colour are dependent on the product of the two participating nodes. Hence, the edge between OTU k and OTU l at day i will have the colour: • Red if z i,k · z i,l < −0.3 . This shows a contribution to a negative interaction.
• Gray if −0.3 ≤ z i,k · z i,l ≤ 0.3 . This shows no major contribution of neither a positive nor negative interaction.
• Blue if z i,k · z i,l > 0.3 . This shows a contribution to a positive interaction.
Our approach is motivated by the fact that the Pearson correlation ρ k,l of the day means of OTU k and OTU l is given by: For the Spearman correlation, the visualization is based on the rank of each of the OTU abundance values in a sample. Hence, instead of using the raw abundances x i,j,k in the calculation of the day mean, the ranks r i,j,k (2) x i,.,k = x i,1,k + x i,2,k + x i,3,k 3 .