Introduction

The study of ecological succession, or the process by which biological communities develop over time, has been integral to the development of ecological theory [1,2,3]. Primary succession begins with the colonization and mobilization of nutrients by pioneer communities in a recently exposed environment with little to no pre-existing life, such as after glacier retreat or after volcanic eruptions [3]. Despite the important role of microorganisms as early colonizers in primary succession, most of the studies examining community change during primary succession have historically focused on plant communities [4]. However, primary succession can also occur in microbial habitats, including the surfaces of plant leaves and flowers [5,6,7], exposed rock surfaces [8], glacial sediments [9,10,11], animal guts [12], and biofilms [13]. Furthermore, studying primary succession from a microbial perspective has the added advantage that ecologists can examine community development patterns in a time frame far shorter than what would be required to track primary succession patterns in plant or animal communities [14]. Spurred, in part, by the widespread use of DNA sequencing-based methods to survey microbial communities, there has been a recent increase in the number of studies characterizing the trajectories of microbial communities during primary succession across a wide range of different habitats [14,15,16]. It remains unclear whether similar trends occur in microbial community structure across the diverse array of habitat types in which primary succession can occur.

Patterns in ecological succession across different studies conducted in distinct habitats can be compared by focusing on the changes in the number and frequency of taxa (i.e., alpha diversity) and on the degree of differentiation/variability among local communities within a given region or habitat (i.e., beta diversity) between early and late successional stages within individual studies. Since successional trajectories may be highly irregular and non-linear, depending on the specific characteristics of each habitat and the timing of environmental and/or community changes [4, 17] we do not expect the timing of successional change to be identical across systems. Nevertheless, we can still contrast those microbial communities found in early vs. late successional stages to broadly compare successional patterns across habitats [2, 3, 18]. In general, the number of different species is expected to increase from early to later stages of succession due to an increase in potential niches, resource diversity, and habitat heterogeneity [3, 19, 20]. For example, the number of plant species doubled 2 years after the 1980 eruption of Mount Saint Helens [21] and, in human infant gut communities, bacterial diversity doubled over the first year of life [12]. In addition to changes in alpha diversity, we would also expect the degree of differentiation (i.e., dissimilarity in overall composition) between communities at a given successional stage to decrease with time due to selection of suitable taxa under homogeneous environmental filters [18]. However, this has not yet been empirically evaluated across habitats.

In addition to changes in the taxonomic composition and diversity of communities during succession, we would also expect the distribution of traits within these communities to change. During succession, the ability of both plants [22] and microbes [23] to establish, survive, and thrive is based, in part, on functional traits that confer colonization potential, competitive advantage, or stress tolerance. Changes in functional groups or community-averaged traits linked to ecological succession can be predicted with a high degree of confidence, and documenting such changes can give added insight into patterns of community assembly during succession [3, 4, 24]. For example, a study in grasslands showed that communities converged in trait composition at late successional stages, although the same plant communities were taxonomically distinct across late-successional plots [25]. Such functional and trait-based approaches have substantially improved the mechanistic understanding of ecological processes affecting ecosystem structure and dynamics [26]. For instance, the colonization sequence of algal surfaces by bacterial communities is not consistent with respect to species composition, but there are consistent patterns in the functional attributes of the community members [27]. Despite the potential importance of traits, so far only the rRNA operon copy number (a proxy for maximum growth rate [28]) has been proposed as a genomic trait linked to ecological succession in different ecosystem types [29]. We aim to identify whether there are other microbial traits that shift in a predictable manner during ecological succession across habitats.

We carried out a meta-analysis of 121 16S rRNA gene libraries from 17 different studies to explore how the taxonomic diversity and composition of microbial communities change with succession and whether there are corresponding changes in specific functional attributes (inferred from genomic information) with succession. These data were derived from seven distinct habitats (gut-associated, plant-associated, soil, river biofilm, microbial mats, and saline lakes). Because the temporal dynamics of succession are likely to be distinct across this range of habitats and because the timing of sample collection differed across studies, we focused on comparing communities within each study across “early” vs. “late” stages of succession (with “early” and “late” samples defined separately for each study depending on the data available). Although the selected studies were restricted to those in which succession started with pioneer microorganisms, they differed with respect to the types of communities found in their respective habitats, the environmental conditions, and the pace of succession. We also compared how specific community-weighted microbial traits varied between early and late successional stages, under the assumption that some functional attributes should consistently become more or less important during different stages of succession, regardless of the habitat in question.

Materials and methods

Sample selection and habitat classification

We compiled 61 early and 60 late 16S rRNA gene libraries from the available literature (Tables S1 and S2). “Early” successional stages were represented by those samples selected shortly after the start of community development, while the “late” successional stages were represented by those samples within each study collected at the last time points available. To focus on primary succession regardless of different successional timespans, for the selection of sites/individuals we required external environmental conditions to be stable, if possible without strong perturbations or nutrient input changes. For statistical consistency, samples were selected with at least two replicates for early and late stages, summarized into 27 sites or individuals. Replicates of each sample were those taken from the same individual/site if possible (gut microbiomes), or from an equivalent source based on the conditions described in the original articles when sampling required destruction of the original sample (i.e., phyllospheres). Samples were classified as gut-associated (A), plant-associated (B), soil chronosequence (C), and water-associated (D) microbial communities. Within these categories, we considered different subcategories (habitat types): infant gut (A1–A4), primate gut (A5–A10), plant-associated (B1–B4), soil chronosequence (C1–C4), salt marsh chronosequence (C5–C8), river biofilm (D1), saline shallow lakes (D2–D4), and Hydra development (D5).

Sequence processing

Sequences originating from 454 and Illumina technologies were trimmed using Trimmomatic [30], cutting the first and last eight nucleotides, and trimming the rest of the sequence when the average quality of four nucleotides fell below 15. We kept only sequences with a minimum length of 150 nucleotides. Sequences from Sanger sequencing were left untreated. In order to associate taxonomy with genome content, we used a 16S rRNA database [31] linked to the IMG genomic database [32]. Processed sequences were matched to 16S rRNA gene records available in the IMG genomic database (as of January 2016) [31] using the usearch_global command [33]. This allowed us to cluster sequences of unequal lengths at a defined percentage of identity, although the approach is limited by the number of sequenced genomes available. A total of 1,098,744 sequences matched a sequenced genome with at least 97% identity, corresponding to 1844 genomic matches. This approach offers a compromise between statistical consistency and taxonomic resolution, and has previously been used successfully to detect relationships between spatial distribution and genomic traits in soil bacteria [34]. On average, genomic matches accounted for 48.6% of the sequences (Fig. S1). In the analysis, only 14 phylotypes belonged to Archaea, so all the results displayed here reflect patterns in Bacteria.
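The trimming rules described above can be illustrated with a minimal sketch. This is not Trimmomatic's implementation; the function name `trim_read` and its parameters are hypothetical, and the sliding-window logic is a simplified reading of the text (crop eight bases from each end, cut at the first four-base window whose mean quality is below 15, keep reads of at least 150 bases).

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], crop: int = 8, window: int = 4,
              min_avg_q: float = 15.0,
              min_len: int = 150) -> Optional[Tuple[str, List[int]]]:
    """Crop both ends, cut at the first low-quality window, apply a length filter.

    Simplified illustration only; parameter names are assumptions.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Remove the first and last `crop` bases.
    seq, quals = seq[crop:len(seq) - crop], quals[crop:len(quals) - crop]
    # Scan windows from the 5' end and cut where the mean quality
    # of a window first drops below the threshold.
    cut = len(seq)
    for start in range(0, len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            cut = start
            break
    seq, quals = seq[:cut], quals[:cut]
    # Discard reads that are too short after trimming.
    return (seq, quals) if len(seq) >= min_len else None
```

A read that passes returns the trimmed sequence with its qualities; a read that ends up shorter than the minimum length returns `None`.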

Functional predictions

Metagenomic successional data sets across different habitats would be the best option for functional evaluation [35], but unfortunately available data sets are still limited. Functional predictions based on representative genomes are, however, still useful for the estimation of genomic and metabolic potential [29, 36]. For that purpose, we downloaded from IMG [32] a functional matrix of 8191 gene categories (KEGG orthologs), their counts per genome, and genomic traits (rRNA operon copy number, G + C content, and genome size) for 1844 genomic matches. Prediction required matching of the 16S rRNA gene at the 97% identity level, although we acknowledge that some strains within this level may have distinct functional signatures [37] or environmental distributions [38]. Therefore, further studies are encouraged to confirm our observations based on functional predictions. We calculated community-weighted trait abundances per replicate by combining the functional data with the relative abundance matrix of genomic matches. We assessed the relative amount of carbon fixation (genes prkB: K00855 and rbcS: K01602), nitrogen fixation (genes anfG: K00531, nifD: K02586, glnA: K01915, and nifK: K02591), and high-efficiency inorganic phosphate transport (genes pstB: K02036, pstC: K02037, and pstA: K02038), averaging the weighted KEGG abundances per process in the different samples between early and late stages of succession.
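As a minimal sketch of the community-weighted trait calculation, the value for one replicate can be taken as the sum over genome matches of relative abundance times the trait value. The function name, genome labels, and numbers below are hypothetical and only illustrate the arithmetic; they are not taken from the study's data or script.

```python
from typing import Dict


def community_weighted_trait(counts: Dict[str, float],
                             trait: Dict[str, float]) -> float:
    """Abundance-weighted mean of a genomic trait across genome matches in one replicate."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("replicate has no matched sequences")
    # Relative abundance of each genome match times its trait value.
    return sum((n / total) * trait[g] for g, n in counts.items())


# Hypothetical replicate: sequence counts per matched genome
# and rRNA operon copy number per genome.
counts = {"genome_A": 60, "genome_B": 30, "genome_C": 10}
rrn_copies = {"genome_A": 7.0, "genome_B": 2.0, "genome_C": 1.0}
cwt = community_weighted_trait(counts, rrn_copies)
```

The same function applies to a KEGG ortholog by passing its per-genome gene counts as the trait.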

Diversity calculations

For alpha diversity, we used the Shannon index, which measures the amount of information contained in a system based on the number of species and their frequencies. Since alpha diversity measures are highly sensitive to sequencing depth, and original studies yielded different per-sample sequencing depths, we calculated the Shannon index as the average of the values from 100 rarefactions to 50 sequences per replicate sample. Shannon values estimated by subsampling to 500 and 1000 sequences per sample were well correlated with the Shannon values estimated by randomly selecting only 50 sequences per sample (both r > 0.99, p < 0.001). No rarefaction was conducted for any other analysis, and other transformations were applied to standardize data without losing information [39]. Functional Shannon diversity was calculated on the community-weighted KEGG profiles per sample, weighting by relative abundances. For changes in community similarity (i.e., beta diversity), we calculated the Bray–Curtis dissimilarity metric between early and late communities for both the taxonomic and functional profiles (based on the whole KEGG profiles) after Hellinger transformation of non-rarefied matrices. We explored community dissimilarity differences with “habitat” and “succession stage” as sources of variation (permutational multivariate analysis of variance using distance matrices) regardless of the study of origin (ADONIS). The same analysis was used to search for differences in weighted-occurrence values by succession stage in taxa and genes (using Euclidean distances with weighted-occurrence values as the input matrix). Then we compared only the replicates of the same sample (site or individual). If at least two replicates were available per sample (all except B1 and B3), the difference between “late” and “early” successional stages was calculated. If more than two replicates were available, we used the mean of late distances minus the mean of early distances. A10 samples were removed from early vs. late comparisons because of an extremely large difference in sequences, although they were still included in multivariate ordinations. A simplified version of the R script is available online (https://github.com/Rudigerceab/succession_ismej).
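The diversity calculations described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the authors' R script: all function names, the fixed random seed, and the toy count vectors are hypothetical. It shows the mean Shannon index over repeated rarefactions, the Hellinger transformation, Bray–Curtis dissimilarity, and the late-minus-early change in mean dissimilarity among replicates.

```python
import math
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H' (natural log) from raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefied_shannon(counts: Dict[str, int], depth: int = 50,
                     n_iter: int = 100, seed: int = 0) -> float:
    """Mean H' over n_iter random subsamples of `depth` sequences (without replacement)."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer sequences than the rarefaction depth")
    rng = random.Random(seed)  # fixed seed is an illustrative choice
    values = [shannon(list(Counter(rng.sample(pool, depth)).values()))
              for _ in range(n_iter)]
    return sum(values) / n_iter


def hellinger(counts: Sequence[float]) -> List[float]:
    """Square root of relative abundances."""
    total = sum(counts)
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def mean_pairwise_dissimilarity(replicates: Sequence[Sequence[float]]) -> float:
    """Mean Bray-Curtis dissimilarity among Hellinger-transformed replicates."""
    pairs = list(combinations(replicates, 2))
    return sum(bray_curtis(hellinger(a), hellinger(b)) for a, b in pairs) / len(pairs)


# Hypothetical replicate count vectors (same taxon order) for one site.
early = [[10, 0], [0, 10]]
late = [[5, 5], [5, 5]]
beta_change = mean_pairwise_dissimilarity(late) - mean_pairwise_dissimilarity(early)
```

In this toy case the early replicates share no taxa and the late replicates are identical, so the change is negative, which corresponds to convergence among replicates at the late stage.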

Results and discussion

To compare the communities across the 17 studies we started by matching the 16S rRNA gene sequences available for each study to the corresponding genomes available in IMG database [32]. This step was necessary as the selected studies differed with respect to the molecular methods used to characterize the microbial communities, making direct comparisons across studies difficult. Additionally, having whole-genome information allowed us to determine how the functional attributes of the communities varied over the course of succession. However, we acknowledge that by focusing solely on those bacterial taxa for which whole-genome information is available, we are excluding many taxa for which genomes from closely related taxa are not available. The proportion of sequences that matched the genome database was 49% across all samples and it ranged from nearly 60% in the primate gut data set to 8% in the salt marsh habitat. Interestingly, samples representing early successional stages typically had a higher proportion of genome matches per habitat (Fig. S1). Because the ubiquitous, faster-growing bacteria tend to be over-represented in genome databases due to their relative ease of cultivation [40], we would expect a higher number of genome matches where such bacterial types were more abundant. This has ecological significance, since it implies that there are more opportunistic, faster-growing bacteria in early successional stage communities. Additionally, we calculated overall averaged occurrences of the matched genomes per successional stage. An aggregated value of occurrence weighted by relative abundances per sample can indicate if taxa in those samples are, on average, more ubiquitous or more specialized in their habitat preferences. We observed that late successional stage microbial communities had less ubiquitous taxa than early stage communities (mean phylotype weighted-occurrence was 19.65 in early communities compared to 17.01 in late communities). 
Results were marginally significant when considering all samples within a habitat (ADONIS R2 = 0.02, p = 0.08), and significant when dividing only per sample (ADONIS R2 = 0.02, p = 0.02), highlighting that those taxa that are more abundant in communities during the early stages of succession tend to be more widely distributed in most habitats, except soil chronosequences and plant communities (Fig. S3a).
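The weighted-occurrence value discussed above can be sketched as follows. The exact definition of occurrence is not spelled out here, so this example assumes that occurrence is the number of samples in which a phylotype is detected; that definition, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import Dict


def occurrences(samples: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of samples in which each taxon is detected (assumed definition)."""
    occ: Dict[str, int] = {}
    for counts in samples.values():
        for taxon, n in counts.items():
            if n > 0:
                occ[taxon] = occ.get(taxon, 0) + 1
    return occ


def weighted_occurrence(counts: Dict[str, int], occ: Dict[str, int]) -> float:
    """Occurrence averaged over the taxa of one sample, weighted by relative abundance."""
    total = sum(counts.values())
    return sum((n / total) * occ[t] for t, n in counts.items() if n > 0)


# Hypothetical count tables for three samples.
samples = {
    "s1": {"a": 8, "b": 2},
    "s2": {"a": 5, "c": 5},
    "s3": {"a": 1, "b": 9},
}
occ = occurrences(samples)
w_occ = {name: weighted_occurrence(counts, occ) for name, counts in samples.items()}
```

A sample dominated by widely detected taxa (such as `s1`, dominated by taxon `a`) receives a higher value than one dominated by taxa found in fewer samples.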

Habitat drives strong community differentiation in late successional stages

It has been repeatedly shown that different habitats harbor distinct microbial communities [41, 42]. Not surprisingly, our results confirmed that the different habitats harbored communities that were distinct in taxonomic composition, both in the early and late stages of succession (Fig. 1a; see Fig. S2 for the distribution of major phyla across habitats). On average, only 8% of phylotypes (range 1.8–19.3%) in early successional samples and only 6% of phylotypes (0.5–15%) in late successional samples were shared between any pair of habitat types. When partitioning dissimilarities for the sources of variation (habitat and succession stage), we observed that in addition to habitat (ADONIS R2 = 0.40, p = 0.001), the successional stage was strongly significant and dependent on the habitat, although the percentage of variation explained was smaller (ADONIS R2 = 0.08, p = 0.001). Indeed, microbial communities were better differentiated with less overlap across habitat types in late successional stages than in early stages (ANOSIM: R = 0.92, p < 0.001; R = 0.73, p < 0.001, respectively). For example, primate gut samples [43] were far more differentiated from human gut samples [12, 44,45,46] in the late stages of succession compared to the communities found in these distinct hosts at the early stages of succession (ANOSIM R = 0.29, p = 0.007 and R = 0.57, p < 0.001, for early and late-stage comparisons, respectively).
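To help interpret the ANOSIM R values reported above, the statistic can be sketched as the difference between the mean rank of between-group and within-group dissimilarities, scaled by half the number of sample pairs. This is a minimal illustration with hypothetical function names and toy data; it omits the permutation test used to obtain p values and is not the routine used in the study.

```python
from itertools import combinations
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)  # total number of sample pairs
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


# Hypothetical dissimilarity matrix for four samples from two habitats.
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
r_stat = anosim_r(dist, ["gut", "gut", "soil", "soil"])
```

Values of R near 1 indicate that all between-group dissimilarities exceed the within-group ones, as in this toy matrix, whereas values near 0 indicate no separation between groups.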

Fig. 1

Comparison of early and late-stage successional samples in non-metric multidimensional scaling ordinations based on Bray–Curtis dissimilarities of taxonomic (a), and functional (b) matrices

We next compared the relative abundances of different gene categories, as calculated from the matched genomes [32] (Fig. 1b), across successional stages. When partitioning dissimilarities for the sources of variation, habitat was a strong predictor of function (ADONIS R2 = 0.52, p = 0.001). The successional stage was again strongly significant and dependent on the habitat, although the percentage of variation explained was small (ADONIS R2 = 0.05, p = 0.002). We further observed that although the differentiation among communities from different habitats with respect to their annotated gene content was weaker than when we simply focused on taxonomic composition, functional changes across habitats were still significant in both late (ANOSIM: R = 0.66, p < 0.001) and early successional stage communities (ANOSIM: R = 0.43, p < 0.001). In other words, the communities found in different habitats were still distinct with respect to their genomic attributes, but such differentiation was lower than when we simply focused on the taxa present. Given that genomic strategies between gut symbionts and free-living bacteria are fundamentally different [47], it was not surprising to observe a weak differentiation within gut symbiont habitats and free-living habitats (water, soils, and plant phyllospheres; Fig. 1b). Since each gene category is an orthologous group of different genes with analogous functions, such weaker differentiation could be related to functional redundancy. That is, a common functional core of the genomic repertoire is present across communities in different habitats [48], a result that may reflect annotation biases toward “housekeeping” genes and other genes that are widely shared across taxa. 
As with the taxonomic patterns, late successional stage communities showed on average significantly fewer widely distributed genes than early stage communities (mean functional weighted-occurrence 464.46 in early communities compared to 323.46 in late communities, ADONIS R2 = 0.005, p = 0.002, Fig. S3b).

We would expect stochastic processes (processes that incorporate random variation such as random dispersal, ecological drift, or historical contingency) to be more important in structuring early stage successional communities [49]. Likewise, deterministic processes (that is, processes that lead to predictable outcomes such as environmental selection, biotic competition, or facilitation) are probably more important in structuring communities in later stages [18]. Taken together, our results show that early succession microbial communities are not just random subsets from a regional pool of species; instead, they are the result of habitat-specific environmental filtering in regional pools [50]. However, this environmental filtering effect is stronger in late succession communities, when there is an increase in the habitat specificity of both taxa and annotated genes (Fig. 1). In fact, habitat selection is evident in environments subject to the influence of airborne colonizers (e.g., phyllosphere communities or temporary lakes), which are clear examples of these ecological processes. Airborne colonizers in early communities of plant surfaces are dispersed and later selected to form distinct community compositions [5]. In lakes, aerial colonizers [51] and soil colonizers [52] are environmentally selected by the conditions of each lake to assemble the resulting communities.

Equivalent diversity changes in different habitats along succession

We compared the taxonomic and functional diversity of the communities in early vs. late successional stages. Taxonomic alpha diversity, measured using the Shannon index (H′) after standardizing sequencing effort, was generally higher in late stages of succession for most of the studied habitats (Fig. 2a). Although alpha diversity of functional genes followed the same general trend, the patterns were more variable than with taxonomic diversity (Fig. 2c). In general, we would expect diversity to increase from early to late stages of succession due to an increase in potential niches, resource diversity, resource availability, and habitat heterogeneity [3, 20]. However, an increase in interspecific competition during the later stages of succession might counterbalance this increase in diversity [53]. The successful establishment of highly competitive organisms that become abundant under specific environmental conditions can explain a decline in diversity at late stages. In the microbial data sets analyzed, this decline is clearly observed in the development of Hydra, where the highly competitive bacterium Curvibacter dominates late-stage successional communities [54], and also in the saline shallow lakes, where a few salinity-tolerant taxa dominate at late successional stages [55].

Fig. 2

Changes in diversity metrics between early and late successional stages. Taxonomic (a) and functional (c) Shannon index (H′), and beta diversity based on Bray–Curtis dissimilarities of taxonomy (b) and functions (d). The dashed line represents the overall mean change. The abbreviations “Ph.” and “H.” indicate the “Phormidium river biofilm” and “Hydra” samples, respectively

After assessing the changes in the number and frequency of taxa and functional genes, we explored the degree of community differentiation (beta diversity) between early and late successional stages. Considering all samples within a given habitat, later successional communities were consistently more similar to one another regarding their taxonomic composition than early successional communities (Fig. 1), and this was consistent by sample (Fig. 2b). Functional dissimilarity followed the same trend (Fig. 2d), that is, we found more functional convergence (more similar communities) at later stages of succession. The salt marsh chronosequence communities were the only exception to this general trend. It has been proposed that saline environments tend to develop strong gradients with heterogeneous conditions [55, 56] and anaerobic microsites [20, 57] that might enhance historical contingency and priority effects [58]. Interestingly, not a single microbial habitat showed both taxonomic community divergence (dissimilar communities) and functional convergence, a pattern observed in plant communities at later successional stages [25]. In some cases, we might expect that as succession proceeds, communities would tend to establish a more stable state with the surrounding environment [3, 53]. Ecological theory posits that a single stable state or equilibrium is likely to happen in systems with small regional species pools, high rates of connectivity, and low productivity [50, 58], such as primary successional systems. We have shown that different samples from the same habitat with similar conditions develop similar communities over time. This result follows expectations, and is likely explained by the effect of environmental filters on community assembly [59]. We can potentially predict the late-stage composition of microbial communities along succession given enough information on habitat characteristics, but such predictions would be less accurate for early stage communities.

Detection of changes in functional strategies along succession

We explored how functional traits aggregated per community (community-weighted traits) changed between early and late successional stages. We noticed that trait changes were more variable and complex than patterns in alpha diversity, even within the same habitat. Of the six traits evaluated (Figs. 3, S4), only two of them showed strong consistent signals across habitats. As previously reported [28, 29], we observed a general decrease in average rRNA operon copy numbers in most late-stage successional samples (Fig. 3a). Bacteria with higher rRNA operon copy numbers are typically more copiotrophic and have higher maximum growth rates, while microbes with lower rRNA operon copy number are expected to be slower growers and better competitors at later successional stages [28, 29]. Other microbial traits expected to also change consistently with successional stage such as genome size [34, 60, 61] or G + C content [62,63,64] did not vary consistently in the habitat types included in this meta-analysis (Fig. S4). Further studies on how microbial traits shift across successional stages are needed to make strong predictive inferences on microbial community assembly.

Fig. 3

Changes in the community-weighted functional traits rRNA operon copy number (a) and high-efficiency phosphate transport (b). The dashed line represents the overall mean change

Nutrient availability is known to have a direct impact on succession trajectories [9, 65]. Although we did not observe any trends for carbon or nitrogen fixation genes (Fig. S4), we found a consistent decrease in genes associated with the uptake and mobilization of inorganic phosphate (Pst gene system) in late-stage successional communities (Fig. 3b). Phosphorus (P) availability can often limit bacterial growth [66, 67], and its assimilation is more efficient in bacteria than in other organisms, such as phytoplankton [68, 69]. Traits related to phosphate uptake are expected to be important in many oligotrophic systems where labile forms of organic phosphorus are likely to be less available [70,71,72]. However, changes in phosphate uptake capabilities are an understudied component of microbial community changes during primary succession. Our results indicate that inorganic phosphorus assimilation is a relevant trait in microbial communities during the early stages of primary succession in all the habitats studied except the Hydra and primate-gut samples, the latter possibly because of the short timespan and the specific type of change in the primates [43]. In addition, changes in nitrogen and carbon fixation, although heterogeneous, tended to be stronger in the plant and soil communities than in the gut communities (Fig. S4), indicating that the importance of these processes differs by habitat type and growth substrate [19]. The observed changes in both rRNA copy number and the Pst gene system suggest that later successional stages have relatively more slow growers (oligotrophic organisms) adapted to lower nutrient availability than early successional stages.

Conclusions and perspectives

Successional patterns in community composition have traditionally been studied in plant communities [1, 3, 4, 73]. By combining studies that have examined primary succession patterns across a wide variety of habitats, we were able to identify reasonably consistent and predictable trends in community composition, diversity, and functional attributes across successional gradients, trends that are in agreement with current concepts about how plant communities shift during primary succession. Understanding these changes opens opportunities across a range of research areas [74]. For example, changes in communities during the development of animal and plant diseases [75, 76] could be predicted based on the diversity changes observed here. Further research could also focus on communities affected by global change, given the observed importance of P limitation and biogeochemical cycles [77]. Understanding the ecological processes behind microbial primary succession may be especially useful within restoration and conservation frameworks [78] that track the progression of community change between early and late stages.