Anthropogenic ecosystem disturbance and the recovery debt

Ecosystem recovery from anthropogenic disturbances, either without human intervention or assisted by ecological restoration, is increasingly occurring worldwide. As ecosystems progress through recovery, it is important to estimate any resulting deficit in biodiversity and functions. Here we use data from 3,035 sampling plots worldwide, to quantify the interim reduction of biodiversity and functions occurring during the recovery process (that is, the ‘recovery debt'). Compared with reference levels, recovering ecosystems run annual deficits of 46–51% for organism abundance, 27–33% for species diversity, 32–42% for carbon cycling and 31–41% for nitrogen cycling. Our results are consistent across biomes but not across degrading factors. Our results suggest that recovering and restored ecosystems have less abundance, diversity and cycling of carbon and nitrogen than ‘undisturbed' ecosystems, and that even if complete recovery is reached, an interim recovery debt will accumulate. Under such circumstances, increasing the quantity of less-functional ecosystems through ecological restoration and offsetting are inadequate alternatives to ecosystem protection.

F ew ecosystems on Earth are undisturbed by people 1 and many degraded ecosystems are in the process of recovering worldwide [2][3][4] . Although in most cases the recovery process is without human intervention, societies spend billions of dollars annually to restore ecosystems [5][6][7] . Supporting recovery without intervention and repairing disturbed ecosystems are crucial to regain lost biodiversity, ecosystem functions and services provided to society [8][9][10] . Assessments of anthropogenic disturbances have shown global losses 11 in biodiversity, whereas the disturbance is still active and time lags exist in its response 12,13 (Fig. 1). However, as ecosystems recover after the disturbance ceases, it is less clear to what extent they continue to endure deficits in biodiversity and functionality.
Here we quantify the interim reduction of biodiversity and biogeochemical functions occurring during ecosystem recovery, which we call the 'recovery debt'. This metric measures the per annum amount that an ecosystem function or biodiversity is reduced during the recovery process after disturbance ceases (Fig. 1). The recovery debt is a useful indicator of the magnitude of ecosystem degradation, because even if ecosystems eventually recover their biodiversity and functions, there may be a long period of time until complete recovery is achieved. During the recovery debt period, shortfalls in biodiversity and ecosystem functionality will affect the quantity and quality of ecosystem services provided by the recovering systems.

Results
Meta-analysis descriptors. We found data from 3,035 sampling plots from 348 published primary studies covering a total study area 4550,000 km 2 ( Supplementary Figs 1 and 2 Table 1). Data collection was restricted to six major ecosystem categories (forests, grasslands, wetlands, rivers, lakes and marine ecosystems), eight anthropogenic disturbance categories (agricultural transformation, logging, mining, invasive species, eutrophication, hydrological disruption, overfishing and oil spills or combinations of them) and four recovery metrics (organism abundance, species richness, carbon cycling and nitrogen cycling). We also included hurricanes as an example of a natural disturbance for reference.

, Supplementary References and Supplementary
The outcome measures in the database related to the recovery metric 'organism abundance' included measurements of density, biomass, cover and basal area of trees, shrubs, grasses and algae, and measurements of density of birds, fish and invertebrates. The outcome measures related to the recovery metric 'diversity' included mainly measurements of species richness and diversity indexes, such as Shannon, Simpson and evenness indexes. Biogeochemical outcome measures related to the cycling of carbon and nitrogen contain both pools and fluxes of these elements in soil, litter and the water column. We amassed 3,816 outcome measures for which two measures of recovery were collected over time and compared with a reference value. The reference value was taken from either the same ecosystem before degradation occurred or a nearby comparable ecosystem that was undisturbed.
Recovery debt estimations. A per annum recovery debt was found in all the categories in which data were available (Fig. 2). We found that ecosystems undergoing recovery had about half of abundance (46-51%, 95% confidence intervals of the mean effect size) and one-third of species diversity (27-33%) compared with reference values (Fig. 2a), over 22 and 16 years (average time since recovery started), respectively, following a disturbance. This pattern was markedly consistent across ecosystem categories, which did not show strong moderating effects on our models except for the abundance debt (Supplementary Table 2). However, we did find strong moderating effects in the disturbance categories studied (Fig. 2b). These results were not affected by the organism type ( Supplementary Figs 3 and 4).

Discussion
The consistent decrease in diversity and abundance found in recovering ecosystems may, at first glance, contrast with other studies showing that a-diversity does not change through time 14,15 . However, our recovery metric 'diversity' includes other diversity measurements that account for differences in abundance, which could be responsible for this contrast. Nonetheless, our results agree with the worst scenarios estimated for the effects of land-use change on local species richness of plants and animals 11 , and with the reanalysis of references 14,15 , showing that spatial and temporal biases in these meta-analysis do not support a no net change of a-diversity 16 . This highlights that species assemblages could be more resilient to anthropogenic disturbance than populations, even when most individuals are lost.
Although nitrogen recovery debts could be expected to be lower than carbon debts because of faster turnover rates of nitrogen 10,17 , our results suggest similar impacts of anthropogenic disturbances in the cycling of both elements. This adds evidence to other large-scale recovery estimations that found similar recovery patterns for the cycling of carbon and nitrogen 18,19 . Our results also suggests that mining and water pollution, caused by agriculture and urban uses, could be not only major drivers of biodiversity, and ecosystem function and service loss 20,21 , but also major drivers preventing their recovery. The fact that hurricanes were responsible for the lowest recovery debts suggests that the negative effects of anthropogenic disturbances could cause more pervasive damage than some natural disturbances.  The light shading represents the total amount that an indicator of ecosystem integrity (outcome measure, for example, biodiversity or an ecosystem function) is reduced during recovery after a disturbance ceases, that is, the recovery debt. The dark shading represents our estimation of the recovery debt between the time when the measurement of the outcome measure started (Y s , t s ) and when the measurement ended (Y e , t e ). The dashed line (Y r ) represents the reference goal value existing in the pre-disturbance state or in another ecosystem with similar conditions that remained 'undisturbed'.
Even under the assumption that ecosystems will eventually recover to their 'reference' values at longer time scales than are included in this study, our results reveal a consistent pattern: the interim per annum debt of abundance, diversity, and carbon and nitrogen cycling of degraded ecosystems across the globe is pervasive and continues for decades or more. Our findings support studies showing that complete recovery may not be achievable during decades or more 10,22,23 and similar outcomes might occur globally across multiple disturbances. Beyond previous estimates of the effects of disturbance on biodiversity loss and its time lags 11,13 , these findings show that during recovery ecosystems worldwide have less plants and animals, and lower biodiversity and functions compared with undisturbed systems. In particular, recovering ecosystems may not only have lower diverse than undisturbed ones, but also may be much less populated.
These results suggests caution in pursuing ecosystem management strategies that exclusively rely on restoration or recovery to reverse biodiversity and functional loss [24][25][26][27] . This is particularly relevant in biodiversity offsetting strategies that allow ecosystem degradation if compensated through eventual restoration [28][29][30] . Given the lack of complete recovery, any further degradation, even if compensated by restoration, would increase the overall recovery debt of ecosystems. This would also suggest reconsidering restoration policies that attempt to fulfill 'no net loss' principles by simply increasing the mitigating or offset ratio so that more area of less-functional ecosystems are created 30 . If the restoration debt is large and sustained over several decades, then increasing the quantity of less-functional ecosystems is poor compensation for the overall intervening loss in ecosystem biodiversity and functions. Under such circumstances, ecological restoration and offsetting are inadequate alternatives to ecosystem protection.

Methods
Database construction. The database is the result of merging two previously published meta-analytical databases 8 Supplementary Fig. 2 for PRISMA flowchart 32 ). From these studies, we selected those that (i) were actually related to ecosystem recovery, (ii) had at least three measurements of recovery in time, (iii) had a clear reference system (either in the pre-disturbance state or an 'undisturbed' ecosystem with similar environmental conditions), (iv) were related to any of the nine disturbance categories, (v) reported time since recovery started and (vi) included measures of organism abundance, species diversity and cycling of carbon and nitrogen. We only considered these five recovery metrics to reduce the inherent heterogeneity of the database and provide robust results, particularly in the case of biogeochemical functions.
Our selection yielded 278 primary studies. From these studies, we extracted 3,468 comparisons of measurements of recovery between reference and recovering ecosystems from tables, figures and text of the paper. Outcome measures extracted from the selected studies were already averaged across several sampling plots in most cases. We used the free software DataThief III 33 to extract data from the figures. Following the same selection criteria, we added 253 outcome measures from the database of Rey Benayas et al. 8 and 95 from the database of Meli et al. 31 totaling 3,816 outcome measures. We included each outcome measure separately, instead of averaging them per study, because we assumed independent responses of each parameter to the recovery process.
To ensure the quality of the data in the new data set, a protocol for data extraction was created and each person who entered data was trained with three manuscripts, to ensure accurate numbers were entered and accurate categorizations were made using the same form. H.P.J. met with data enterers bi-weekly throughout the data collection process to answer questions about conflictive data, data entry selection and spot-checked data entered throughout the process to ensure accuracy. At these meetings, data enterers had the opportunity to raise ambiguities or other issues found during the extraction process and any disagreements were resolved by consensus. Lastly, H.C.J., P.C.J. and D.M.M. checked each category assigned per study before the data were analysed, including the data sets from Rey Benayas et al. 8 and Meli et al. 31 .
Studies used field-based measurements to assess ecosystem recovery of various outcome measures after disturbances. The outcome measures related to organism abundance included measurements of density, biomass, cover and basal area of trees, shrubs, grasses and algae, and measurements of density of birds, fish and invertebrates. The outcome measures related to diversity included mainly measurements of species richness and diversity indexes such as Shannon, Simpson and evenness indexes. Biogeochemical outcome measures related to the cycling of carbon and nitrogen contain both pools and fluxes of these elements in soil, litter and the water column. To test for potential differences between different kinds of measures within our metrics, we have estimated average effects sizes for subcategories within the metrics 'diversity', 'carbon cycling' and 'nitrogen cycling'. In the metric diversity, we defined subcategories 'species richness' and 'diversity indexes', this last one including Simpson, Shannon and evenness indexes. In the metrics carbon cycling and nitrogen cycling, we compared subcategories 'pools' and 'fluxes'. The subcategory pools (n ¼ 414 for carbon and n ¼ 212 for nitrogen) mostly included concentration of carbon or organic matter in soils or litter measured in weight units per volume units or weight units per area units. Fluxes (n ¼ 53 for carbon and n ¼ 38 for nitrogen) measured respiration, mineralization, accumulation, immobilization or decomposition rates in weight units of carbon or nitrogen per weight unit of soil or litter and time.
Species richness and diversity indexes had a marginal difference in their confidence intervals, richness 22.6-28.6% and diversity 28.8-34.6%. Even smaller differences were found between the pool and stock subcategories of carbon, stock 32.2-42.1% and pulse 37.4-47.4%, and nitrogen, stock 32.1-41.6% and pulse 38.9-48.8%. The largely skewed sample sizes between all subcategories did not allow to perform reliable Wald's tests. Although Mann-Whitney tests are not best adapted to test for significant differences in meta-analytic data, we found nonsignificant (P40.1) differences in any the subcategory tests performed. These marginal differences in the average effect sizes of the selected subcategories suggested that no major differences should be expected in the behaviour of each subcategory within each metric. Splitting these metrics into subcategories involved having substantially less robustness in the main analysis that prevented having reliable comparisons in most of the categories within the moderators ecosystem type and degrading factor. Thus, metrics were maintained undivided.
For each outcome measure, we also collected data on the climatic region according to the Köppen-Geiger climate classification system 34 , number of sites undergoing recovery, number of reference sites, area of the study site, ecosystem category (forest, grassland, wetlands, river, lake or marine), disturbance category (one of the nine factors used in the search or multiple when more than one category was reported), disturbance duration and time since recovery started. Except in studies monitoring land cover change (n ¼ 3), the size of these plots ranged from o1 m 2 to a few hectares. Even though the area of the study site was only reported in a limited number of studies (Supplementary Table 1), we collected these data to approximate the spatial representation of our results.
Regarding ecosystem category, forests included all ecosystems where trees were dominant, wetlands included both freshwater and coastal aquatic ecosystems according to the Ramsar Convention definition 35 , grasslands included ecosystems where grasses and forbs were dominant and marine ecosystems included benthic and pelagic ecosystems from the shoreline to 4100 m deep. The number of sites undergoing recovery included the number of plots being measured but not replicates within plots. In the case of chronosequences, we only recorded data for the start and end time points. The total area resulted from adding the areas of all the study sites that were reported and thus our estimated total accumulated study area is a conservative estimate. As previous studies have reported that restoration approach (that is, passive versus active restoration) does not generate significantly different responses in wetlands over the long term 36 , nor in other ecosystem types 22 , we have excluded this factor. The disturbance duration was reported in 217 studies and ranged from o1 day to 379 years (mean ± s.e., 29 ± 3.1 years; median ¼ 6). The time since restoration started ranged from o1 day to 380 years (mean±s.e., 13±1.9 years; median ¼ 9).
Weighting. As is commonplace with ecological meta-analyses 8,10,37,38 , the data necessary to determine variance with any confidence were not available in the majority (79% in our case) of outcome measures. In addition, meta-analysis theory suggests that when among-study variation is much higher than within-study variation, parameter estimates from random-effects models are nearly the same as those obtained with unweighted models [37][38][39][40] . Nevertheless, unweighted models may yield confidence intervals that are too narrow, as they do not account for the within-and among-study variation components that are accounted for in randomeffects models. In our meta-analysis, we used the subset of studies reporting variances (or enough information to calculate them) and computed the I 2 index 41 separately for each outcome variable. The values of I 2 were over 90% (range 93.74-99.64%), suggesting that among-study heterogeneity accounted for most variation and that random-effects weights across effect sizes would be expected to be very similar. Then, we used this subset of the database to estimate the average within-study variances as an arithmetic mean of the available within-study variances for each outcome variable and used the obtained values as approximate within-study variances for the remaining effect sizes, which had that information missing. This strategy allowed us to fit random-effects meta-analytic models for each outcome category.
Quantification of the recovery debt. Analytically, the recovery debt for a recovery process that takes place over a period of time T (usually denoted in years) can be calculated using X s (the value of the relevant ecosystem metric at the start of the recovery phase, at time t ¼ 0), X e (the value of the same metric at the end of the recovery period after a finite period of time, t ¼ T) and X r (the reference value of the same metric in a reference system, either in the pre-disturbance state or in an 'undisturbed' equivalent system; Supplementary Fig. 5). The transition between X s and X e is unknown but likely to be nonlinear 42 , which was approximated using an exponential function, f(x) ¼ e rt , where r is constant and t is the time between X s and any point between X s and X e . We estimated the recovery debt for each of the selected outcome measures using both an exponential and a linear approximation and found small differences between the resulting recovery debt estimations ( Supplementary Fig. 6), which led to similar conclusions.
There are five scenarios in which the recovery debt can be calculated ( Supplementary Fig. 3). The value of the recovery debt is the area existing between the X r (reference value of the outcome measure) line and the line connecting X s (starting value) and X e (end value). According to Supplementary Fig. 5, in scenarios a (n ¼ 1,993) and c (n ¼ 446), the recovery debt area during the time period (0, T) is X r T-AUC (area under the curve, green shading). There were many cases where X s and X e were higher than X r or where X s 4 X r 4 X e , represented in scenarios b (n ¼ 424) and d (n ¼ 953). Starting values exceeding reference values is commonly found for abundance measurements in early recovery stages and also for nitrogen concentration in aquatic ecosystems undergoing eutrophication. In these cases, we assumed that response values above the reference value represent negative effects, and thus X s and X e were inverse-transformed using the formula Z s;e ¼ X r Xr Xs;e . In the particular case where X r ¼ 0, scenario e (n ¼ 236), we used the approximation Z s;e ¼ X 2 r Xs;e . This allowed us to compare those cases with the rest of the database and to set a realistic recovery threshold of 100%.
The AUC was calculated as X s e rT ¼ X e , it follows that lne rT ¼ rT ¼ ln Xe Xs and therefore r ¼ 1 T lnð Xe Xs Þ. The AUC is then AUC ¼ R T 0 e rt dt, where r is defined above. After integration, AUC ¼ 1 r X s e rT À 1 r X s ¼ 1 r X e À X s ½ . It follows that recovery debt (RD) that occurs over the time period (0, T) is: It is noteworthy that as T varies for each outcome measure, it is preferable to express recovery debt in per annum terms. That is, recovery debt per annum is: In scenarios c and d, a negative exponential function f x ð Þ ¼ e À rt is assumed and used to estimate the recovery debt. In scenario e, RDt ¼ AUC T . Given the heterogeneity of the outcome measure measured, we homogenized the values of RDt to compare them. We did so by estimating the recovery debt ratio as a percentage RDr % ð Þ ¼ 100Ã RDt AbsðXrÞ , where Abs(X r ) is the absolute value of X r . In a some cases (n ¼ 250), the value of RDt4X r and, following the same principle used to estimate Z s,e , the recovery debt value was estimated as RDr % ð Þ ¼ 100Ã AbsðXrÞ RDt : The same approach was used to estimate the recovery debt value with the linear approximation. To estimate the area representing the recovery debt value, we used a linear function that merges any two points with positive slope. After calculation, the resulting formula is RDt ¼ X r À 0:5Ã X e þ X s ð Þ . We also used a linear function that merges any two points with negative slope and after calculation the resulting formula is RDt ¼ X r À 1:5ÃX e þ 0:5ÃX s . Finally, when X r ¼ 0, we approximated the recovery debt ratio as RDr % ð Þ ¼ 100Ã 1 À RDt ð Þ . The presence of zero values in the outcome measures could produce abnormally high or low values of our recovery debt estimations 43 . This issue occurred in two situations: (i) when X r ¼ 0 (scenario e) and (ii) when X s or X e ¼ 0 (n ¼ 628). To select the best approach to tackle this issue, we tested nine different strategies (Supplementary Table 3) to calculate the r parameter required to estimate the recovery debt. From them, six included the use of a constant value added to the numerator and denominator (that is, 0.01, 0.05, 0.1, 0.5 and 1) of the formula used to calculate r and four included the use of an amount specific for each outcome measure. These specific amounts were (i) an amount of the same order of magnitude than X s and X e located at the beginning of the order of magnitude (for example, 0.1, 1 and 10), (ii) the same than (i) but with the median magnitude (for example, 0.5, 5 and 50), (iii) an amount one order of magnitude larger than X s and X e located at the beginning of the order of magnitude and (iv) the same as (iii) but with the median magnitude. For example, if X s ¼ 0.81, the four amounts were 0.1, 0.5, 1 and 5, respectively. We compared the data distribution (median and 95% confidence interval) of each strategy with the rest of the database without the zero values using Mann-Whitney rank sum tests. Only the approach using an amount of the same order of magnitude than X s and X e at the median magnitude had a nonsignificant data distribution (P4 0.05) different from the rest of the database and thus this was the strategy we used in the two situations where we needed to address zero values. This strategy was not used in the rest of the database not affected by zero values. To further ensure that there were no differences between this approach and an approach that simply excludes all zero values, we compared the recovery debt excluding and including outcome measures with zero values and we did not find qualitative differences that could lead to different conclusions ( Supplementary Fig. 7).
Statistical approach. We ran sensitivity analyses using generalized linear models with a variety of probability distributions (that is, normal, log-normal and Gamma), link functions (that is, identity and log) and rescaling the data to determine the best modelling approach for this analysis. Through all of the models, the general mixed model with normal distribution and unscaled data was the most parsimonious and the best fit to the data in terms of Akaike information criteria (AIC) and likelihood ratio tests, and thus we continued to use a meta-analytic mixed model approach based on the normal distribution. We used the multivariate, mixed model function rma.mv() from the metafor package 44 to construct threelevel meta-analytic models in R 3.0.1 (ref. 45). We used three-level models, because our units of analysis (recovery debt) were clustered within effect sizes and those effect sizes were clustered within studies.
Following Mergensen et al. 37 , 'homogeneity tests are usually not undertaken and are not meaningful, in cases where a random-effects model has been used for conceptual reasons and/or because the meta-analyst recognizes in advance that there is substantial between-study variation'. Therefore, we did not carry out a heterogeneity test, although we estimated the I 2 index for each outcome category and the resulting values show evidence of substantial heterogeneity between study results, which is accounted in our models as explained in the 'Weighting' section. We divided the data into four subsets based on the type of metric that was measured: abundance, diversity, carbon cycling and nitrogen cycling. We investigated the effect of moderators on our models with a three-step approach 44 . We first fit a three-level meta-analytic model without moderators. Then, we tested the significance of the moderators using the omnibus test of moderators (Q M test). Third, we fit a model without intercept to get the effect size estimates and confidence intervals for each category of the moderator variable. We estimated the overall effect sizes and confidence intervals of recovery debt in each subset without any moderating effects from the null models.
Data availability. Data including the database used and the codes generated in R are provided in the Dryad Digital Repository, doi:10.5061/dryad.t5c97.