Opportunities for biodiversity gains under the world's largest reforestation programme

Reforestation is a critical means of addressing the environmental and social problems of deforestation. China's Grain-for-Green Program (GFGP) is the world's largest reforestation scheme. Here we provide the first nationwide assessment of the tree composition of GFGP forests and the first combined ecological and economic study aimed at understanding GFGP's biodiversity implications. Across China, GFGP forests are overwhelmingly monocultures or compositionally simple mixed forests. Focusing on birds and bees in Sichuan Province, we find that GFGP reforestation results in modest gains (via mixed forest) and losses (via monocultures) of bird diversity, along with major losses of bee diversity. Moreover, all current modes of GFGP reforestation fall short of restoring biodiversity to levels approximating native forests. However, even within existing modes of reforestation, GFGP can achieve greater biodiversity gains by promoting mixed forests over monocultures; doing so is unlikely to entail major opportunity costs or pose unforeseen economic risks to households.

Note: † -Sample size is in terms of the numbers of point counts (birds), trapping plots (bees) and mixed-species flocks recorded (birds), by land-cover type and season. ‡ -Based on Tietjen-Moore test (P < 0.05), we removed two cropland plots in low elevation, as well as one native forest plot and two bamboo plots in mid-high elevation as outliers. § -Numbers before and after the slash refer to the numbers of point counts and mixed-species flocks observed, respectively. Note: Household interview sample size is in terms of the numbers of estimates obtained for production profit, labor intensity, and percentage of household income coming from forest production. Note: † -For a given species/genus, the abundance analysis compares its abundance in each type of GFGP forest against that in each type of baseline land cover. "Lower" and "higher" abundances therefore refer to lower and higher abundance in GFGP forest than in baseline land cover, respectively.

Compilation of peer-reviewed literature on GFGP forest type.
To search the English literature on Web of Science (www.webofknowledge.com), we checked all publications cited in and citing four "anchor" publications on GFGP: two generic review articles 2,3 , one meta-analysis on soil carbon levels 4 , and one generic book 5 . We additionally searched for publications about biodiversity under GFGP, using the search terms "China", "grain for green", and "biodiversity".
Importantly, we used all six variants of GFGP's English translation that have occurred in the literature, including "grain for green", "grain-for-green", "grain to green" (e.g. 2 ), "grain-togreen", "sloping land conversion" (e.g. 6 ), and "returning farmland to forest program" (e.g. 7 ). Biodiversity sampling. All bird point count stations and bee trapping plots were ≥ 50 m from the edge of the focal land-cover type. We conducted all surveys at lower elevations before moving to higher elevations with the exception of a small subset of breeding season point counts and ~25% of the bee trapping plots at low elevations during the avian breeding season, which we surveyed toward the latter half of the field season. We determined the required effort level of point count surveys for each land-cover type based on the leveling of species accumulation curves. Pan trapping efforts were limited by time and personnel; we had a minimum of ten trapping plots for each land-cover type. For both birds and bees, survey effort was higher for the more biodiverse land-cover types (according to our field experience).
For bird point count surveys, we used a 150-m radius, avoiding double-counts to the best of our ability. We divided each 12-minute survey into four 3-min subintervals and recorded individual-/group-level detections (using visual and auditory cues) for each subinterval 8  the breeding and nonbreeding season surveys. We minimized observer bias with regard to landcover type by ensuring that each land-cover type was covered by all observers, and that observer identity was considered in subsequent analysis (see "Statistical analysis" below). Supplementary  Fig. 4). We used 96 ml plastic pans (3.25-ounce translucent plastic soufflé portion cups; www.cuptainers.com) that we fluorescent-dyed with white, red, yellow, blue and purple, respectively. We set up all pans on 1-m poles to improve visibility, filled each pan two thirds full with 2% scentless liquid detergent as a surfactant 9 , and operated them for 24 hours on days without rain. We collected all captured individuals (including bees and other insects) and stored them in 99.99% ethanol at ≤ 4°C before DNA extraction. All samples were stored at -20 °C within five days of field collection until the time of lab work. Seven out of the total 74 plots were changed to a rectangular shape of equivalent area due to terrain constraints.
Supplementary accessed July 30 th 2015) to filter out non-bee sequences, and obtained 546 bee sequences.
For all mtCOI Sanger sequences, we conducted single-threshold GMYC species delineation 12 . We constructed an ultrametric gene tree under a relaxed log-normal molecular clock with BEAST 2.3.0, generated the BEAST input file with BEAUti 2.3.0 13 and used the GTR substitution model, which we selected using jModelTest2 2.1.7 14,15 . We set mean substitution rate to one, and estimated the proportion invariant and base frequencies. We chose a birth-death model as a single coalescent cluster constituting the GMYC null model. We set the AG transition rate prior to a gamma distribution with Alpha = 2 and Beta = 0.5 and all the other transition-rate parameters to gamma distribution with Alpha = 2 and Beta = 0.25. We set the ucldMean prior to 0.0176, which is the mean substitution rate for arthropods. We defined three Crabronidae COI sequences as the outgroup. We ran the MCMC chains for 20 million generations and sampled every 1000 generations. We visualized run convergence using Tracer version 1.6.0 16 , and discarded the first 10% of the trees as burn-in. We ran TreeAnnotator 1.8.2 16 to produce a single tree for GMYC analysis using maximum clade credibility tree, with the Node Heights option set to Keep Target Heights. We applied single-threshold GMYC models to DNA barcodes using the package splits (1.0-19 17 ; in R 3.2.0 18 ; see enclosed R script: bees_GMYC_BEAST2.R).

Household interview.
We obtained ≥50 interviews for each GFGP forest type, except for eucalyptus monoculture, for which we obtained 30 interviews (total number of households included was 166). Supplementary Table 4 lists the number of households interviewed for each land-cover type whose data contributed to our economic analyses. In each household, we first asked what forest types the household managed. For each forest type, we then asked about the management/production costs per unit area per production cycle, breaking the costs down into discrete processes, including the establishment, maintenance and harvesting of forests, and to discrete components including costs of seedlings, chemicals, hired and self-labor, equipment and transportation. We also asked respondents about the yield and market price for forest products per unit area per production cycle, covering both timber and non-timber products. These questions allowed us to calculate the yield and profit of different GFGP forest types. Importantly, because of China's registration system that ties parcels of rural land to particular households, respondents generally know the sizes of their land-holdings with considerable accuracy. While government subsidies on seedlings and fertilizer could result in underestimations of production costs, there is no reason to expect this underestimation to systematically vary with GFGP forest type in a way that would bias our conclusions on profit from forest products. We additionally asked respondents to estimate what percentage of their household income came from forest production.

Statistical analysis.
Considering the likely inadequate sampling of bees, we first eliminated plots identified as outliers of anomalous trapping patterns. To identify outlier plots, we tallied the total number of individuals trapped from each survey plot, and used the Tietjen-Moore test for outliers 19 to identify plots with excessively low or high number of captures (P < 0.05). This resulted in the removal of five survey plots (Supplementary Table 3). We stratified biodiversity analysis into three elevation categories using the natural elevation range of the three monoculture forests (eucalyptus, bamboo, and Japanese cedar; Supplementary Note 1): eucalyptus defining low elevation, bamboo mid elevation, and Japanese cedar high elevation. For the analysis of bees, we combined data for mid and high elevations because of limited sample size. We assigned survey data for each monoculture to their respective elevation bands, and survey data for cropland, native forest and mixed GFGP forest, three land-cover types that spanned more than one elevation bands, to elevation bands according to the threshold elevation values provided in Supplementary Table 2; a portion of these data was used in more than one elevation bands because of the overlap between bands (Supplementary Table 2).
For species richness analysis using coverage-/sample-based extrapolation, we extrapolated to two times the minimum sample size or the largest sample size, whichever was greater 20 (Supplementary Table 8 forest-dependent if it is predominantly associated with forested habitat, i.e. its association with forested habitat was cited by the sources as being stronger than "occasional" or "sometimes". Similarly, a species was classified as open-country if its association with open habitat was cited by sources as being stronger than "occasional" or "sometimes". We classified the remaining species in between these two association categories into the generalist guild (Supplementary Data 6).
For N-mixture modeling of bird species abundance, we treated only species with ≥ 10 total detections of individuals or groups as standalone species, and collapsed the remaining species into their respective genera only when the genus satisfied the 10-detection requirement.
Abundance modeling for species living in groups estimated the abundance of groups rather than individuals. The rationale for using this 10-detection requirement lay in the fact that N-mixture models entail modeling both the "true" abundance and the detection probability of each species, involving a maximum of ten covariates in the most complete models (i.e. global models; see below). The convergence of these N-mixture models, particularly when involving a large number of covariates, depends upon a reasonable number of detections to provide sufficient information to tease apart the contribution of true abundance and detection to the observed abundance of the species. Because we are unaware of well-established guidelines for "a reasonable number of detections", we chose a minimum of ten detections; it was found to work well for the purposes of our models. . We assumed closed population and constant detection probability during each point count 8 . For each species/taxon, we generated a full set of sub-models from the global model (i.e. using all combinations of candidate covariates), and ranked these models based on the AIC (Akaike's information criterion 25 ) score. For the model with the lowest AIC score, we checked its estimates to gauge whether they were generally consistent with our biological knowledge of the species/genus. If they were, we would select the model with the lowest AIC score as the best model; otherwise, we would move to the model with the next lowest AIC score and conduct such check again, until we identified the model with the lowest AIC score that produced estimates consistent with our biological knowledge, as the best model. The reason for including our biological knowledge of the species/genus was that in some situations, the particular data structure could cause the models to produce unrealistic abundance and detection probability estimates even if they converged (e.g. exceedingly high densities of populations).
Our biological knowledge of the species'/genera's likely abundance from our field experiences therefore serves as a safeguard against such unrealistic estimates. For GLMs of bee species abundance, we followed the same 10-capture criterion in identifying standalone bee species, and collapsed the rest into one taxon. All GLM models included land-cover type as the only covariate and used a log link and Poisson error distribution. With known A pc and A f , we were thus able to mathematically express the expected number of individuals/groups detected during point count and flock observation using parameters pertaining to population density and detection probability, and estimate these parameters by fitting the expressions to observed data using the maximum likelihood approach.
For the analysis of household interview data, we first calculated the annual per ha sale and cost (in US$ and labor days) for each household, GFGP forest type and tree species, based on forest product yield and unit price, length of production cycle, and various aspects of production cost during initial forest establishment and subsequent maintenance. We in turn calculated the annual per ha profit by subtracting all available aspects of production cost from the annualized gross rents, including initial cost of forest establishment, annual maintenance cost as directly reported by households, and harvest cost. We applied a discount rate of 5% (r=0.05) to production sale, cost and labor input based on the 2015 one-year lending rate of the People's Bank of China (range 4.35-6% 27 ). For data from households with mixed forests that provided information on more than one species, such calculation involved all species for which data were available through weighted sum based on the production area of each species. We did not use mixed forest-owning households that provided production or labor estimates for only one tree species to estimate the profit or labor intensity of mixed forest.
Among the three main tree species used in GFGP in the study region, bamboo is generally harvested every year, while eucalyptus and Japanese cedar have a harvest (i.e. clear-cut) cycle of around seven years and 20 years, respectively. For households that did not report the number of years it took for bamboo to start producing, we assumed harvest started in the third year after forest establishment and used a 20 year time span of which production happened in 18 years to calculate average annual sale. For bamboo and eucalyptus forests, we calculated the annualized net rent and labor input over the harvest cycles (harvest cycle of eucalyptus was directly reported by respondent households) with the 5% discount rate. For Japanese cedar forest, forest production typically entailed a one-time clear-cut at the end of each production cycle, with 2-3 rounds of selective harvest before the final clear-cut. We assumed that selective harvesting of these forests happened at the mid-point of the reported production cycle. The net rent was therefore a summation of the net present value (NPV) for the clear-cut (full production cycle) and the NPV of the reported selective harvest. As with bamboo and eucalyptus, we applied the discount rate of 5% to calculate annualized net rent and labor input.
We conducted all analyses using multiple linear models. For the self-reported percentage of household income contributed by forest production, visualization of the distribution of the self-reported percentages suggested possible outliers toward the high end of the values. We thus used the Tietjen-Moore test for outliers 19