Climate Stability Index maps, a global high resolution cartography of climate stability from Pliocene to 2100

Climate changes are top biodiversity shapers, both during the past and future. Mapping the most climatic stable and unstable zones on Earth could improve our understanding of biodiversity distribution and evolution. Here, we present a set of maps based on a global scale, high resolution (ca. 5 km) new Climate Stability Index (CSI). The CSI considers bioclimatic variables for two different time ranges: (1) from Pliocene (3.3 Ma) to the present (CSI-past map set), using 12 time periods of PaleoClim representing warm and cold cycles; and (2) from present to the year 2100 (CSI-future), using nine general circulation models of climate change of four periods available from WorldClim. We calculated standard deviation of the variables and selected an uncorrelated set for summing, normalizing and obtaining the CSI maps. Our approach is useful for fields such as biogeography, earth sciences, agriculture, or sociology. However, CSI is an index that can be re-calculated according to particular criteria and objectives (e.g. temperature variables); maps are, therefore, customizable to every user.

www.nature.com/scientificdata www.nature.com/scientificdata/ In spite of the limitation of using only 12 time periods as a basis for inferring the long-term climatic stability (with only three older to 0.12 Ma), the much wider time interval, the variety of measures (means, peaks, and intra-year variability) and the finer resolution (ca. 5 km) makes our proposal a very versatile tool with a wide array of applications. With both CSI-past and CSI-future we offer maps of climatic stability adhering to FAIR principles 19 , adaptable to every user and circumstance: they are very easy to use and the variables to estimate the climate stability can be selected at the user's discretion.

Methods
A workflow for the calculation of CSI is presented in Fig. 1c. For all the analyses, we used the R v. 4.0.3 software environment 20 implemented in RStudio v. 1.4.1103. The scripts used for each methodological step are available at the Figshare repository 21 . After data download from primary sources (PaleoClim and WorldClim), specifically for the CSI-future map set we performed an initial step aimed to obtain individual bioclimatic variables for each future time period for the four SSPs (Fig. 1b). To achieve this, the median values of nine GCMs were calculated in functions compiled in raster R package 22 for each individual bioclimatic variable (see a few exceptions of number of GCMs used in Table 2).
The standard deviation (SD) was estimated as a measure of the amount of variation or dispersion along time series, from which the resulting output maps showed the places where climate conditions remained constant or variable across the temporal periods considered (Fig. 1a,b). The SD, as a way to identify stable/unstable climatic www.nature.com/scientificdata www.nature.com/scientificdata/ areas, was previously used in other climatic or evolutionary studies 4,14 . To compute the SD output rasters, we applied the mosaic function setting "fun = sd" from raster R package, calculating the SD for each pixel in the 12 time period rasters for CSI-past and five times for CSI-future, independently for each variable. The mosaic function was also used for the range calculation, with "fun = min" and "fun = max" to obtain the minimum and maximum values of input rasters, respectively, with a further step for subtracting maximum to minimum values.
Specifically, for CSI-past, as it includes several time periods with sea-level dropping below the present level (T1, T3, T5, T6, T7, T8, T9; Fig. 1a), we applied a mask of the current land surface, i.e. taking the T12 (Anthropocene) as a template. With this additional step, we were able to remove those pixels (grid cells) currently under the sea but that were once emerged. Most of these pixels, however, were only emerged during the LGM (ca. 21 ka), thus having values for bioclimatic variables for just a single time period (instead of the 12 routinely used for the variability estimation). The inclusion of these areas would result in highly climatically 2021-2040 GCM SSP1-2.6 SSP2-4.5 SSP3-7.0 SSP5-8.5 www.nature.com/scientificdata www.nature.com/scientificdata/ stable regions (low SD values; Supplementary Fig. 1), but this would be an obviously biased result. In contrast, we did not remove those areas affected by the sea-level rising periods, as only three periods contained "NoData" values (T2, T4, T10; Fig. 1a). However, to take this fact into consideration, we created a raster file in which these areas submerged during warm periods are indicated (see Supplementary Fig. 1). Finally, for both CSI-past and CSI-future, the resulting SD values were normalized to values between 0 and 1, with 0 representing completely stable areas and 1 the most unstable ones.
The next step was focused on the selection of a relatively uncorrelated set of variables for each map set. We used the removeCollinearity function from virtualspecies R package 23 that estimates the correlation value among pairs of variables from a given number of random sample points (10,000 in present case) according to a given method (Pearson for the present case) and a threshold of statistic selected (r > 0.8 as a cut-off value). The function removeCollinearity returns a list of uncorrelated variables according to the settings specified, randomly selecting just one variable from groups of correlated ones (see Table 1 for a complete list of variables used for each map set). As we compiled estimates of variability independently for each variable and map set (e.g. SD bio1 past, SD bio2 past, etc.), each user can define his own CSI, selecting the more interesting variables according to the case of study.
The final CSI maps were obtained by summing the SD values of the variables selected and the subsequent outputs normalized (0 to 1) (Figs. [2][3][4]. Histogram plots were represented with ggplot2 R package 24 and maps were exported with ArcGIS v.10.2.2 (Esri, Redlands, California, USA 2014). The histograms were computed for these final CSI maps, which represent the frequency and distribution of CSI values. We presented the final CSI maps with two different colour ramp schemes with ArcGIS. The first consisted of defining equal interval breaks from 0 to 1. The second was based on defining 32 categories with different value breaks for past and future map sets according to the value frequency shown by the histogram plot, i.e. the category with the highest CSI values (no. 32) was 0.71-1 in the past map set and 0.356-1 in the future map set.  CSI using SD r > 0.9 vs. r > 0.8 (Past and Future datasets) CSI using SD r > 0.9 vs. r > 0.7 (Past and Future datasets) CSI using SD r > 0.8 vs. r > 0.7 (Past and Future datasets)

Fig. 5
Detailed workflow of analyses employed to calculate and test the robustness of the Climate Stability Index (CSI) for past and future map sets. Final maps presented were obtained with conditions marked with wide-lined frame in the fifth step, which are: SD applying r > 0.8 threshold for variable correlation in past map set, and SD from median of nine GCM and applying r > 0.8 threshold for variable correlation in future map sets. Note that in the third step further analyses are repeated for the range statistic in the case of past map set and for both future map sets (mean and median).

Data records
Our data records are available through Figshare 21 in format of map raster layers (.TIF format).
Two sets of data records are stored: (1) SD-based maps of individual bioclimatic variables, which contain a set of raster layers (one for each variable): 14 for the case of the past map set (bio2, bio3, bio5, bio6, bio7 are missing) and 19 for the future map sets based on median calculations. These layers contain individual measures of the SD of each variable, which could be independently used according to the user's study purpose or combined according to the user's preferences to generate a customized CSI.   www.nature.com/scientificdata www.nature.com/scientificdata/ (2) the five CSI-based maps presented in this study, corresponding to CSI-past, CSI-future SSP1-2.6, CSI-future SSP2-4.5, CSI-future SSP3-7.0, and CSI-future SSP5-8.5.
Data files share a common naming pattern: For Past map set from individual variables and CSI, respectively: "sd_past_<bioclimatic variable>.tif " "csi_past.tif " For Future map sets from individual variables and CSI, respectively: "sd_future_<SSP scenario>_<bioclimatic variable>.tif " "csi_future_<SSP scenario>.tif " In addition, for the CSI-past map set, we include a map showing the areas affected by sea-level rising periods (intergl_affected.tif) and the raster map used to remove regions with landmasses currently under the sea but that were once emerged (lgm_del_mask.tif; see Supplementary Fig. 1).

technical Validation
To evaluate the robustness of the CSI index, we compared its performance by varying the statistics used for its calculation (see detailed workflow in Fig. 5). For the past map set, the CSI was computed independently by means of SD and range. For the future map sets, the CSI was computed using the mean and the median of the nine GCMs to obtain individual future rasters for each time and SSPs. In parallel, the sensitivity derived from selecting a more restrictive variable correlation threshold (r > 0.7), an intermediate (r > 0.8), or a less restrictive cut-off value (r > 0.9) was also checked. In Supplementary Tables 1 and 2 the variables selected for each r threshold tested are specified. All pairs of CSI rasters were compared through Pearson's correlation analyses, which www.nature.com/scientificdata www.nature.com/scientificdata/ showed a very high r value, ranging from 0.93 to 1.00. All r values in Table 3 and all correlation scatterplots are available in Supplementary Fig. 2 and some examples are included in Fig. 6. As a conclusion, methodological choices resulted in a non-remarkable impact of metric selected (mean vs. median, SD vs. range) or correlation threshold set (r > 0.7, 0.8, 0.9). Our choice was using the SD, median, and threshold r > 0.8, to draw the CSI-based maps.
Before concluding, however, it should be acknowledged that the fact of not using the same approaches for past and future climate stability assessments introduces a degree of uncertainty when considered as a whole. To obtain CSI-past maps we are using PaleoClim, which provide simulations from single models for different time periods, whereas for CSI-future maps we are employing an ensemble of nine model simulations (and, thus, there is regional variability in stability).

Code availability
Commented R codes used to generate CSI are available at Figshare 21 .