Introduction

Tree alpha-diversity (here defined as Fisher’s alpha1 of a tree-inventory plot) in Amazonia is influenced by regional and local drivers2,3. Regional drivers include large-scale patterns in rainfall4,5,6 and temperature, as well as large-scale gradients in soil fertility5. In contrast, local drivers include local differences in soil type2,7,8, soil hydrology9, flooding10 and small-scale ecological processes, such as disturbance11,12 or frequency-dependent mortality13,14,15. Fisher’s alpha provides information on both species-richness in a sample of known size and the relative abundances of all species in that sample, and thus captures two aspects of biodiversity. Species-richness on its own is nevertheless an important aspect of biodiversity and is often easier to communicate. In this paper we will use both indices.

Forest types in Amazonia are to a large extent driven by soil and hydrology (e.g. groundwater, seasonal flooding or waterlogging)8,9,16 and significant differences in composition and tree alpha-diversity between forest types exist2,3. Terra-firme (see methods section for a definition of all forest types) is the forest type with the highest tree alpha-diversity, while forests on white sand, várzea, igapó and swamps generally have lower tree alpha-diversity2 and higher dominance17, the two being inversely related to each other. Lower tree alpha-diversity has been attributed directly to poor nutrient status in white sand forests and to flooding and waterlogging in várzea, igapó and swamps. Assuming that poor soils and flooding require traits that by means of a trade-off reduce fitness on terra-firme18,19, it could be argued that, in line with the theory of island biogeography20, the relative area of any particular forest type is an important driver of large-scale diversity21 and thus influences local diversity7,17,20,22,23. Following this reasoning, forest types such as white-sand forest and swamp forest have small areas and a fragmented distribution across Amazonia, resulting in relatively small meta-populations and hence a lower potential alpha-diversity (subsequently influenced by local processes). In essence, their diversity is therefore likely strongly influenced by local extinctions and limited migration from a smaller meta-species pool23,24.

In 2000 and 2003 the Amazon Tree Diversity Network (ATDN) published its first maps of tree diversity of Amazonia at 1-degree resolution, based on a dataset of 268 and 425 terra-firme sampling plots, respectively25,26. At the time, plots in other forest types were omitted as they had lower diversity and were small in numbers. In a subsequent attempt, an interpolation of all plots (725) was used. It was assumed that an interpolation of all plot data would create the regional signal, while the residuals from the spatial interpolation could be regarded as the local ecological signal, including forest type and residual error2,3.

After a period of 20 years, sufficient plots are now available to map diversity directly by main forest type. Here, we produce a new tree diversity map of the original forest cover of Amazonia at 0.1-degree resolution, using soil information of all plots and a large-scale soil map27,28. We constructed four spatial models for tree alpha-diversity, one for terra-firme, one for white sand forest, one for floodplain forests (várzea, igapó), and one for swamp forests. Results were subsequently mapped on the major soil types at 0.1-degree resolution. Afterwards, the same approach was used to map tree density and tree species-richness across Amazonia.

Although a considerable number of plots has been added to the general database, we recognize there are several caveats in this attempt. First, it has long been recognised that personal experience and training can significantly affect the identifications (including distinguishing among similar species) of trees and thus the richness of a plot29, and that with a large network of over 230 contributors this might create a bias but one for which we cannot easily correct. Plots with clearly poor identification are, however, not included in the ATDN database. Second, the level of correct identification of specimens in herbaria may differ considerably30. Third, the spatial differences in the number of specimens collected over Amazonia are large31,32,33. Arguably, species identification would be better, and therefore richness higher, all other things being equal, in areas where collecting intensity is high, and large, active herbaria are present (e.g. Manaus, Belem, Cayenne). Finally, time of establishment may also affect plot tree species-richness. Plots of the ATDN were established between 1934 and 2020. The number of known (described) species for Amazonia has increased considerably in that period, especially since the 1940s31. In addition, new taxonomic insights contribute to splitting dominant taxa into separated lineages34,35 or merging rare species into common ones31. It should be expected therefore that, all other things being equal, early plots have fewer species than plots established more recently. The effect of this would depend on the quality of the botanists identifying the species, as they could either force a tree into a known species or keep it as a morpho-species, in which case there would be less effect on tree alpha-diversity and tree species-richness.

We tested potential drivers for the observed patterns in tree species-richness, including climatic variables, soil variables, time of establishment, and collecting intensity. Our results show that area has a strong effect on tree diversity and richness, as do cumulative water deficit and tree density. Finally, we produce a full list of all plots with metadata, tree density, alpha-diversity, species-richness data, and plot references, and two A3 maps with our main map results.

Results

Tree density

Tree density (the number of trees/ha) for the plots ranged from 80 to 2629 trees/ha (Supplementary Fig. 1a), with an average density of 563 trees/ha based on 1956 plots (after removing outlier plots with density <200 and >900 trees/ha) and a total of over a million trees. Density differed little between forest types, with slightly higher densities in swamp, white-sand forest, and terra firme forest of the Guyana Shield (Supplementary Fig. 1b). The highest tree densities were found in north-western Amazonia. The lowest tree densities were found in central-south and eastern Amazonia (Supplementary Fig. 1c; note that Supplementary Fig. 6c is in fact the regional tree density). The spatial model explained 43% of the variation in tree density across Amazonia (Supplementary Fig. 1d), and its residuals showed no significant spatial autocorrelation (Supplementary Fig. 1d).

Tree alpha-diversity

Tree alpha-diversity (defined here as Fisher’s alpha) ranged from 0.51 to 257 per plot, with an average of 56. Data showed a strongly skewed distribution (Fig. 1A). Forest type explained a significant amount of variation of Fisher’s alpha (R2 = 23%, p < 0.001, Fig. 1B). The highest tree alpha-diversity was found in the terra-firme forests of the Pebas region (western Amazonia) and French Guiana (Fig. 1C, D). Lowest tree alpha-diversity was found on the sandy soils of Guyana, Upper Rio Negro, southern Amazonia and floodplains along the rivers and swamps. The combined spatial model explained 66% of the variation in tree alpha-diversity across Amazonia (Fig. 1D), and its residuals showed no significant spatial autocorrelation (Moran’s I < 0.001 n.s.). The standard error for Fisher’s alpha was mostly low (Supplementary Fig. 2a) and consistent across regions (Supplementary Fig. 2b) although higher for white sand forest and swamp forests (Supplementary Fig. 2c), resulting in higher standard errors in the white sand areas of the Upper Rio Negro and Guianas. Residuals of the combined spatial model had a mean close to zero and did not differ much between forest type and region, showing no spatial pattern, consistent with a non-significant Moran’s I (Supplementary Fig. 3).
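
The residual check reported above can be illustrated with a minimal sketch of Moran’s I. This is not the Moran.I() function of ape used for the analyses: the inverse-distance weights, the permutation test (swapped in here for an analytic test) and the function names morans_i and permutation_p_value are assumptions made purely for illustration, and all coordinates are assumed to be distinct.

```python
import math
import random


def morans_i(coords, values):
    """Moran's I with inverse-distance weights (assumed weighting, w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over i != j of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])  # coordinates must be distinct
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * cross / sum(d * d for d in dev)


def permutation_p_value(coords, values, n_perm=999, seed=1):
    """One-sided p-value: share of random shuffles with I >= the observed I."""
    rng = random.Random(seed)
    observed = morans_i(coords, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(coords, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical residuals on a transect: a smooth gradient is positively
# autocorrelated, an alternating pattern negatively.
transect = [(float(i), 0.0) for i in range(10)]
i_gradient = morans_i(transect, list(range(10)))
i_alternating = morans_i(transect, [0, 1] * 5)
```

Residuals without spatial structure would give a value of I close to its null expectation of -1/(n - 1) and a large p-value.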

Fig. 1: Tree alpha-diversity (Fisher’s alpha) in Amazonia.

A Histogram of tree alpha-diversity in 2046 ATDN plots. Red lines, mean and mean ± 2 sd. B Tree alpha-diversity by major forest type. C Map of tree alpha-diversity across Amazonia. Legend truncated at 0 and mean + 2 standard deviations of the mean. Amazonian Biome limit - red79. D Observed values of tree diversity vs modelled values of tree diversity on the 2046 plots used for mapping. The significance of Moran’s I was tested with the function Moran.I() of ape61. Marker colours: Red: Terra Firme Pebas Formation; Brown: Terra Firme Brazilian Shield; Orange: Terra Firme Guyana Shield; Yellow: White sand forest; Light blue: Varzea; Dark blue: Igapo; Purple: Swamp. Map created with custom R80 script. Base map source (country.shp, rivers.shp): ESRI (http://www.esri.com/data/basemaps, © Esri, DeLorme Publishing Company).

Mapping the logarithm of Fisher’s alpha, to account for its skewed distribution (Supplementary Fig. 4a), did not produce a very different spatial pattern (Supplementary Fig. 4c) but a slightly better model (Supplementary Fig. 4c, R2 = 73%). As Fisher’s alpha is the more commonly used metric, we have kept this version in the main text.

Using only location (not stratified by forest-soil)2,3 provided a map with a comparable overall regional pattern, but with values much closer to the average, as nearby low- and high-diversity plots of different forest types were mixed in the local estimation (Supplementary Fig. 5a). This model explained 45% of local Fisher’s alpha (Supplementary Fig. 5b). Part of the local signal, following refs. 2,3, was explained by forest type (r2 = 19%, Supplementary Fig. 5c). Thus, the total explained variation obtained by adding the two models would be 45% + 19% of (100 − 45)% = 55.5%, which was about 10 percentage points less than the spatial model with forest type included. Region had no further effect on the residuals of the spatial model (Supplementary Fig. 5d).
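
The arithmetic behind the combined figure can be written out as a short sketch. The function name combined_r2 is hypothetical; the inputs are the 45% (location only), 19% (forest type on the residuals) and 66% (stratified spatial model) reported in the text.

```python
def combined_r2(r2_first, r2_on_residuals):
    """Total explained variation when a second model explains part of the
    variation left unexplained by the first model."""
    return r2_first + r2_on_residuals * (1.0 - r2_first)


# Location-only model, followed by forest type fitted to its residuals.
two_step = combined_r2(0.45, 0.19)   # 0.45 + 0.19 * 0.55
# Difference with the stratified spatial model of Fisher's alpha (66%).
gap = 0.66 - two_step
```

The second term is scaled by the unexplained fraction (1 − 0.45) because the 19% refers to the residual variation only, not to the total variation.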

Tree species-richness

Tree species-richness, defined as the number of species per ha, ranged from 3 to 357 with an average of 121 species/ha. The data were less skewed than those of Fisher’s alpha (Fig. 2A). Forest type explained a significant amount of variation in species-richness (R2 = 25%, p < 0.001) (Fig. 2B). The highest species-richness was found in the terra-firme forests of the Pebas region (western Amazonia) and French Guiana (Fig. 2C). Lowest species-richness was found on the sandy soils of Guyana, Upper Rio Negro, southern Amazonia, and on floodplains along the rivers and swamps. The combined spatial model explained 71% of the variation in tree species-richness across Amazonia (Fig. 2D), and its residuals showed no significant spatial autocorrelation (Moran’s I < 0.001 n.s.).

Fig. 2: Tree species-richness (species/ha) in Amazonia.

A Histogram of tree species-richness in 2046 ATDN plots. B Tree species-richness by major forest type. C Map of tree species-richness across Amazonia. Legend truncated at mean ± 2 standard deviations of the mean. Amazonian Biome limit - red79. D Observed values of tree species-richness vs modelled values of tree species-richness on the 2046 plots used for mapping. The significance of Moran’s I was tested with the function Moran.I() of ape61. Marker colours: Red: Terra Firme Pebas Formation; Brown: Terra Firme Brazilian Shield; Orange: Terra Firme Guyana Shield; Yellow: White sand forest; Light blue: Varzea; Dark blue: Igapo; Purple: Swamp. Map created with custom R80 script. Base map source (country.shp, rivers.shp): ESRI (http://www.esri.com/data/basemaps, © Esri, DeLorme Publishing Company).

The standard error for tree species-richness was mostly low (Supplementary Fig. 6a) and rather constant across regions (Supplementary Fig. 6b) but higher for white sand forest and swamp forests (Supplementary Fig. 6c), resulting in higher standard errors in the white sand areas of the Upper Rio Negro and Guianas (Supplementary Fig. 6d). Residuals of the combined spatial model had a mean close to zero and did not differ much between forest type and region (Supplementary Fig. 7). Mapping the logarithm of species richness, to account for its slightly skewed distribution (Supplementary Fig. 8), did not produce different results. Species-richness in 500 individuals showed patterns identical to those of species-richness/ha (Supplementary Figs. 9–11).

Producing the species-richness map 2046 times, leaving each plot out once and estimating its value with the map it did not contribute to, provided a final test of our model. Across all plots, the R2 of the observed vs. the predicted tree species-richness dropped from 71 to 65%. This reduction is mainly caused by the plots on white sand (partial R2 dropped from 56 to 16%) and in swamp forest (from 96 to 16%; see the close alignment of the swamp plots to the regression line in Figs. 1D, 2D), where sample sizes are much smaller. Because it was not possible to produce a single map based on the leave-one-out principle, our final maps are based on a model in which all data are used (Figs. 1C, 2C).
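
The leave-one-out procedure can be sketched as follows. The interpolator used here, inverse-distance weighting, is only a stand-in chosen to keep the example self-contained and is not necessarily the spatial model used for the maps; the function names, the power parameter and the toy grid are likewise assumptions.

```python
import math


def idw_predict(train_xy, train_values, target, power=2.0):
    """Inverse-distance-weighted prediction at `target` (stand-in interpolator)."""
    num = den = 0.0
    for xy, value in zip(train_xy, train_values):
        d = math.dist(xy, target)
        if d == 0.0:
            return value  # target coincides with a training plot
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(xy, values):
    """Predict each plot from a model built without that plot."""
    return [
        idw_predict(xy[:i] + xy[i + 1:], values[:i] + values[i + 1:], xy[i])
        for i in range(len(values))
    ]


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Toy example: hypothetical plots on a 5 x 5 grid with a smooth richness gradient.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
richness = [x + y for x, y in grid]
loo_r2 = r_squared(richness, leave_one_out(grid, richness))
```

Because each plot is predicted without its own value, the leave-one-out R2 is expected to be lower than the R2 of a model fitted to all plots, and the drop is largest where few neighbouring plots of the same type are available.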

The effects of abiotic factors on tree diversity and richness

As most plots were established in terra-firme forest, this forest type was used as an example to study the effects of climate, soil, collecting and year of establishment on tree alpha-diversity and richness across Amazonia. Tree species diversity and richness per ha of terra-firme forest had a clear maximum in central Amazonia, where the interpolation model predicted over 250 species per ha (Fig. 3).

Fig. 3: Tree alpha-diversity and tree species-richness of terra-firme forest in Amazonia.

A Map of interpolated tree alpha-diversity (Fisher’s alpha), based on 1441 terra firme plots. B Map of tree species-richness (number of species/ha by plot), based on 1441 terra firme plots. Red polygon: Amazonian Biome limit79. Map created with custom R80 script. Base map source (country.shp, rivers.shp): ESRI (http://www.esri.com/data/basemaps, © Esri, DeLorme Publishing Company).

The spatial model of Fisher’s alpha explained 59.0% of the variation of plot level Fisher’s alpha, while the spatial model for species richness explained 65.5% of the variation of species-richness at plot level. Tree density (at plot level) affected species-richness (Supplementary Fig. 12b R2 = 0.16***). Regional tree density explained tree species-richness/ha even better (Supplementary Fig. 13, R2 = 0.26***). As sample size (N) affects tree species-richness, the regional richness pattern is better viewed with the average regional tree density pattern and for a sample of similar size (n = 500). However, regional tree density explained tree species-richness per 500 individuals less strongly (Supplementary Fig. 14, R2 = 0.16***). Plot tree density only explained 5% of the variation in tree species-richness in a sample of 500 individuals. Supplementary Table 1 shows the results of all models used.

Cumulative water deficit had a negative effect on tree species-richness (Supplementary Fig. 15), with a decrease of 17 species for each 100 mm deficit (r2 = 27%***) and a loss of maximum richness of 25 species per 100 mm deficit. Central Amazonia had more species per ha than expected based on cumulative water deficit alone (Supplementary Fig. 15d). In contrast, southern Amazonia and especially the Guyana Shield had a much lower richness than expected based on cumulative water deficit alone.

Annual rainfall (Bioclim 12) had a positive effect on tree species-richness in terra-firme forest (R2 = 22%***), both for the mean and the upper limit (Supplementary Fig. 16b). An increase of 1000 mm of annual precipitation resulted in an increase of 50 species per ha on average, and 80 species maximum. Residuals suggest that central Amazonia has 70–125 species per ha more than would be expected based on annual rainfall (Supplementary Fig. 16d). Northern Amazonia (defined here as the Guyana Shield) and southern Amazonia have less than 30 species per ha than expected based on annual rainfall.

Soil pH had a small effect on tree species-richness (Supplementary Fig. 17); sum of bases had no significant effect with quantile regression (Supplementary Fig. 18) and a very small, significant effect with least-squares regression (0.3%, Supplementary Table 1); whether a plot was situated on the Pebas formations or cratons (Guyana or Brazilian Shield areas, Supplementary Fig. 19) also explained little variation (5%).

Collecting intensity explained 13% of species-richness on the plots (Supplementary Fig. 20). Especially in the Manaus area, the residuals of this relationship were very high (Supplementary Fig. 20d).

Most plots (78%) were established after 2000 (Supplementary Fig. 21a, b). Establishment year had no effect on richness across the full dataset (Supplementary Fig. 21b), but plots established before 1980 were primarily found in the Guianas and eastern Amazonia (Supplementary Fig. 21c). Only after 1980 were plots more evenly spread across Amazonia. There was a very small (but significant) difference in plot tree species-richness per ha between plots established before and after 1980 (Supplementary Fig. 21d).

As cumulative water deficit, regional tree density and collecting intensity all had a significant effect on tree species-richness, we combined these factors in models. Alone, these variables explained the following proportions of variation (Supplementary Table 1): cumulative water deficit 27%, regional tree density (RD) 26% and collecting intensity (CI) 13%. Combined, they explained: cumulative water deficit + RD 38%; cumulative water deficit + CI 28%; RD + CI 29%; cumulative water deficit + RD + CI 38% (Supplementary Fig. 22). Thus, collecting intensity contributed little to a model with two or three variables (Supplementary Table 1). Similar results were obtained by combining cumulative water deficit, tree density and location in Pebas formation (Supplementary Fig. 23). Adding other soil variables to these models contributed nothing to the explained variation.

The combination of cumulative water deficit, regional tree density and temperature seasonality (Bioclim 04) explained 43% of the tree species-richness per ha (Fig. 4). In the predictions of this model, the ‘dry transverse belt’36 across Amazonia was clearly visible (Fig. 4B). Adding collecting intensity to this model did not improve its explained variation (Supplementary Fig. 24). The residuals of this model showed a clear spatial pattern with low to moderate positive residuals in most of Amazonia and one area (upper Rio Negro – Guianas) with high negative residuals (indicated by a red line in Fig. 4D).

Fig. 4: The effect of cumulative water deficit (mm), tree density, and temperature seasonality on tree species-richness.

A Tree species-richness observed. B Tree species-richness as predicted by cumulative water deficit, regional tree density, and temperature seasonality. C Model performance, showing predicted and observed tree species-richness. D Residuals of tree species-richness predicted by cumulative water deficit, regional tree density, and temperature seasonality (A, B). All figures based on 1441 terra firme plots. Amazonian Biome limit - red79. Map created with custom R80 script. Base map source (country.shp, rivers.shp): ESRI (http://www.esri.com/data/basemaps, © Esri, DeLorme Publishing Company).

Discussion

Major soil/forest type exerts a significant, strong influence on tree alpha-diversity and tree species-richness in Amazonia (Figs. 1, 2). Stratifying plots into four major soil and forest-type combinations and mapping each major type separately allowed us to map Amazonian tree alpha-diversity at a very high level of accuracy of 66% and tree species-richness with 71%. The map with 0.1-degree resolution is a step forward compared to previous maps with one-degree resolution and no stratification as to soil and forest. Other, global, mapping exercises also included Amazonia but included far fewer plot data, resulting in a more coarse-grained pattern, despite the higher map resolution37,38. The difference in species richness patterns can be explained in part by our much higher number of plots in Amazonia but also by the way the maps were constructed. Liang et al.38, for example, used unstratified, interpolated data for soil variables. Such interpolations are invariably dominated by the main forest type in a region (i.e. terra firme in Amazonia). Thus, for fertility the very fertile várzea soils are downgraded by the infertile terra firme forests in their vicinity, much as Fisher’s alpha in those areas was overestimated in our previous map (Supplementary Fig. 5). The map average for that map is very close to that of the main forest types (the three terra firme types, Supplementary Fig. 5c), which have a residual close to zero. All other forest types have negative residuals (Supplementary Fig. 5c), as their Fisher’s alpha is generally below that of the regional average. In our new maps, by a-priori assigning plots to their main forest type, the predictions are improved and white sand areas and riverine forests clearly emerge as areas with lower diversity and species richness, within richer terra firme (Figs. 1, 2). Overall, the model of Liang et al. explains a much lower percentage of variation in species-richness/ha of our plot data here (28%, Supplementary Fig. 25) and has significant residual variation for forest type (Supplementary Fig. 25b) and region (Supplementary Fig. 25d). In our model, residual variation for forest type and region is completely lacking (Supplementary Figs. 3b, c, 7b, c), indicating an overall better explanatory power of observed patterns. In addition, other recent maps based on species ranges39,40 cannot show the detail by soil type that we provide, as the modelling is mainly based on smooth, interpolated climatic data. While ref. 40 shows a broadly similar pattern, the map of ref. 39 shows little similarity with the maps provided here.

Forest type has a strong effect on diversity. Diversity is the inverse of species dominance on the plots, which was shown to be related to the total area these forest-soil combinations make up in Amazonia17. For the seven forest-soil combinations used here, a similar relationship holds (Supplementary Fig. 26). A larger area has a larger species pool and thus higher local richness and diversity21,41. We believe that the effect of area is one of the main drivers of the differences in diversity and richness in these forest systems17,26,42, as also mentioned in the introduction.

Both tree density at plot level and its interpolated version (regional tree density) affected diversity and species-richness, with regional density having a stronger effect. This suggests that density is more than just sampling bias but rather a regional signal where more individuals in a region add up to more species21, a richer regional species pool and thus higher local richness as well41. Both larger area and higher tree density lead to more regional individuals, and hence to a larger regional species pool21 and thus higher potential local richness41. We believe that this is a strong driver of tree alpha-diversity and species-richness of tree plots across Amazonia.

The overall prediction of the maps is very good, with low standard errors. For white sand forest and swamp forests, however, the standard error is much higher, leading to less accurate predictions, as was also shown by the leave-one-out test. This appears to be mainly caused by the lower number of plots, as the standard deviation of the mean diversity of terra-firme plots is not much smaller than that of the other forest types (Figs. 1, 2). Thus, the prediction of swamp forest diversity and white sand forest diversity would improve with more plots in these forest types. This is, however, partly caused by how the standard error is calculated (the standard deviation divided by the square root of the number of items), which makes it strongly dependent on the number of plots per forest type. The standard error in Supplementary Fig. 26 is therefore closely related to the number of plots per forest type.
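
The dependence of the standard error on the number of plots can be illustrated with a small sketch. The richness values below are made up for illustration and the function name standard_error is hypothetical; only the formula (standard deviation divided by the square root of the number of items) comes from the text.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Made-up richness values with the same spread but different plot numbers.
few_plots = [50.0, 60.0, 70.0, 80.0]   # e.g. a sparsely sampled forest type
many_plots = few_plots * 4             # the same values, four times as many plots

se_few = standard_error(few_plots)
se_many = standard_error(many_plots)   # smaller, driven by n rather than spread
```

With a similar spread of values, a forest type with few plots thus receives a larger standard error simply because n is small.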

Annual rainfall and cumulative water deficit affected tree species-richness of terra-firme forest, with cumulative water deficit having a slightly stronger effect. The relationship is not strictly linear but more a quantile relationship, where the upper limit of species-richness is determined by the length and strength of the dry season, together driving the cumulative water deficit. This effect was also apparent in a set of 69 0.1-ha plots6 and in our earlier data25. The effect of species filtering by reduced rain and increased drought was also shown convincingly by studies in western Amazonia4.

Collecting intensity also had a significant effect on tree species-richness. Still, it added no explained variation to a model with just cumulative water deficit, regional tree density and temperature seasonality. Two areas stand out in collecting intensity: the area surrounding Manaus (Brazil) and French Guiana. These areas also tend to have plots with high species-richness. However, the expected effect does not add to that of the other variables. It should be noted that even in the final model (Fig. 4) the highest positive residuals are found around Manaus, central Amazonia. Year of plot establishment had little effect on tree species-richness. We expected richness to increase steadily through time, as more species would have been described31. However, as most plots established prior to 1980 were located in the poorer eastern Amazonian regions, and the effect was very small, we suggest that the morpho-species on most of the plots are a sufficient proxy for the actual species richness.

The estimated soil parameters, pH and sum of bases, did not have a particularly strong effect in explaining tree species-richness of terra-firme (see also ref. 43). This contrasts with earlier findings of a strong relationship between regional soil fertility and composition5 but supports the results of smaller plots, where actual measured nutrients contributed little to species-richness6. The lack of a strong relationship between soil fertility and diversity is best visualized by comparing two areas with high diversity (western and central Amazonia) with very contrasting overall fertility44,45. We therefore conclude that at large scale water availability is a more important driver of tree species-richness than is soil fertility (see also ref. 4).

The Guyana Shield area, especially the northern part, from Suriname to the upper Rio Negro area, had relatively low species-richness and is the main area with negative residuals for the predictions of the final model (Fig. 4 and all other models, Figs. S17–S25, S27–S29), indicating that this area has lower species-richness than expected by cumulative water deficit, regional tree density and temperature seasonality. Although our regional soil predictors had little predictive power, this is the main area in Amazonia with predominantly sandy, nutrient-poor, soils. Also much of the terra-firme forest here occurs on sandy soils with low clay content that gives them a reddish tinge (called Iwité in the Upper Rio Negro area [W. Magnusson pers comm.]) but having a very different composition than forests on adjacent white sand soils7,46. We suggest that the model results would improve with better regional soil maps, especially for this area. Another plausible hypothesis is that the Guianas have been separated geographically from the rest of Amazonia and experienced lower regional species input from the rest of Amazonia during glacial periods, when Amazonia may have consisted of two large forest blocks47,48,49 and are still somewhat separated by a “dry transverse belt”, the Acarai Mountains (located at the border of the Guianas and Brazil) and the Guyana highlands.

Southern Amazonia and especially the Bolivian forests had modestly negative residuals in all models that did not include temperature seasonality. The Amazonian forests are thought to have been expanding southward into the Cerrado area only relatively recently (see e.g. ref. 50). Some of these forests may be only 3000–7000 years old and are still dominated by fast-growing tree species, such as Moraceae and Urticaceae in the Bolivian forests50,51. Other forests in the transition towards the Cerrado in Brazil may also still be accumulating species-richness but may face a challenge from increasing drought caused by global warming. As pre-Columbian inhabitation, forest use and clearing have perhaps been most prominent at the southern–southwestern border of Amazonia, this may also have affected diversity negatively. Indeed, in some Bolivian forests, species domesticated by pre-Columbian people may make up over 60% of all trees52. We cannot determine whether the historical changes, the gradient in temperature seasonality, or a combination of the two are responsible for the lower diversity in that area.

Finally, central Amazonia has a modestly higher richness than all models predicted and is undoubtedly one of the centres of tree species-richness in Amazonia (Figs. 3, 4). It has been proposed that central Amazonia is an area where several biogeographic regions overlap, leading to high richness53. Alternatively, a fully random mid-domain effect54 of overlapping distribution ranges has been suggested for this pattern25. However, since most species in Amazonia are rare55,56 and likely have small ranges57,58, a random distribution of ranges would lead to a rather flat curve with only an effect at the edges, ruling out a mid-domain effect. The high species richness of central Amazonia is not picked up by other maps37,38, and this is likely due to the much higher number of plots in our data, leading to a more precise prediction.

Our maps show the spatial distribution of tree diversity and richness of the original forest cover of Amazonia and we have identified drivers for the main patterns. Whereas species-richness may be taken as input for conservation, the composition of Amazonian forests is not homogeneous4,5,59 and the differences in forest composition have to be taken into account for comprehensive conservation. Forest loss in Amazonia has been increasing since 2014 (http://terrabrasilis.dpi.inpe.br/app/dashboard/deforestation/biomes/amazon/increments). Most deforestation has taken place in the states of Pará, Mato Grosso and Rondônia. These are not the areas on our map with the highest species-richness. However, they may have species that do not occur elsewhere in Amazonia and even with continuous, moderate forest loss, several species may become critically endangered40,58, and this area may have only very fragmented forest left, which will be vulnerable to drought, fire, hunting and other human impacts40,58.

In conclusion, seven main forest-soil combinations have a strong effect on tree species-diversity and richness, arguably driven by differences in size and fragmentation of their area and species trade-offs due to very different ecologies. Using location as the only predictor and stratified to account for the four major soil-forest combinations, our spatial model provides the most accurate map of tree diversity in Amazonia to date, explaining close to 70% of variation in tree alpha-diversity and over 70% of the variation in tree species-richness of Amazonian forest plots. Alternatively, a model not using location but with cumulative water deficit, tree density and temperature seasonality explains 43% of the tree species-richness in the terra-firme forest in Amazonia. Over large areas across Amazonia, residuals of this relationship are small and poorly spatially structured, suggesting that much of the residual variation may be local. The poor predictions of the final model in southern Amazonia (notably Bolivian Amazonia) and the northern Guyana Shield may have biogeographic and anthropogenic causes, as in the expanding forests of southern Amazonia and long-time separation in the Guyana Shield area, leading to a complex history and ecology of Amazonian tree species-richness.

Methods

Tree data

Tree-inventory data of undisturbed/old growth forest were taken from the April 2023 version of the Amazon Tree Diversity Network (ATDN) inventory database55,56. ATDN stores single inventories for each plot for trees with a diameter at breast height (dbh, 1.30 m) ≥10 cm. A tree is defined as a free-standing woody individual with dbh ≥10 cm60.

A total of 2220 plots were present in this database, with individuals identified at least at the morpho-species level. We omitted plots smaller than 0.5 ha (138 plots) and larger than 2 ha (36 plots), leaving 2046 plots for all calculations and mapping (Supplementary Fig. 27).

Modelling density, diversity, and richness patterns

For each plot, tree density was calculated as the number of stems per ha (Nha). Tree species alpha-diversity was expressed as Fisher’s alpha, a diversity measure theoretically insensitive to sample size1, by iteratively solving α = S/ln(1 + N/α), with N as the total number of individuals and S as the total number of morpho-species per plot. As not all plots represent one hectare, species-richness per ha (Sha) was estimated solving for Sha = α * ln(1 + Nha/α)1. Note that if a plot is exactly one ha, Sha is exactly equal to S. As both area and the number of individuals (i.e. sample size) have a known positive effect on richness21, calculating the number of species per ha circumvents the discrepancy in plot size but not differences in density of individuals. Plots with higher densities will still have, on average, higher richness. To account for the latter, we also calculated the number of species in (a random sample of) 500 individuals from each plot as S500 = α * ln(1 + 500/α)1.
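The three quantities above can be illustrated with a minimal Python sketch. It is not the authors' R script: the function names, the bisection solver and the plot values in the usage note are assumptions made for illustration only.

```python
import math

def fisher_alpha(S, N, tol=1e-12, max_iter=500):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection.

    S: number of (morpho-)species, N: number of individuals.
    A solution exists only for 0 < S < N.
    """
    if not (0 < S < N):
        raise ValueError("Fisher's alpha requires 0 < S < N")

    def f(a):
        # Increasing in a: tends to 0 as a -> 0 and to N as a -> infinity.
        return a * math.log(1.0 + N / a) - S

    lo, hi = 1e-9, 1.0
    while f(hi) < 0:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 0.5 * (lo + hi)

def expected_species(alpha, n):
    """Expected number of species in a sample of n individuals."""
    return alpha * math.log(1.0 + n / alpha)
```

For a hypothetical 0.5-ha plot with S = 120 and N = 300, tree density is Nha = 600, so `expected_species(fisher_alpha(120, 300), 600)` gives Sha and `expected_species(fisher_alpha(120, 300), 500)` gives S500.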

Finally, the spatial predictions of tree density (number of trees/ha), tree alpha-diversity and tree species-richness for the Amazon lowland forest were plotted on a map with a resolution of 0.1 degree (11 × 11 km, Supplementary Fig. 28a)58,60, based on the original forest extent of Amazonia, stratified into the major soils corresponding to the major forest-soil combinations used28,55 (Supplementary Fig. 28b).

For this, we constructed a simplified soil map based on the Soil and Terrain database for Latin America and the Caribbean27,28,55, to match this division. We aggregated all soil types into 1) poor white sand areas, using FAO soil types Podzols (PZ) and Albic arenosols (ARa); 2) floodplains (várzea, igapó), using Gleysols (Gl) and Fluvisols (Fl); 3) swamps, using all Histosols (Hs); and 4) the remainder, as soils supporting terra-firme forest. The ATDN plots were subdivided following this soil-flooding-based approach into four categories, which do justice to the major soil-forest combinations while ensuring sufficient plots for interpolation by category: 1) non-flooded terra-firme (1443 plots used), 2) floodplain forests (várzea [241] and igapó [222]), 3) very nutrient-poor white sand podzols (95), and 4) permanently inundated/waterlogged swamps (46) (Supplementary Fig. 28).

For our spatial interpolations we used loess regression, using only longitude, latitude and their interaction as independent variables and tree density, tree alpha-diversity and species-richness as the dependent variables. For all loess regressions we used a span of 0.2, a 2nd degree polynomial, and no extrapolation. Kriging was not possible as at several locations with multiple plots most variation was already locally present, so the semivariograms showed no range.
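The idea behind a local second-degree fit on longitude and latitude can be sketched as follows. This is a simplified illustration in Python with NumPy, not the loess implementation of R used for the maps: the function name, the tricube weighting over the nearest span × n plots and the bounding-box rule standing in for "no extrapolation" are all assumptions of this sketch.

```python
import numpy as np

def local_quadratic_predict(lon, lat, z, lon0, lat0, span=0.2):
    """Predict z at (lon0, lat0) from a tricube-weighted quadratic surface
    fitted to the nearest span * n observations.

    Returns nan when the query lies outside the bounding box of the data,
    a crude stand-in for 'no extrapolation'.
    """
    lon, lat, z = (np.asarray(a, dtype=float) for a in (lon, lat, z))
    if not (lon.min() <= lon0 <= lon.max() and lat.min() <= lat0 <= lat.max()):
        return float("nan")

    n = len(z)
    q = min(n, max(int(np.ceil(span * n)), 6))  # 6 coefficients need >= 6 points
    dx, dy = lon - lon0, lat - lat0             # centre coordinates on the query
    d = np.hypot(dx, dy)
    idx = np.argsort(d)[:q]                     # the q nearest observations
    h = d[idx].max()
    if h == 0:                                  # all neighbours coincide with the query
        return float(z[idx].mean())

    w = (1.0 - (d[idx] / h) ** 3) ** 3          # tricube weights in [0, 1]
    X = np.column_stack([
        np.ones(q), dx[idx], dy[idx],
        dx[idx] ** 2, dx[idx] * dy[idx], dy[idx] ** 2,
    ])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z[idx] * sw, rcond=None)
    return float(beta[0])                       # intercept = fitted value at the query
```

Because the coordinates are centred on the query point, the fitted intercept is the prediction itself. Evaluating the function for every grid-cell centre would give one interpolated surface per response variable.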

For each of the four categories (terra-firme, várzea plus igapó, podzols, swamps), we constructed a separate spatial interpolation model of tree density, tree alpha-diversity and tree species-richness across Amazonia.

For example, for tree alpha-diversity, we made a single spatial interpolation for all plots located on white-sand podzols. This interpolation was then used to predict the value of tree alpha-diversity for each grid cell on the soil map considered to be white-sand podzol (Supplementary Fig. 28b, yellow pixels for white-sand). The same was done for all seasonally flooded forest plots (várzea + igapó, combined to have sufficient plots), all swamp plots, and all plots established on terra-firme. Whereas the soil grid27 is based on the major hydrology/soil type, the soil type of the plots was determined independently of this map and based on field observations of those who established the plot. Consequently, it is possible that a plot classified by observers as white-sand podzol is located in a grid cell classified as terra-firme on the map. Regardless, it was used in the white-sand spatial model, as the field observations are considered to be correct. For a visual explanation of this method, see Supplementary Fig. 29. As we allowed no extrapolation, pixels too far from the plots were not given a value. As a 2nd degree polynomial may produce upward and downward exaggerations, values higher than the observed maximum in the data were set to the maximum value and those lower than the minimum to the minimum value.
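The two bookkeeping steps described above, truncating predictions to the observed range and filling each grid cell from the model of its own soil class, can be sketched in Python with NumPy. The function names, class labels and array layout are hypothetical and serve only to illustrate the logic.

```python
import numpy as np

def clamp_to_observed(pred, observed):
    """Set predictions above the observed maximum to that maximum and
    predictions below the observed minimum to that minimum.
    Cells without a value (nan) stay nan."""
    obs = np.asarray(observed, dtype=float)
    return np.clip(np.asarray(pred, dtype=float), obs.min(), obs.max())

def assemble_map(soil_class, predictions_by_class):
    """Combine per-class prediction surfaces into one map.

    soil_class: array with one class label per grid cell.
    predictions_by_class: dict mapping a label to an array with that
    class's model prediction for every grid cell.
    Cells whose class has no model remain nan.
    """
    soil_class = np.asarray(soil_class)
    out = np.full(soil_class.shape, np.nan)
    for label, pred in predictions_by_class.items():
        mask = soil_class == label
        out[mask] = np.asarray(pred, dtype=float)[mask]
    return out
```

In this sketch each class model would first be clamped against the plots of its own class, and only then merged with `assemble_map`.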

Testing the model fit

We calculated the percentage of variation explained by the combination of the spatial models for each variable (tree density, tree alpha-diversity and tree species-richness) by analysing the observed and predicted values together, using a simple linear regression. We tested the goodness of prediction by mapping the standard error of the loess regression, also examining it by region and forest type. We tested for autocorrelation in the residuals, using the function Moran.I() in the ape package61, to further assess the validity of the model predictions62, and mapped the residuals to assess potential residual spatial signal. A histogram was constructed of all values for each variable, as well as boxplots by region and forest type. A final test was performed by producing 2046 maps for species-richness/ha, where each plot was omitted in one run (a leave-one-out procedure). Each map was then used to predict the species-richness/ha for the plot that was left out; these predictions can be considered an unbiased estimate of the quality of the resulting map. We modelled the effect of climate and large-scale patterns of soil nutrient richness for terra-firme only, as this forest type had the highest number of plots.
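For readers unfamiliar with the autocorrelation statistic, the observed Moran's I of a set of residuals can be sketched in Python with NumPy as below. This is not the ape function itself. The inverse-distance weight matrix is an assumption of this sketch (the weights used for the analysis are not stated in this section), and the significance test that Moran.I() also reports is omitted.

```python
import numpy as np

def morans_i(values, coords):
    """Observed Moran's I with inverse-distance weights (an assumption),
    plus its expectation under no spatial autocorrelation.

    values: residual per plot; coords: (x, y) per plot.
    Pairs of plots at identical coordinates receive weight 0.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)

    # Pairwise Euclidean distances between plots.
    d = np.hypot(xy[:, None, 0] - xy[None, :, 0],
                 xy[:, None, 1] - xy[None, :, 1])
    w = np.zeros_like(d)
    nonzero = d > 0
    w[nonzero] = 1.0 / d[nonzero]

    dev = x - x.mean()
    observed = (n / w.sum()) * (dev @ w @ dev) / (dev @ dev)
    expected = -1.0 / (n - 1)
    return float(observed), expected
```

An observed value well above the expectation indicates that nearby residuals resemble each other, i.e. that spatial signal remains in the residuals.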

To assess the effect of the number of individuals on tree alpha-diversity and tree species-richness of terra-firme forest, we used the local tree density (i.e. the number of trees/ha for each plot) and the interpolated stem density (also expressed as trees/ha), which is a measure of the average density in an area surrounding the plots. We assume that large areas with higher density, having more individuals, have a larger species pool21, resulting in higher species-richness at the plot level. We use the term regional tree density for this interpolated density.

Climatic data were extracted by plot location from the grid data from Worldclim 263. The cumulative water deficit was calculated following Chave et al.64 and can be considered a measure of the strength of the dry season. Soil fertility (log[sum of bases]) was extracted from the latest Amazonia-wide soil-fertility map65. We used sum of bases rather than the often-used CEC (Cation Exchange Capacity), as the latter includes the full exchange complex, which on acid tropical soils often includes a large portion of Al3+ and H+. Soil acidity (pH) is also an often-used index of soil fertility (a low pH indicating low fertility). We extracted pH data from Soterlac27, ISRIC wise66, RAINFOR sites44, and refs. 67,68,69. For the sum of bases and pH, we created a loess interpolation model, based on all data available (data availability differed between sum of bases and pH), as described above (Supplementary Fig. 30). We then estimated the sum of bases and pH for each plot based on the loess interpolation.

Collecting intensity was based on the 530,025 unique herbarium collections of Amazonian trees from ref. 31, using the standard Kernel density function of R with Gaussian smoothing and adjustment of 0.270 (Supplementary Fig. 31A). The latter is comparable with the loess span = 0.2 used in our loess interpolation (Supplementary Fig. 1B). The year of establishment was known for most plots. If unknown, we used the year of publication minus one year.

We analysed the effect of abiotic variables on tree species-richness only, for two reasons: 1) species-richness is much easier to interpret than Fisher’s alpha, and 2) it has a very strong relationship with Fisher’s alpha. We also used quantile regression71,72, because 1) it is much less sensitive to outliers, quantile regression with tau = 0.5 being identical to least absolute deviation regression (i.e. the line dividing the data at 50% minimizes, as follows from the name, the absolute deviation from that line), and 2) it allows the flexibility of using other quantiles as well. We used tau = 0.9, which produces the line with 90% of the data below and 10% above it, minimizing the correspondingly weighted absolute deviation. This line can be seen as the maximum the dependent variable can achieve for a given value of the independent variable and has been used successfully before to demonstrate the effect of dry season length in Amazonian forest25.
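The loss function that quantile regression minimizes can be illustrated with a short Python sketch using NumPy. It covers only the intercept-only case and is not the quantreg routine used for the analysis. The function names are hypothetical.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Asymmetrically weighted absolute deviation.

    Positive residuals (observation above the line) are weighted by tau,
    negative residuals by (1 - tau). For tau = 0.5 this is half the sum
    of absolute deviations.
    """
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r)))

def best_constant(y, tau):
    """Intercept-only quantile 'regression': the observed value that,
    used as a horizontal line, minimises the pinball loss."""
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])
```

With tau = 0.5 the minimizing constant is the median; with tau = 0.9 it moves up to a value with roughly 90% of the data below it. A full quantile regression applies the same loss to a line with a slope instead of a constant.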

All analyses were carried out in the R programming environment70, mostly with custom-made scripts, using the libraries ape61, jpeg73, raster74, rgdal75, quantreg76, and vegan77.

Statistics and reproducibility

All tests were carried out with all plots (n = 2046) or all terra-firme plots (1441). All tests and data are available in the online supplementary material (see below) and can thus be reproduced.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.