The ecological drivers of variation in global language diversity

Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we present the first global analysis of language diversity that compares the relative importance of two key ecological mechanisms – isolation and ecological risk – after correcting for spatial autocorrelation and phylogenetic non-independence. We find significant effects of climate on language diversity, consistent with the ecological risk hypothesis that areas of high year-round productivity lead to more languages by supporting human cultural groups with smaller distributions. Climate has a much stronger effect on language diversity than landscape features, such as altitudinal range and river density, which might contribute to isolation of cultural groups. The association between biodiversity and language diversity appears to be an incidental effect of their covariation with climate, rather than a causal link between the two.


Introduction

[...] correlations between language diversity and environmental productivity 20, mean growing season 16-1 [...] number of languages per grid cell at all three resolutions (Table 2), consistent with a reduction in [...] river density is associated with greater language diversity at low and medium resolutions, beyond its covariation with climatic variables and the other landscape variables (Table 3). This is consistent with previous proposals that rivers act to isolate populations into smaller language groups 13. However, we find little additional support for this hypothesis. While river density is associated with smaller minimum speaker population size at medium resolution (Table 3), there is no association between river density and average speaker population size (controlling for the effects of population density). These observations suggest that the association between river density and language diversity is more akin to the ecological risk hypothesis than to the isolation hypothesis, because rivers seem to allow the persistence of smaller speaker populations, but not to divide human populations into smaller speaker populations. In this sense, rivers seem to act more as an ecological resource than a barrier to interaction.

Similarly, while altitudinal range is associated with language diversity at high resolution with marginal significance, there is no evidence that this is caused by isolation, as altitudinal range does not result in reduction in speaker population size, even when controlling for population density (Table 3).
While landscape roughness is significantly associated with language diversity when altitudinal range is not included in the model (t=2.87; p=0.004), we find no significant association between landscape roughness and language diversity beyond its covariation with climatic variables and the other landscape variables under the three resolutions, and no statistically significant negative association between landscape roughness and speaker population size (Table 3).

In contrast to a previous study that described river density and landscape roughness as universal determinants of language diversity 13, we find little evidence that landscape variables have a strong or consistent influence on language diversity. Although we use similar data to Axelson & Manrubia (2014), there are a number of differences in our analytical approach. To compare our results to theirs, we reanalyze our data using their method, fitting continent-specific parameter values and not including altitudinal range. Without correcting for spatial and phylogenetic non-independence among grid cells, we get similar results to Axelson & Manrubia (2014), namely that river density and landscape roughness have significant associations with language diversity in most continents (Table S1). But when we correct the data for non-independence among grid cells, neither river density nor landscape roughness has a significant association with language diversity in any continent (Table S1). We therefore conclude that the previous result was driven primarily by spatial autocorrelation and phylogenetic non-independence, with the similarity in both landscape variables and language diversity between neighbouring grid cells generating spurious correlations.

In conclusion, we find little consistent support for an effect of isolation mechanisms on language diversity. While we find associations between language diversity and river density, altitudinal range and landscape roughness, these landscape factors have much less influence on language diversity than climatic factors, and there is little indication that this is caused by the division of human populations into smaller, isolated cultural groups. Instead, previous results suggesting river density and landscape roughness are universal determinants of language diversity 13 may have been driven by autocorrelation among grid cells.
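
The way similarity between neighbouring grid cells can generate spurious correlations may be illustrated with a toy simulation. The sketch below is not the analysis used in this study: it assumes a one-dimensional transect of cells with first-order autoregressive (AR(1)) dependence, a known autocorrelation parameter `rho`, and a large-sample 5% cutoff for the correlation coefficient. All function names and parameter values are hypothetical. Two variables are generated independently of each other, so any "significant" correlation is a false positive. Removing the known dependence before testing (a simple analogue of a generalized least squares correction) should bring the false positive rate back towards the nominal level.

```python
import math
import random

def ar1_series(n, rho, rng):
    """One-dimensional autocorrelated series: each cell resembles its neighbour."""
    # Start from the stationary distribution so no burn-in is needed.
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def looks_significant(a, b):
    """Naive test that treats every cell as independent (approximate 5% cutoff)."""
    return abs(pearson_r(a, b)) > 1.96 / math.sqrt(len(a))

def whiten(x, rho):
    """Remove the assumed-known AR(1) dependence between neighbouring cells."""
    return [x[i] - rho * x[i - 1] for i in range(1, len(x))]

def false_positive_rates(n_cells=200, rho=0.9, n_sims=500, seed=1):
    """Share of simulations in which two unrelated variables appear correlated."""
    rng = random.Random(seed)
    naive = corrected = 0
    for _ in range(n_sims):
        a = ar1_series(n_cells, rho, rng)
        b = ar1_series(n_cells, rho, rng)  # generated independently of a
        naive += looks_significant(a, b)
        corrected += looks_significant(whiten(a, rho), whiten(b, rho))
    return naive / n_sims, corrected / n_sims

naive_rate, corrected_rate = false_positive_rates()
print(f"naive: {naive_rate:.2f}, corrected: {corrected_rate:.2f}")
```

In this toy setting the naive test is expected to flag a large share of unrelated variable pairs as correlated, whereas the corrected test should do so at close to the nominal 5% rate. Real grid cells are arranged in two dimensions and share phylogenetic as well as spatial dependence, so the sketch shows only the direction of the bias.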

(iv) Is language diversity significantly associated with biodiversity?
We now ask if biodiversity provides any additional explanation for language variation beyond covariation with climate and landscape factors. Adding mammal or bird diversity as additional predictors to the climatic and landscape variables significantly improves model fit, but adding vascular plant or amphibian diversity does not provide additional explanatory power (Table 4).
Adding biome to the analysis increases model fit above climate variables at low resolution, suggesting that ecosystem structures may influence language diversity; however, it does not provide significant explanatory power above the effect of climate at medium and high resolutions (low: LR=27.01, p=0.02; medium: LR=14.91, p=0.38; high: LR=11.83, p=0.62).

Why are bird and mammal diversity associated with language diversity? There is no evidence that this is due to a direct causal relationship between biodiversity and language diversity, because there is no consistent relationship between these biodiversity measures and residual variation in language diversity, above and beyond that explained by climate and landscape (Table S2). Instead, the increase in model fit when bird and mammal diversity are added to the model of language diversity, climate and landscape seems to be driven primarily by regions that have both low language diversity and low species diversity, particularly the Sahara, the Arabian Peninsula, and the Tibetan Plateau (Figure 2 and S3), which present harsh environmental conditions for birds and mammals (including humans). These are not the only regions of low diversity but they seem to have a disproportionate influence on the relationship between mammal and bird diversity and language diversity (Figure S4). Running the high resolution analysis without these low diversity areas, we find that adding mammal or bird diversity as additional predictors to the climatic and landscape variables no longer increases model fit (n = 334, mammal: LR=1.92, p=0.17; bird: LR=3.67, p=0.07), but results for the climatic and landscape effects are similar to the complete dataset. Temperature seasonality is still the strongest predictor for language diversity in the climatic variables (t=-2.34, p=0.02) and so is altitudinal range in the landscape variables (t=2.27, p=0.02).
These results suggest that the low diversity areas have a significant effect on the association between biodiversity and language diversity, but they are not responsible for the broader association between language diversity and climatic and landscape effects.

In conclusion, we find that the association between language diversity and biodiversity appears to be largely a result of their covariation with common climatic and landscape factors, and any additional increase in model fit between language diversity and mammal and bird diversity is likely due to the disproportionate effect of a few regions of harsh environment that reduce both biodiversity and language diversity.
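
The likelihood ratio (LR) comparisons reported in this section can be illustrated with a minimal sketch. It assumes that the LR statistic is twice the difference in log-likelihood between the model with and without the added predictor, and that it is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The degrees of freedom are an assumption here rather than something stated in the text, and the log-likelihood values are hypothetical, chosen only so that LR = 1.92. The function names are likewise illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Closed forms are used for df == 1 and for even df; other values are not
    handled in this sketch.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i  # builds (x/2)^i / i! incrementally
            total += term
        return math.exp(-half) * total
    raise ValueError("only df == 1 or even df are supported in this sketch")

def likelihood_ratio_test(loglik_reduced, loglik_full, added_params=1):
    """Return the LR statistic and its chi-square p value."""
    lr = 2.0 * (loglik_full - loglik_reduced)
    return lr, chi2_sf(max(lr, 0.0), added_params)

# Hypothetical log-likelihoods for models without and with one extra predictor.
lr, p = likelihood_ratio_test(-500.00, -499.04)
print(f"LR = {lr:.2f}, p = {p:.3f}")
```

Under these assumptions, a single added predictor with LR = 1.92 gives a p value between 0.16 and 0.17, in line with the non-significant mammal diversity result reported for the reduced dataset.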

Alternatively, it may be that other factors contribute significantly to shaping language diversity that are not captured by climate variables (representing the ecological risk hypothesis) nor by landscape variables (representing isolation mechanisms). For example, regions of higher than expected language diversity may have had a longer period of in situ language diversification, or have undergone a higher rate of diversification, leading to a greater accumulation of languages in these regions than in other regions of similar climate. One way to investigate the influence of time or diversification rate on diversity is to use a phylogeny that contains information on the relative timing of diversification events in order to compare the timescale and rate of diversification in different regions 24,36. While phylogenies are available for the languages within some language families 37-42, there is currently no global dated phylogeny of languages, nor is there general agreement on the relationships or age of language families. Therefore we lack the means to make a quantitative comparison of duration or rates of diversification between the majority of grid cells (those that contain languages from different families or languages not contained in comprehensive phylogenies).

Nevertheless, we can make a qualitative comparison of the relative depth of divergence represented in each grid cell if we make the simple assumption that languages from the same language family diverged more recently than languages from different families. Number of language families per grid cell is a significant predictor of residuals in language diversity under the three resolutions (low: t=4.65, p<0.001; medium: t=6.27, p<0.001; high: t=8.83, p<0.001; Figure S5). However, we are hesitant to draw strong conclusions from this pattern.
For example, while New Guinea has more language families per grid cell than most other regions, the other areas of high unexplained language diversity do not have unusually high language family diversity, and some areas with many language families do not have high language richness (Figure 4; Figure S5). Clearly, this is not an ideal analysis of variation in time for diversification, as we cannot standardize time or rate of language evolution across families without a global dated phylogeny. But it suggests that time to diversification may be a profitable area of enquiry once complete language phylogenies become available.
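
The per-cell counts used in this qualitative comparison can be sketched as follows. The record layout (grid cell identifier, language, language family), the function name, and the example records are all hypothetical placeholders, not the data used in this study. The sketch only shows how language richness and language family richness might be tallied for each grid cell.

```python
from collections import defaultdict

def richness_per_cell(records):
    """Return {cell: (number of languages, number of language families)}."""
    languages = defaultdict(set)
    families = defaultdict(set)
    for cell, language, family in records:
        languages[cell].add(language)
        families[cell].add(family)
    return {cell: (len(languages[cell]), len(families[cell])) for cell in languages}

# Hypothetical records: (grid cell id, language, language family).
records = [
    ("cell_1", "lang_a", "family_x"),
    ("cell_1", "lang_b", "family_x"),
    ("cell_1", "lang_c", "family_y"),
    ("cell_2", "lang_d", "family_x"),
]
counts = richness_per_cell(records)
print(counts)
```

Using sets means a language whose range is recorded more than once in a cell is counted only once. The family count could then be related to the residuals of a language diversity model, as described above.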

Conclusion
The overall picture supported by our analyses is that environmentally-driven ecological processes are a major determinant of global variation in the diversity of human languages, as they are for global variation in biodiversity. Associations between global patterns of language diversity and climate are consistent with the ecological risk hypothesis, that stable productive climates allow human cultures to persist in smaller, more localized groups. Our results offer less support for isolation mechanisms as drivers of language diversity. While there are significant associations between language diversity and river density, altitudinal extent and landscape roughness, landscape factors have less explanatory power than climate. The association between biodiversity and language diversity is likely due to an incidental association between language and species richness driven by shared causal factors such as climate and landscape. The importance of influences such as time to accumulate diversity or the rate of language diversification is yet to be explored in detail.

Table 1. Climatic effects on language diversity, at high, medium, and low resolution (n is the number of grid cells used in the analysis at each resolution). We list the t value and the p value of each predictor in a generalized least squares regression that includes all six eco-climatic predictors. Two additional parameters are the intercept and the coefficient for land coverage. Because collinearity can inflate the standard error of regression coefficients, we also conduct likelihood ratio (LR) tests to assess if adding a predictor significantly increases model fit. If so, the predictor has a significant effect on language diversity beyond its covariation with other predictors. Significant results are in bold.

Table 4. Association between biodiversity and language diversity after accounting for their