## Introduction

One of the major puzzles facing the language sciences is how to best account for the factors and processes shaping the observed patterns of linguistic diversity today. The color lexicon is particularly interesting from this perspective. The long-standing emphasis has been on establishing universals1. Recent work shows that universal tendencies in color naming systems arise from perceptual structure2 and communicative needs3,4,5. At the same time, it is just as important to be able to characterize and explain variation found in color terminologies6. Languages differ in the number of basic color terms and which distinctions exactly are recognized for naming. Some languages only lexicalize ‘black’, ‘white’, and ‘red’; while others make many more distinctions, including ‘green’ and ‘blue’.

Blue is a particularly interesting color term to study since there are competing explanations for why it would emerge as a word. According to one class of theories, color terms emerge from salient features of the environment: for blue this would include water bodies, such as the sea, lakes, or rivers, as well as visibility of a blue sky which is affected by climate and humidity. That is, the ecological environment (such as the presence of lakes) may be a factor relevant in shaping the color lexicon7,8,9,10, suggesting that languages spoken in different environments may categorize the color continuum in different ways. An alternative account appeals to cultural practices, such as the presence of dyeing technologies1,11,12 and industrialization3. On this account, the emergence of blue comes about in those societies where blue dyes are used to color artifacts. More generally, it has been predicted that complex color terminologies (including the presence of blue) emerge in more complex societies12,13,14.

A third explanation for the emergence of blue is grounded in physiology15,16. It has been suggested that increased exposure to ultraviolet light, specifically to UV-B, affects the physiology of the lens of the eye17,18,19,20,21,22, increasing, in the long term, lens opacity23,24 and yellow pigmentation density inside the lens25, which in turn affects how light reaches the retina (Fig. 1 Panel D). This process of lens brunescence, where yellow pigments absorb short-wavelength (blue) light before it reaches the retina, may reduce the ability to perceive the blue part of the color spectrum26,27,28,29 (Fig. 1 Panel E), increasing the probability that a single ‘grue’ term combining ‘blue’ and ‘green’ is used instead; that is, there is no distinct lexicalization of ‘blue’16,30. Lens brunescence is driven by environmental exposure to UV-B radiation, which is related to latitude (locations closer to the equator receive more sunlight than those closer to the poles), altitude (higher elevations have less ozone filtering), climate and humidity (cloud coverage), vegetation (closed versus open environments), and culture (amount and type of outdoor activity). This predicts that in populations with high exposure to UV-B radiation, the persistent loss of sensitivity to blue in adults across generations should generate negative pressure against the development of a ‘blue’ category or a tendency to lose it, while in populations with low exposure to UV-B radiation, other processes may proceed unhindered, enabling the emergence and retention of ‘blue’. If true, we should observe a statistical tendency for fewer languages with a dedicated word for blue at high UV-B incidence (particularly close to the equator).

Previous cross-cultural analyses linking UV-B incidence and the presence of ‘blue’16 may have been biased by the lack of control for various confounds (especially language contact and inheritance from a shared proto-language31), exclusion of possible alternative explanatory factors (such as cultural complexity, environmental influences), lack of modeling of causal pathways, and the statistical methods used. Therefore, we combine multiple explanatory factors and potential confounds, including language contact generally (since specific data about the borrowing of words for blue is not available for most languages under consideration) and language family, subsistence strategy and population size (as proxies for cultural complexity32,33 as systematic, quantitative data about cultural artefacts and dyeing techniques across cultures does not exist), latitude (which partly determines UV-B incidence), climate, ecology and distance to large bodies of water (as proxies for various environmental influences) using a comprehensive database and advanced statistical methods. In doing so, we not only test the role of UV-B light on the development of the color lexicon, but also explicate the extent to which additional hypothesized factors shape the presence of a specific term for blue, and where relevant through which causal pathways specifically.

Environmentally-induced (or acquired) abnormal color perception is not the only type of color deficiency. Notably, inherited red-green color-blindness has a relatively simple genetic basis where certain alleles at the opsin genes on the X chromosome encode for abnormal pigments, resulting in a higher incidence of abnormal color perception in males specifically affecting the low (red) and mid (green) frequencies of the visual spectrum (Fig. 1 Panel E). It has previously been suggested that a subsistence strategy based on hunting and gathering might generate selective pressures against red-green abnormal color perception due to the high survival relevance of these colors in the wild, while the transition to agriculture and, in particular the industrial revolution, may have greatly relaxed these pressures34,35.

Moreover, these two broad types of abnormal color perception could interact with each other. At the individual level, for somebody already affected by a relatively profound red-green deficiency, also acquiring lens brunescence would have devastating effects on the ability to use color (Fig. 1 Panel E), and may affect survival and possibly the ability to produce and raise offspring. This predicts that, despite their different and a priori independent causal mechanisms (environmental exposure vs genetic), a negative correlation between the two could emerge across evolutionary time, in that there may be stronger selective pressure against the alleles responsible for red-green abnormal color perception in populations with high UV-B exposure, leading across generations to an overall lower frequency of such inherited abnormalities in these populations9. Here, we not only test the hypothesis that there is a negative relationship between UV-B incidence and the population frequency of red-green color deficiency using advanced methods and controls, we also include the potential influence of subsistence strategy.

In summary, we test two main hypotheses, one linking the existence of a dedicated word for blue to UV-B incidence while simultaneously controlling for other factors, and a second concerning the selective pressures against abnormal red-green vision in different environments. We do so on a large dataset of 142 populations (Fig. 1 Panel A) speaking languages from 32 families across the world, for which we collected information about the existence of a dedicated word for blue, the incidence of ultraviolet light, and the frequency of red-green color deficiency, as well as data on elevation, population size, subsistence strategy, climate, ecology, distance to large bodies of water, and several additional variables concerning the populations and physical environments they inhabit (see Methods and Supplementary Materials). We used Bayesian hierarchical regression, mediation and path analysis, and various machine learning techniques, informed by previous research and causal modeling, to specifically test these hypotheses in a comprehensive manner. We found that both hypotheses were by and large supported, but importantly they only captured part of the overall picture (Fig. 1 Panels B and C). While variation in UV-B incidence (ultimately due to variation in latitude) was the most important predictor of ‘blue’, population size and distance to large bodies of standing water also played a role, with their effects mediated by climate and subsistence strategy. Likewise, there was a lower frequency of people with red-green color deficiency in populations closer to the equator (i.e., with more lens brunescence), and whose languages lacked a dedicated word for blue.

## Results

### UV-B incidence predicts the existence of a specific word for blue

Here we tested the hypothesis that the incidence of UV (in particular, UV-B) predicts the existence of a specific word for blue, taking into consideration various potential confounds and causal pathways.

Both UV-A and UV-B affected blue negatively (here, blue is the dichotomous variable ‘is there a specific term for blue in the language?’, italics indicate variable names; for more information about the variables, see Methods): UV-A: $$\beta = -1.03$$, 95%HDI = $$[-1.65, -0.38]$$, $$p(\beta =0) = 0.0154$$, $$p(\beta <0) = 1$$; UV-B: $$\beta = -1.12$$, 95%HDI = $$[-1.75, -0.55]$$, $$p(\beta =0) = 0.0128$$, $$p(\beta <0) = 1$$. Please note that for such Bayesian regression results we report the slope, $$\beta$$, of the fixed effect(s) of interest in terms of its mean and 95% HDI (Highest Density Interval), as well as specific tests that capture the posterior probability that the hypothesis is true (i.e., they should be interpreted directly, and not as frequentist p-values in terms of the probability of seeing such a result were the null hypothesis true); also, while directional hypotheses should be a priori motivated, point hypotheses do not have the same requirement. However, as UV-A and UV-B were highly multicollinear ($$VIF_{mean UVA} = 36.3$$, $$VIF_{mean UVB} = 36.3$$) and UV-A fitted the data worse than UV-B ($$BF = 0.31$$, $$LOO = -1.29\pm 0.77$$, $$WAIC = -1.13\pm 0.77$$, $$KFOLD = -1.20\pm 1.63$$), we retained only UV-B in the following analyses. For such comparisons between two models, $$m_{1}$$ and $$m_{2}$$, we report the Bayes Factor, BF (ideally < 0.033 or > 10), as well as the LOO, WAIC and KFOLD as the difference in the Expected Log pointwise Predictive Density, ELPD, between the two models and its standard error, for which the absolute difference should be larger than the standard error; ideally, all these criteria should agree, but they capture a priori different aspects of model comparison36.

The variable blue was positively affected by latitude ($$\beta = 4.39 [0.82, 8.01]$$, $$p(\beta =0) = 0.068$$, $$p(\beta >0) = 0.99$$), population size ($$\beta = 0.31 [0.16, 0.48]$$, $$p(\beta =0) = 3.8 \times 10^{-7}$$) and climate PC1 ($$\beta = 0.99 [0.40, 1.57]$$, $$p(\beta =0) = 0.035$$), and negatively by distance to lakes ($$\beta = -0.56 [-0.87, -0.27]$$, $$p(\beta =0) = 0.0074$$). Furthermore, 4 of the 10 dimensions resulting from the multidimensional scaling (MDS) of the matrix of genetic distances ($$F_{ST}$$37) between all pairs of populations also affected blue (positively for genetic D7, and negatively for genetic D1, genetic D4 and genetic D6; see Supplementary Materials).
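
The model-comparison rules of thumb stated above for UV-A versus UV-B can be checked mechanically. The following is a minimal Python sketch, not the authors' code (the analyses themselves were run in R): the names `ElpdDiff`, `bf_is_decisive` and `elpd_is_clear` are hypothetical, and the thresholds are simply those quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class ElpdDiff:
    """ELPD difference between two models and its standard error."""
    name: str
    diff: float
    se: float


def bf_is_decisive(bf: float, lower: float = 0.033, upper: float = 10.0) -> bool:
    # Thresholds quoted in the text: BF ideally < 0.033 or > 10.
    return bf < lower or bf > upper


def elpd_is_clear(d: ElpdDiff) -> bool:
    # Rule quoted in the text: |difference| should exceed its standard error.
    return abs(d.diff) > d.se


# Values reported for the UV-A versus UV-B comparison.
bf = 0.31
criteria = [
    ElpdDiff("LOO", -1.29, 0.77),
    ElpdDiff("WAIC", -1.13, 0.77),
    ElpdDiff("KFOLD", -1.20, 1.63),
]

summary = {d.name: elpd_is_clear(d) for d in criteria}
summary["BF"] = bf_is_decisive(bf)
print(summary)
```

With the reported values, the LOO and WAIC differences exceed their standard errors, while the KFOLD difference and the Bayes Factor do not reach the stated ideal thresholds, which illustrates why the criteria are reported side by side.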

The mediation analysis (Fig. 2 Panel B) shows that latitude has a positive total effect (TE = 5.7, 95%HDI [2.3, 9.6]) on the presence of a distinct word for blue. There was a negative direct effect (DE = $$-7.4 [-17.9, 2.4]$$), as well as a positive indirect effect mediated by UV-B (IE = 13.0[3.6, 23.1]; $$\beta _{latitude \rightarrow UVB} = -5.8 [-6.2, -5.3]$$, $$\beta _{UVB \rightarrow blue | latitude} = -2.3 [-4.0, -0.6]$$). We explored other mediation models (see Supplementary Materials) which, taken together, suggest that distance to lakes also mediates between latitude and blue, and that subsistence is linked to blue through population size. We also constructed a complex (but largely theoretically motivated and conservative) path model (Fig. 2 Panel A) linking latitude and blue through UV-B and controlling for potential confounds. This model fitted the data very well ($$\chi ^2(1) = 0.4$$, $$p = 0.53$$; $$CFI = 1.0$$, $$TLI = 1.05$$, $$NNFI = 1.05$$, $$RMSEA = 0.0$$) and, importantly, showed that the effect of latitude on blue is fully mediated not only by UV-B ($$\beta _{latitude \rightarrow UVB} = -0.96$$, $$p < 0.001$$, and $$\beta _{UVB \rightarrow blue} = -0.65$$, $$p = 0.032$$), but also by subsistence, population size and distance to lakes.
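
The relationship between the reported mediation quantities can be illustrated with the common product-of-coefficients decomposition. This is a sketch under assumptions: the function names are hypothetical, and the paper's estimates are posterior summaries from Bayesian mixed-effects models, so plugging in the reported point estimates reproduces the reported IE and TE only approximately.

```python
def indirect_effect(a: float, b: float) -> float:
    """Product-of-coefficients indirect effect: (X -> M) * (M -> Y | X)."""
    return a * b


def total_effect(direct: float, indirect: float) -> float:
    """Total effect as the sum of the direct and indirect effects."""
    return direct + indirect


# Point estimates reported in the text (latitude -> UV-B -> blue).
a = -5.8    # latitude -> UV-B
b = -2.3    # UV-B -> blue | latitude
de = -7.4   # direct effect of latitude on blue
ie_reported = 13.0

ie = indirect_effect(a, b)
te = total_effect(de, ie_reported)
print(round(ie, 2), round(te, 2))  # 13.34 5.6
```

The two negative paths multiply into a positive indirect effect (about 13.3, against the reported 13.0), and adding the negative direct effect gives a total effect of about 5.6, against the reported 5.7.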

The presence of a distinct term for blue was predicted by information in the full dataset using Bayesian mixed-effects logistic regression with family and macroarea as random effects, the conditional random trees, the random and conditional random forests, and the Support Vector Machines, with very little difference between them (accuracy from 76% for random forests to 91% for the Bayesian mixed-effects model). Moreover, all techniques generalized well and had comparable performance (as expected, lower on the testing sets than on the full database), the best being the conditional random forests and conditional trees. Across these, the most important predictors tended to be (not in any particular order): UV-B, population size, climate/humidity, latitude, and distance to lakes (see Fig. 3 Panel A).

### Population frequency of abnormal red-green vision

In a second set of analyses, we tested whether the population frequency of abnormal red-green color perception (here, we designate this variable as daltonism) was linked to the incidence of UV light and the presence of a dedicated word for blue (as a proxy for its physiological effects on blue-green color perception; see Fig. 4 Panels A and B). Bayesian mixed-effects regressions showed that daltonism was positively influenced by blue ($$\beta = 0.61$$ [0.33, 0.88], $$p(\beta =0) = 0.0004$$, $$p(\beta >0) = 1$$) and latitude ($$\beta = 1.69$$ [0.78, 2.64], $$p(\beta =0) = 0.0076$$, $$p(\beta <0) = 0$$), and negatively influenced by UV-B ($$\beta = -0.3$$ [−0.45, −0.15], $$p(\beta =0) = 0.083$$, $$p(\beta <0) = 1$$); it was also affected by 4 genetic distance dimensions (see Supplementary Materials).

Taken together, the mediation analyses suggested that the effect of latitude on daltonism was mediated mostly by UV-B and blue, and that the mediation UV-B $$\rightarrow$$ blue $$\rightarrow$$ daltonism was especially important (TE = −0.74 [−1.21, −0.35], DE = −0.21 [−0.35, −0.06], IE = −0.53 [−1.04, −0.16]; $$\beta _{UVB \rightarrow blue} = -1.14$$ [−1.78, −0.55], $$\beta _{blue \rightarrow daltonism | UVB} = 0.49$$ [0.20, 0.77]). A path model (Fig. 4 Panel C) fitted the data well ($$\chi ^{2}(2) = 4.0$$, p = 0.137; CFI = 0.951, TLI = 0.927, NNFI = 0.927, RMSEA = 0.08) and showed that UV-B has both a direct effect on daltonism ($$\beta = -0.31$$, p < 0.001) and one mediated by blue ($$\beta _{UVB \rightarrow blue} = -0.55$$, p < 0.001; $$\beta _{blue \rightarrow daltonism} = 0.4$$, p < 0.001). Adding subsistence to the model improved the fit further ($$\chi ^{2}(2) = 1.6$$, p = 0.45; CFI = 1.000, TLI = 1.011, NNFI = 1.011, RMSEA = 0.0). It added a significant influence of subsistence ($$\beta _{subsistence \rightarrow daltonism} = 0.38$$, p < 0.001), but removed the influence of blue on daltonism ($$\beta _{blue \rightarrow daltonism} = 0.11$$, p = 0.32).

The different analysis methods all predicted daltonism on the full dataset, but this performance did not generalize well. Moreover, the importance of the predictors varied across methods. Nevertheless, UV-B tended to be among the most important predictors, along with blue and some genetic distance measures (see Fig. 3 Panel B).

## Discussion

Taken together, these analyses demonstrate that no single explanatory factor explains the color lexicon. This finding resonates with the centuries of debate on this topic with myriad variables being proposed and opposed. Our results show that multiple environmental and cultural factors interact. Specifically, we found that a language is more likely to have a dedicated word for blue when it is spoken by a larger population, which resides at higher latitudes (where the incidence of UV-B radiation is lower), and near large bodies of standing water (in particular, lakes). While the number of speakers is an imperfect proxy for the unmeasured and hard-to-define variable, cultural complexity, which in turn is an indirect reflection of the use of complex dyeing techniques and colored artefacts, the environmental influences were more straightforward. For example, large lakes might not always be poster-card blue, but they certainly tend to be a salient feature and reflect the sky more often than not. Importantly, the tendency of languages spoken closer to the equator to lack a distinct term for blue is likely due to the high incidence of UV-B light, and strongly supports the proposal that acquired lens brunescence has a negative effect on the development or maintenance of a lexical distinction between ‘blue’ and ‘green’.

Overall, these results suggest that variability in color perception leads to differences in the perceptual representation of color space, which then causes differences in color lexicons. It also raises interesting questions about the mechanisms and time required for language change, given that we are considering acquired color deficiency related to aging. If language does adapt to the color vision of its speakers, then our data suggest that language adapts to the reduced capacities of older adults, despite also being spoken by younger adults and children, who are still able to perceive the distinction between green and blue. Intriguingly, other evidence suggests extreme variation in exposure to direct sunlight during very early development also affects color vision38. Nevertheless, our findings show a distinct developmental trajectory. Perhaps older adults change the fitness landscape to which language adapts through cultural evolution at the scale of several generations of language use and transmission39, a process that can be studied using agent-based computer models40,41. The fact that we found no influence of environmental conditions at the putative origins of language families suggests that these effects act on timescales shorter than the few thousand years usually needed for language families to differentiate. The present-day languages of the same language family descend from a single ancient proto-language and, due to migrations and language shifts on the scale of a few thousand years42, may end up spoken in very different locations and environments to those of their proto-language. If the color lexicon needs several thousand years to adapt to changes in UV-B incidence, then the location of these ancient proto-languages should still have a detectable effect, but the lack of such a signal in our data suggests that this change is much faster.

It is striking to note that a similar phenomenon, where language adapts to an altered fitness landscape due to acquired changes in some of the speaker population, has been proposed for several typologically striking properties of Australian Aboriginal languages and for the distribution of labiodental sounds. For example, the high frequency of chronic ear infections (chronic otitis media) among Aboriginal Australian children can produce partial hearing loss affecting the lower and higher parts of the auditory spectrum, resulting in a loss of fricatives, a centralized vowel system, and a long, thin consonant system in the languages of the Australian continent43. Similarly, consuming processed and softer foods characteristic of agriculture leads to ontogenetic changes in bite (involving the teeth and lower jaw) such that the overbite/overjet is retained into adulthood, favoring the use of labiodental sounds (such as ‘f’ and ‘v’) by languages of populations practicing agriculture44. Our study adds to this literature by showing that external pressures can shape semantic as well as phonological aspects of language.

Finally, our data also supported an interaction between acquired blue-green and inherited red-green color deficits: populations closer to the equator, more affected by lens brunescence, speaking languages without a dedicated word for blue have a lower frequency of people with red-green color deficiency. The data also suggested that populations of hunter-gatherers tend to have a lower incidence of abnormal red-green color perception. While this hypothesis requires further testing, it would support the idea that certain cultural practices and subsistence strategies might have higher demands on color perception, thereby generating selective pressures against inherited color perception deficits.

Some caution is required in interpretation, however, given limitations in the current study. First, our dataset contained only 14 groups classified as hunter-gatherers, which, even if representative of the current distribution of communities, is too small to allow strong conclusions. Second, more refined genetic data—especially including information about the opsin alleles involved in red-green color deficiencies—would be necessary to better test for selection. Third, there is a need for better and, ideally, more direct measures of the exposure to complex dyeing technology and colored artefacts. Finally, populations and languages are reduced to geographic dots in our analyses, but a better approach might instead be to use aggregated measures of UV-B incidence, climate, ecology and proximity to large bodies of water across the whole area they occupy.

Nevertheless, our results strongly support the view that the color vocabulary is shaped, at least in part, by environmental factors acting on individual speakers, generating biases that are amplified by the repeated use and transmission of language in communities of similarly affected individuals. This is akin to other cases of individual biases being amplified to shape cross-linguistic diversity45, biases that can be either rooted in genetics46,47,48 or emerging due to environmental or cultural factors acting during the lifetime of individuals44.

## Methods and materials

We collected data from 142 populations, extending and re-checking earlier databases9 considerably, especially concerning information about the color lexicon and the incidence of abnormal red-green color perception. Each population was uniquely identified by its Glottolog code49 of the primary language spoken and its geographic location. According to Glottolog, these languages belong to 32 language families (the most represented being Indo-European (41), Atlantic-Congo (19) and Afro-Asiatic (13)) distributed across 6 macroareas (Africa (31), Australia (2), Eurasia (79), Papunesia (9), North America (9) and South America (12)). We also obtained the geographic location of the putative origins of each family49,50. The geographic locations were cos-transformed: cos(longitude) ranges between − 1.0 (− 180) and 1.0 (180), and 1.0 − cos(latitude) ranges between − 1.0 (the South Pole), 0.0 (the equator) and 1.0 (the North Pole).

Based on their location, for each population we obtained the following data: the list of its geographic neighbors (derived from the Delaunay triangulation of our geographic locations taking into account large bodies of water51); its log(elevation+1) using the Mapzen data (through R’s elevatr package); the log(distances) to the nearest lake, ocean, river, or large body of water in general (using the OpenStreetMap data as per52); and data about specific humidity (using data from the NOAA, as the mean of the yearly medians and interquartile ranges), and climate and ecology (as in52, we conducted Principal Component Analysis on the 19 variables from WorldClim for the period 1960–1990, and we retained PC1, explaining 49.7% of the variance and reflecting low seasonality, wet and hot climates, PC2, 24.7%, high seasonality, hot and dry climates, and PC3, 8.6%, unclear interpretation). We calculated the incidence of ultraviolet light (UV) using the data from NASA Total Ozone Mapping Spectrometer (TOMS) for the year 1998 (in order to replicate the procedure in9): these contain daily measures of UV radiation received by the human body at four wavelengths (305 nm, 310 nm, 320 nm, and 380 nm) in J/m², taking into account the thickness of the ozone layer in the stratosphere, the amount of cloud cover, the elevation, and how high the sun is in the sky. Here, we use the whole-year means and standard deviations for UV-A (315 nm to 400 nm), UV-B (280 nm to 315 nm) and the full spectrum.
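
As an illustration of how daily measurements at the four wavelengths could be turned into whole-year means and standard deviations per band, here is a minimal Python sketch. It is not the authors' procedure: the band boundaries are those given in the text, but averaging the wavelengths within a band day by day is an assumption, the function names are hypothetical, and the three-day input is made-up toy data.

```python
from statistics import mean, stdev


def uv_band(wavelength_nm: float) -> str:
    """Assign a wavelength to a band using the boundaries given in the text."""
    if 280 <= wavelength_nm < 315:
        return "UV-B"
    if 315 <= wavelength_nm <= 400:
        return "UV-A"
    return "other"


def band_summary(daily_by_wavelength: dict) -> dict:
    """Return {band: (mean, sd)} over days.

    Assumption: wavelengths within a band are averaged day by day
    before the yearly mean and standard deviation are taken.
    """
    bands = {}
    for wavelength, series in daily_by_wavelength.items():
        bands.setdefault(uv_band(wavelength), []).append(series)
    summary = {}
    for band, series_list in bands.items():
        per_day = [mean(day_values) for day_values in zip(*series_list)]
        summary[band] = (mean(per_day), stdev(per_day))
    return summary


# Toy daily values (three days only) at the four measured wavelengths.
toy = {
    305: [1.0, 2.0, 3.0],
    310: [3.0, 4.0, 5.0],
    320: [10.0, 12.0, 14.0],
    380: [20.0, 22.0, 24.0],
}
result = band_summary(toy)
print(result)
```

Under the stated boundaries, the 305 nm and 310 nm measurements fall in UV-B, and the 320 nm and 380 nm measurements fall in UV-A.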

Moreover, we obtained each population’s log(size) (from52) and subsistence strategy, dichotomized into populations whose subsistence mode is primarily based on hunting, fishing, gathering and/or foraging (‘HG’) and populations with subsistence modes centered around food production (‘AGR’), using data from44,53,54,55. The information about color vocabulary specifically concerns the existence of a specific term for blue (a dichotomous variable, ‘no’/‘yes’), and we also collected data about the frequency of abnormal red/green color perception in males excluding, if specified, tritanopia, and those samples that do not distinguish between males and females, or with fewer than 50 individuals (see Fig. 1 Panel A and Supplementary Materials for details).

Finally, to control for past demography and selection (about which we do not have direct information here), we computed the overall pairwise genetic distances between populations, as measured by the fixation index $$F_{ST}$$. We used a set of genetic markers aimed at forensic applications (FROG-kb) from the ALFRED database56 with ultrametric missing value imputation57,58, followed by classic multi-dimensional scaling (MDS), retaining the first 10 dimensions (goodness-of-fit 10.5%).

We used the following short variable names: glottocode (Glottolog language code), family (language family), macroarea (the macroarea), latitude and longitude (the transformed geographical coordinates), elevation (transformed elevation), distance to ocean, distance to lakes, distance to rivers and distance to water (transformed distances to bodies of water), humidity median and humidity IQR (humidity), climate PC1, climate PC2, climate PC3 (the first 3 climate Principal Components, z-scored), mean UV-A, sd UV-A, mean UV-B, sd UV-B, mean UV, sd UV (UV incidence, z-scored), population size (transformed population size), subsistence (subsistence strategy), blue (is there a specific term for blue?), daltonism (frequency of abnormal red/green color perception), and genetic D1 ... genetic D10 (first 10 MDS dimensions of the genetic distances matrix); the suffix _family refers to the values at the putative family origins.

All analyses were done using R59. We used five main classes of methods in our analyses (see the Supplementary Materials). First, in order to understand the geographical patterning of our data, we tested it against complete spatial randomness60,61 using the $$\chi ^2$$ test based on quadrat counts, and the G, F and K functions, and we also estimated its spatial autocorrelation using Moran’s I (with either the inverse of the shortest geographical distance “as the crow flies”, or the nearest neighbor distance on the Delaunay triangulation). Second, we fitted Bayesian mixed-effects regression models (as implemented by R’s package brms62 using Stan63) with family and macroarea as random effects, and explanatory variables and potential confounds as predictors. For blue we performed logistic regression, and for daltonism we performed beta regression (after replacing the 0.0% values by 0.01%); we used model selection based on Bayes Factors (BF), leave-one-out cross-validation (LOO), the Widely Applicable Information Criterion (WAIC), and K-Fold Cross-Validation (KFOLD, with K = 10). Third, we conducted mediation analysis using a Bayesian mixed-effects model (brms with Stan) approach which allowed us to include random effects. Fourth, we used path analysis (as implemented by lavaan64) to test more complex models involving multiple potential predictors (but without controlling for the non-independence due to family and macroarea). Finally, we quantified the predictive capacity of a set of variables using Bayesian mixed-effects regression (brms with Stan), conditional inference trees (ctree65 in package partykit66), random forests (package randomForest67), conditional random forests (cforest in package partykit) and Support Vector Machines (rminer package68).
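
To make the spatial autocorrelation step more concrete, the following Python sketch computes Moran’s I with inverse great-circle (“as the crow flies”) distance weights. It is an illustration only and not the authors' R code: the function names are hypothetical, a spherical Earth with a 6371 km radius is assumed, and the inputs are assumed to contain no co-located points and no constant variable.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * asin(sqrt(h))


def morans_i(values, coords):
    """Moran's I with weights w_ij = 1 / distance(i, j) for i != j."""
    n = len(values)
    x_bar = sum(values) / n
    dev = [v - x_bar for v in values]
    weighted_cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / haversine_km(coords[i], coords[j])
            weight_sum += w
            weighted_cross += w * dev[i] * dev[j]
    return (n / weight_sum) * (weighted_cross / sum(d * d for d in dev))


# Four toy locations along the equator, one degree of longitude apart.
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
i_clustered = morans_i([1.0, 1.0, 5.0, 5.0], coords)    # similar values are neighbors
i_alternating = morans_i([1.0, 5.0, 1.0, 5.0], coords)  # dissimilar values are neighbors
expected_under_null = -1.0 / (len(coords) - 1)
print(i_clustered, expected_under_null, i_alternating)
```

Values above the null expectation of −1/(n − 1) indicate that nearby locations tend to have similar values, and values below it indicate the opposite; in the toy data the clustered layout lies above that expectation and the alternating layout below it.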