Environment and culture shape both the colour lexicon and the genetics of colour perception

Many languages express ‘blue’ and ‘green’ under an umbrella term ‘grue’. To explain this variation, it has been suggested that changes in eye physiology due to UV-light incidence can lead to abnormalities in blue-green color perception, which in turn cause the color lexicon to adapt. Here, we apply advanced statistical methods to a set of 142 populations to model how different factors shape the presence of a specific term for blue. In addition, we examined whether the ontogenetic effect of UV light on color perception generates a negative selection pressure against inherited abnormal red-green perception. We found that the presence of a specific term for blue was influenced by UV incidence as well as several additional factors, including cultural complexity. Moreover, there was evidence that UV incidence was negatively related to abnormal red-green color perception. These results demonstrate that variation in languages can only be understood in the context of their cultural, biological, and physical environments.


Figure 1. (Panel A): Map of languages and populations in our data. Each circle represents one language and corresponding population, with the presence (blue) or absence (yellow) of a specific term for 'blue'. The circle radius is proportional to the population frequency of red/green abnormal color perception. The intensity of the background magenta is proportional to the incidence of UV-B radiation as given by the NASA Total Ozone Mapping Spectrometer (TOMS) for the year 1998. This map was generated with the package maps in R version 4.0.5 59, which uses public domain data from the Natural Earth project (https://www.naturalearthdata.com). (Panel B): Posterior density plots of the best predictors of the presence of a word for blue across Bayesian regression models. (Panel C): Posterior density plots of the best predictors of the frequency of red-green abnormal color perception across Bayesian regression models. The plots in panels (B) and (C) are based on the analyses detailed in the "Results" section and were generated using R 59. (Panel D): The lens brunescence hypothesis, as detailed in this paper. UV rays cause the lens to opacify and become yellower (top), filtering blue light, so that the perceived spectrum (on the right) contains less blue and more green, compared to the non-affected eye (bottom). (Panel E): A simulation of different types of color perception abnormalities. From left to right: normal vision; one type of red-green abnormal color perception (deuteranopia); strong lens brunescence (yellow filter and darkening of the image); both lens brunescence and deuteranopia, hypothesized to result in decreased biological fitness relative to other color vision types. Background image cropped from a photo by Joydeep Pal on Unsplash (https://unsplash.com/photos/i6kccimZz_8) under a free-to-use license (https://unsplash.com/license). For deuteranopia we used Coblis, the Color BLIndness Simulator (https://www.colorblindness.com/coblis-color-blindness-simulator/), and checked the results with the Colorblind Web Page Filter (https://www.toptal.com/designers/colorfilter). The lens brunescence filter is inspired by 30: we used Adobe Lightroom Classic version 10.0 with a yellow filter and darkening to simulate a Kodak yellow filter. For more details, see the "Results", "Materials and Methods" and Supplementary Materials.

www.nature.com/scientificreports/

proto-language 31), exclusion of possible alternative explanatory factors (such as cultural complexity, environmental influences), lack of modeling of causal pathways, and the statistical methods used. Therefore, we combine multiple explanatory factors and potential confounds, including language contact generally (since specific data about the borrowing of words for blue is not available for most languages under consideration) and language family, subsistence strategy and population size (as proxies for cultural complexity 32,33, as systematic, quantitative data about cultural artefacts and dyeing techniques across cultures do not exist), latitude (which partly determines UV-B incidence), climate, ecology and distance to large bodies of water (as proxies for various environmental influences), using a comprehensive database and advanced statistical methods. In doing so, we not only test the role of UV-B light in the development of the color lexicon, but also explicate the extent to which additional hypothesized factors shape the presence of a specific term for blue and, where relevant, through which causal pathways. Environmentally-induced (or acquired) abnormal color perception is not the only type of color deficiency.
Notably, inherited red-green color-blindness has a relatively simple genetic basis, where certain alleles at the opsin genes on the X chromosome encode abnormal pigments, resulting in a higher incidence of abnormal color perception in males, specifically affecting the low (red) and mid (green) frequencies of the visual spectrum (Fig. 1 Panel E). It has previously been suggested that a subsistence strategy based on hunting and gathering might generate selective pressures against red-green abnormal color perception due to the high survival relevance of these colors in the wild, while the transition to agriculture and, in particular, the industrial revolution may have greatly relaxed these pressures 34,35.
Moreover, these two broad types of abnormal color perception could interact with each other. At the individual level, for somebody already affected by a relatively profound red-green deficiency, also acquiring lens brunescence would have devastating effects on the ability to use color (Fig. 1 Panel E), and may affect survival and possibly the ability to produce and raise offspring. This predicts that, despite their different and a priori independent causal mechanisms (environmental exposure vs genetic), a negative correlation between the two could emerge across evolutionary time, in that there may be stronger selective pressure against the alleles responsible for red-green abnormal color perception in populations with high UV-B exposure, leading across generations to an overall lower frequency of such inherited abnormalities in these populations 9. Here, we not only test the hypothesis that there is a negative relationship between UV-B incidence and the population frequency of red-green color deficiency using advanced methods and controls, but also include the potential influence of subsistence strategy.
In summary, we test two main hypotheses, one linking the existence of a dedicated word for blue to UV-B incidence while simultaneously controlling for other factors, and a second concerning the selective pressures against abnormal red-green vision in different environments. We do so on a large dataset of 142 populations (Fig. 1 Panel A) speaking languages from 32 families across the world, for which we collected information about the existence of a dedicated word for blue, the incidence of ultraviolet light, and the frequency of red-green color deficiency, as well as data on elevation, population size, subsistence strategy, climate, ecology, distance to large bodies of water, and several additional variables concerning the populations and physical environments they inhabit (see Methods and Supplementary Materials). We used Bayesian hierarchical regression, mediation and path analysis, and various machine learning techniques, informed by previous research and causal modeling, to specifically test these hypotheses in a comprehensive manner. We found that both hypotheses were by and large supported, but importantly they only captured part of the overall picture (Fig. 1 Panels B and C). While variation in UV-B incidence (ultimately due to variation in latitude) was the most important predictor of 'blue', population size and distance to large bodies of standing water also played a role, with their effects mediated by climate and subsistence strategy. Likewise, there was a lower frequency of people with red-green color deficiency in populations closer to the equator (i.e., with more lens brunescence), and whose languages lacked a dedicated word for blue.

Results
UV-B incidence predicts the existence of a specific word for blue. Here we tested the hypothesis that the incidence of UV (in particular, UV-B) predicts the existence of a specific word for blue, taking into consideration various potential confounds and causal pathways.
Both UV-A and UV-B affected blue negatively (here, blue is the dichotomous variable 'is there a specific term for blue in the language?', italics indicate variable names; for more information about the variables, see , p(β = 0) = 0.0128, p(β < 0) = 1. Please note that for such Bayesian regression results we report the slope, β, of the fixed effect(s) of interest in terms of their mean and 95% HDI (Highest Density Interval), as well as specific tests that capture the posterior probability that the hypothesis is true (i.e., they should be interpreted directly, and not as frequentist p-values in terms of the probability of seeing such a result were the null hypothesis true); also, while directional hypotheses should be a priori motivated, point hypotheses do not have the same requirement. However, as UV-A and UV-B were highly multicollinear (VIF_meanUVA = 36.3, VIF_meanUVB = 36.3) and UV-A fitted the data worse than UV-B (BF = 0.31, LOO = −1.29 ± 0.77, WAIC = −1.13 ± 0.77, KFOLD = −1.20 ± 1.63; for such comparisons between two models, m1 and m2, we report the Bayes Factor, BF, ideally < 0.033 or > 10, as well as the LOO, WAIC and KFOLD as the difference in the Expected Log pointwise Predictive Density, ELPD, between the two models and its standard error, for which the absolute difference should be larger than the standard error; ideally, all these criteria should agree, but they capture a priori different aspects of model comparison 36 . We explored other mediation models (see Supplementary Materials) which, taken together, suggest that distance to lakes also mediates between latitude and blue, and that subsistence is linked to blue through population size. We also constructed a complex (but largely theoretically motivated and conservative) path model (Fig. 2 Panel A) linking latitude and blue through UV-B and controlling for potential confounds.
This model fitted the data very well (χ²(1) = 0.4, p = 0.53; CFI = 1.0, TLI = 1.05, NNFI = 1.05, RMSEA = 0.0) and, importantly, showed that the effect of latitude on blue is fully mediated not only by UV-B (β_latitude→UVB = −0.96, p < 0.001, and β_UVB→blue = −0.65, p = 0.032), but also by subsistence, population size and distance to lakes.
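The reporting conventions and coefficients quoted above can be checked with a few lines of arithmetic. The following Python sketch uses only numbers stated in the text; the function names are illustrative and are not taken from the authors' scripts. It rests on three assumptions that should be read as such: that the VIF comes from a model with exactly two predictors (so that VIF = 1/(1 − r²)), that the rules of thumb for BF and ELPD differences are applied exactly as worded, and that the two path coefficients are on a common standardized scale, so that the indirect effect through UV-B can be approximated by their product.

```python
import math


def implied_correlation(vif):
    """Absolute correlation implied by a VIF shared by two predictors.

    Assumption: exactly two predictors, for which VIF = 1 / (1 - r^2).
    """
    return math.sqrt(1.0 - 1.0 / vif)


def elpd_difference_is_clear(diff, se):
    """Rule of thumb from the text: |ELPD difference| should exceed its SE."""
    return abs(diff) > se


def bayes_factor_is_decisive(bf):
    """Rule of thumb from the text: BF ideally < 0.033 or > 10."""
    return bf < 0.033 or bf > 10


def indirect_effect(a, b):
    """Product-of-coefficients estimate (a: X -> M, b: M -> Y)."""
    return a * b


# Values quoted in the text for the UV-A versus UV-B comparison.
r = implied_correlation(36.3)
comparisons = {"LOO": (-1.29, 0.77), "WAIC": (-1.13, 0.77), "KFOLD": (-1.20, 1.63)}
clear = {name: elpd_difference_is_clear(d, se) for name, (d, se) in comparisons.items()}
decisive = bayes_factor_is_decisive(0.31)

# latitude -> UV-B -> blue, using the two path coefficients quoted in the text.
via_uvb = indirect_effect(-0.96, -0.65)

print(round(r, 3), clear, decisive, round(via_uvb, 3))
```

Under these assumptions, the LOO and WAIC differences exceed their standard errors, whereas the KFOLD difference and the Bayes Factor fall short of the stated ideal thresholds. The product of the two negative coefficients is positive, in line with higher latitudes being associated with a dedicated word for blue.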
The presence of a distinct term for blue was predicted by information in the full dataset using Bayesian mixed-effects logistic regression with family and macroarea as random effects, the conditional random trees, the random and conditional random forests, and the Support Vector Machines, with very little difference between them (accuracy from 76% for random forests to 91% for the Bayesian mixed-effects model). Moreover, all techniques generalized well and had comparable performance (as expected, lower on the testing sets than on the full database), the best being the conditional random forests and conditional trees. Across these, the most important predictors tended to be (not in any particular order): UV-B, population size, climate/humidity, latitude, and distance to lakes (see Fig. 3 Panel A).
Population frequency of abnormal red-green vision. In a second set of analyses, we tested whether the population frequency of abnormal red-green color perception (here, we designate this variable as daltonism) was linked to the incidence of UV light and the presence of a dedicated word for blue (as a proxy for its physiological effects on blue-green color perception; see .

1st row: the slope estimates of the best predictors (according to Bayes Factor, WAIC, LOO and K-Fold methods) from Bayesian mixed-effect models (BRMS). 2nd row: specificity-based predictor importance from SVMs. 3rd row: accuracy-based predictor importance from random forests (RF), measuring the amount by which the accuracy decreases when one variable is removed from the model; higher values represent more important predictors. 4th row: Gini-index-based predictor importance from random forests (RF); this measures by how much the Gini impurity decreases when a variable is chosen to split a node (note, only relative values matter, and there is a bias towards using numeric variables to split nodes). 5th row: unconditional predictor importance from conditional random forests (CF); this is similar to the accuracy-based importance from random forests. 6th row: the performance of the four methods (BRMS, SVM, RF and CF) in terms of accuracy (left; as this is a binary classification problem) and R² (right; as this is a regression problem). Variable names have been abbreviated for legibility: popSize is population size, humid_m is median humidity, dist2lak is the distance to the closest lake, lat is latitude, genD4 is the 4th dimension of the multidimensional scaling of the between-populations genetic distances, climPC1 is the 1st principal component resulting from the Principal Component Analysis (PCA) of the climate variables, macroar is the macroarea, long is the longitude, and dist2wat is the distance to the closest body of water (ocean/sea, lake or river). Plots generated using R 59.
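The Gini-based importance mentioned above can be made concrete for a single split. The Python sketch below is a toy illustration only: the labels are hypothetical, the function names are not from any package, and it computes the impurity decrease at one node rather than the forest-level score reported by the randomForest package, whose exact aggregation over nodes and trees is not reproduced here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum over classes of p_k^2, for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 'yes'/'no' for a dedicated term for blue in 8 languages.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly removes all impurity ...
perfect_split = gini_decrease(parent, ["yes"] * 4, ["no"] * 4)

# ... while a split that leaves both children as mixed as the parent removes none.
useless_split = gini_decrease(
    parent, ["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]
)

print(perfect_split, useless_split)
```

A predictor that repeatedly yields splits closer to the first case than to the second accumulates a larger importance, which is why only relative values are informative.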
The analysis methods converged in predicting daltonism on the full dataset, but this performance did not generalize well. Moreover, the importance of the predictors varied across methods. Nevertheless, UV-B tended to be among the most important predictors, along with blue and some genetic distance measures (see Fig. 3 Panel B).

The latent variable 'abn. blue perc.', representing the unobserved abnormal color perception of blue, is indirectly measured by two indicators ('daltonism' and 'blue'), while the latent 'abn. red-green perc.' (abnormal red/green color perception) is not of interest here. Therefore, we used 'blue' as a proxy for 'abn. blue perc.' and ignored 'abn. red-green perc.', leading to the path model shown here. Icons made by Freepik (https://www.freepik.com) from Flaticon (https://www.flaticon.com/).

Discussion
Taken together, these analyses demonstrate that no single explanatory factor explains the color lexicon. This finding resonates with the centuries of debate on this topic with myriad variables being proposed and opposed.
Our results show that multiple environmental and cultural factors interact. Specifically, we found that a language is more likely to have a dedicated word for blue when it is spoken by a larger population, which resides at higher latitudes (where the incidence of UV-B radiation is lower), and near large bodies of standing water (in particular, lakes). While number of speakers is an imperfect proxy for the unmeasured and hard-to-define variable of cultural complexity, which in turn is an indirect reflection of the use of complex dyeing techniques and colored artefacts, the environmental influences were more straightforward. For example, large lakes might not always be postcard blue, but they certainly tend to be a salient feature and reflect the sky more often than not. Importantly, the tendency of languages spoken closer to the equator to lack a distinct term for blue is likely due to the high incidence of UV-B light, and strongly supports the proposal that acquired lens brunescence has a negative effect on the development or maintenance of a lexical distinction between 'blue' and 'green'. Overall, these results suggest that variability in color perception leads to differences in the perceptual representation of color space, which then causes differences in color lexicons. They also raise interesting questions about the mechanisms and time required for language change, given that we are considering acquired color deficiency related to aging. If language does adapt to the color vision of its speakers, then our data suggest that language adapts to the reduced capacities of older adults, despite also being spoken by younger adults and children, who are still able to perceive the distinction between green and blue. Intriguingly, other evidence suggests extreme variation in exposure to direct sunlight during very early development also affects color vision 38. Nevertheless, our findings show a distinct developmental trajectory.
Perhaps older adults change the fitness landscape to which language adapts through cultural evolution at the scale of several generations of language use and transmission 39 , a process that can be studied using agent-based computer models 40,41 . The fact that we found no influence of environmental conditions at the putative origins of language families suggests that these effects act on timescales shorter than the few thousand years usually needed for language families to differentiate. The present-day languages of the same language family descend from a single ancient proto-language and, due to migrations and language shifts on the scale of a few thousand years 42 , may end up spoken in very different locations and environments to those of their proto-language. If the color lexicon needs several thousand years to adapt to changes in UV-B incidence, then the location of these ancient proto-languages should still have a detectable effect, but the lack of such a signal in our data suggests that this change is much faster.
It is striking to note that a similar phenomenon, where language adapts to an altered fitness landscape due to acquired changes in some of the speaker population, has been proposed for several typologically striking properties of Australian Aboriginal languages and the distribution of labiodental sounds. For example, the high frequency of chronic ear infections (chronic otitis media) among Australian Aboriginal children can produce partial hearing loss affecting the lower and higher parts of the auditory spectrum, resulting in a loss of fricatives, a centralized vowel system, and a long, thin consonant system in the languages of the Australian continent 43. Similarly, consuming processed and softer foods characteristic of agriculture leads to ontogenetic changes in bite (involving the teeth and lower jaw) such that the overbite/overedge is retained into adulthood, favoring the use of labiodental sounds (such as 'f' and 'v') by languages of populations practicing agriculture 44. Our study adds to this literature by showing that external pressures can shape semantic as well as phonological aspects of language.
Finally, our data also supported an interaction between acquired blue-green and inherited red-green color deficits: populations closer to the equator, more affected by lens brunescence, speaking languages without a dedicated word for blue have a lower frequency of people with red-green color deficiency. The data also suggested that populations of hunter-gatherers tend to have a lower incidence of abnormal red-green color perception. While this hypothesis requires further testing, it would support the idea that certain cultural practices and subsistence strategies might have higher demands on color perception, thereby generating selective pressures against inherited color perception deficits.
Some caution is required in interpretation, however, given limitations in the current study. First, our dataset contained only 14 groups classified as hunter-gatherers, which, even if representative of the current distribution of communities, is too small to allow strong conclusions. Second, more refined genetic data, especially information about the opsin alleles involved in red-green color deficiencies, would be necessary to better test for selection. Third, there is a need for better and, ideally, more direct measures of the exposure to complex dyeing technology and colored artefacts. Finally, populations and languages are reduced to geographic dots in our analyses, but a better approach might instead be to use aggregated measures of UV-B incidence, climate, ecology and proximity to large bodies of water across the whole area they occupy.
Nevertheless, our results strongly support the view that the color vocabulary is shaped, at least in part, by environmental factors acting on individual speakers, generating biases that are amplified by the repeated use and transmission of language in communities of similarly affected individuals. This is akin to other cases of individual biases being amplified to shape cross-linguistic diversity 45, biases that can be either rooted in genetics 46-48 or emerging due to environmental or cultural factors acting during the lifetime of individuals 44.

Methods and materials
We collected data from 142 populations, extending and re-checking earlier databases 9 considerably, especially concerning information about the color lexicon and the incidence of abnormal red-green color perception. Each population was uniquely identified by its Glottolog code 49 of the primary language spoken and its geographic location. According to Glottolog, these languages belong to 32 language families (the most represented being Indo-European (41), Atlantic-Congo (19) and Afro-Asiatic (13)) distributed across 6 macroareas (Africa (31), Australia (2), Eurasia (79), Papunesia (9), North America (9) and South America (12)). We also obtained the geographic location of the putative origins of each family 49,50 . The geographic locations were cos-transformed: cos(longitude) ranges between − 1.0 (− 180) and 1.0 (180), and 1.0 − cos(latitude) ranges between − 1.0 (the South Pole), 0.0 (the equator) and 1.0 (the North Pole).
Based on their location, for each population we obtained the following data: the list of its geographic neighbors (derived from the Delaunay triangulation of our geographic locations taking into account large bodies of water 51); its log(elevation+1) using the Mapzen data (through R's elevatr package); the log(distances) to the nearest lake, ocean, river, or large body of water in general (using the OpenStreetMap data as per 52); and data about specific humidity (using data from the NOAA, as the mean of the yearly medians and interquartile ranges), and climate and ecology (as in 52, we conducted Principal Component Analysis on the 19 variables from WorldClim for the period 1960-1990, and we retained PC1, explaining 49.7% of the variance and reflecting low seasonality, wet and hot climates, PC2, 24.7%, high seasonality, hot and dry climates, and PC3, 8.6%, unclear interpretation). We calculated the incidence of ultraviolet light (UV) using the data from the NASA Total Ozone Mapping Spectrometer (TOMS) for the year 1998 (in order to replicate the procedure in 9): these contain daily measures of UV radiation received by the human body at four wavelengths (305 nm, 310 nm, 320 nm, and 380 nm) in J/m², taking into account the thickness of the ozone layer in the stratosphere, the amount of cloud cover, the elevation, and how high the sun is in the sky. Here, we use the whole-year means and standard deviations for UV-A (315 nm to 400 nm), UV-B (280 nm to 315 nm) and the full spectrum.
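The band definitions and yearly summaries described above can be sketched as follows. This Python fragment is illustrative only: the daily values are hypothetical, the function names are not from the authors' scripts, the treatment of the 315 nm boundary (counted here as UV-A) and the use of the sample standard deviation are assumptions, and the way the authors combined wavelengths within a band is not specified in this section and is therefore not modeled.

```python
from statistics import mean, stdev

UVB_RANGE = (280, 315)  # nm, as defined in the text
UVA_RANGE = (315, 400)  # nm, as defined in the text


def uv_band(wavelength_nm):
    """Assign a wavelength to 'UV-B' or 'UV-A'.

    Assumption: 315 nm itself is counted as UV-A.
    """
    if UVB_RANGE[0] <= wavelength_nm < UVB_RANGE[1]:
        return "UV-B"
    if UVA_RANGE[0] <= wavelength_nm <= UVA_RANGE[1]:
        return "UV-A"
    raise ValueError(f"{wavelength_nm} nm is outside the UV-A/UV-B ranges")


def yearly_summary(daily_values):
    """Whole-year mean and sample standard deviation of daily measurements."""
    return mean(daily_values), stdev(daily_values)


# The four wavelengths listed in the text.
bands = {w: uv_band(w) for w in (305, 310, 320, 380)}

# Hypothetical daily values (J/m^2) for one location and one wavelength.
daily = [100.0, 120.0, 140.0, 160.0]
year_mean, year_sd = yearly_summary(daily)

print(bands, year_mean, round(year_sd, 2))
```

With these band limits, the 305 nm and 310 nm measurements fall under UV-B and the 320 nm and 380 nm measurements under UV-A.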
Moreover, we obtained each population's log(size) (from 52) and subsistence strategy, dichotomized into populations whose subsistence mode is primarily based on hunting, fishing, gathering and/or foraging ('HG') and populations with subsistence modes centered around food production ('AGR'), using data from 44,53-55. The information about color vocabulary specifically concerns the existence of a specific term for blue (a dichotomous variable, 'no'/'yes'), and we also collected data about the frequency of abnormal red/green color perception in males, excluding, if specified, tritanopia, and those samples that do not distinguish between males and females, or with fewer than 50 individuals (see Fig. 1 Panel A and Supplementary Materials for details).
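The sample-inclusion criteria above amount to a simple filter. The Python sketch below shows one possible reading of them; the record layout, field names and example samples are hypothetical, and the exclusion of tritanopia cases (which depends on how each source reports deficiency types) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SAMPLE_SIZE = 50  # samples with fewer individuals are excluded (from the text)


@dataclass
class Sample:
    population: str
    sex: Optional[str]  # 'male', 'female', or None when sexes were pooled
    n_tested: int
    n_red_green: int    # individuals with abnormal red/green color perception


def keep(sample):
    """Keep male-only samples that reach the minimum sample size."""
    return sample.sex == "male" and sample.n_tested >= MIN_SAMPLE_SIZE


def frequency(sample):
    """Proportion of tested individuals with abnormal red/green perception."""
    return sample.n_red_green / sample.n_tested


# Hypothetical samples, for illustration only.
samples = [
    Sample("A", "male", 200, 10),  # kept
    Sample("B", None, 500, 20),    # dropped: sexes not distinguished
    Sample("C", "male", 30, 3),    # dropped: fewer than 50 individuals
]

kept = [s for s in samples if keep(s)]
print([(s.population, frequency(s)) for s in kept])
```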
Finally, to control for past demography and selection (about which we do not have direct information here), we estimated the overall pairwise genetic distances between populations, as measured by the fixation index F_ST. We used a set of genetic markers aimed at forensic applications (FROG-kb) from the ALFRED database 56 with ultrametric missing value imputation 57,58, followed by classic multi-dimensional scaling (MDS), retaining the first 10 dimensions (goodness-of-fit 10.5%).
We used the following short variable names: glottocode (Glottolog language code), family (language family), macroarea (the macroarea), latitude and longitude (the transformed geographical coordinates), elevation (transformed elevation), distance to ocean, distance to lakes, distance to rivers and distance to water (transformed distances to bodies of water), humidity median and humidity IQR (humidity), climate PC1, climate PC2, climate PC3 (the first 3 climate Principal Components, z-scored), mean UV-A, sd UV-A, mean UV-B, sd UV-B, mean UV, sd UV (UV incidence, z-scored), population size (transformed population size), subsistence (subsistence strategy), blue (is there a specific term for blue?), daltonism (frequency of abnormal red/green color perception), and genetic D1 ... genetic D10 (first 10 MDS dimensions of the genetic distances matrix); the suffix _family refers to the values at the putative family origins.
All analyses were done using R 59. We used five main classes of methods in our analyses (see the Supplementary Materials). First, in order to understand the geographical patterning of our data, we tested it against complete spatial randomness 60,61 using the χ² test based on quadrat counts, and the G, F and K functions, and we also estimated its spatial autocorrelation using Moran's I (with either the inverse of the shortest geographical distance "as the crow flies", or the nearest neighbor distance on the Delaunay triangulation). Second, we fitted Bayesian mixed-effect regression models (as implemented by R's package brms 62 using Stan 63) with family and macroarea as random effects, and explanatory variables and potential confounds as predictors. For blue we performed logistic regression, and for daltonism we performed beta regression (after replacing the 0.0% values by 0.01%); we used model selection based on Bayes Factors (BF), leave-one-out cross-validation (LOO), the Widely Applicable Information Criterion (WAIC), and K-Fold Cross-Validation (KFOLD, with K = 10). Third, we conducted mediation analysis using a Bayesian mixed-effects model (brms with Stan) approach, which allowed us to include random effects. Fourth, we used path analysis (as implemented by lavaan 64) to test more complex models involving multiple potential predictors (but without controlling for the non-independence due to family and macroarea). Finally, we quantified the predictive capacity of a set of variables using Bayesian mixed-effects regression (brms with Stan), conditional inference trees (ctree 65 in package partykit 66), random forests (package randomForest 67), conditional random forests (cforest in package partykit) and Support Vector Machines (rminer package 68).
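To illustrate the first class of methods, the following Python sketch computes Moran's I with inverse great-circle distance weights, one of the two weighting schemes mentioned above. It is not the authors' R code: the coordinates and values are toy data, the function names are illustrative, the Earth is treated as a sphere of radius 6371 km, and all locations are assumed to be distinct so that no distance is zero.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical approximation (assumption)


def great_circle_km(p, q):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def morans_i(points, values):
    """Moran's I with weights w_ij = 1 / distance(i, j) and w_ii = 0."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    cross_products = 0.0
    total_weight = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / great_circle_km(points[i], points[j])
            cross_products += w * dev[i] * dev[j]
            total_weight += w
    sum_of_squares = sum(d * d for d in dev)
    return (n / total_weight) * (cross_products / sum_of_squares)


# Toy data: six locations along the equator, one degree of longitude apart.
points = [(0.0, float(lon)) for lon in range(6)]

# Similar values sit next to each other: positive spatial autocorrelation.
clustered = morans_i(points, [1, 1, 1, 5, 5, 5])

# Neighbors always differ: negative spatial autocorrelation.
alternating = morans_i(points, [1, 5, 1, 5, 1, 5])

print(clustered > 0, alternating < 0)
```

Replacing the inverse-distance weights with weights derived from neighbor relations on a triangulation would give the second variant mentioned in the text; that variant is not sketched here.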

Data availability
All data and scripts needed to reproduce the results reported in this paper are available in the GitHub repository https://github.com/ddediu/colors-UV and on Zenodo at https://doi.org/10.5281/zenodo.5083676.