Duration of agriculture and distance from the steppe predict the evolution of large-scale human societies in Afro-Eurasia

Understanding why large, complex human societies have emerged and persisted more readily in certain regions of the world than others is an issue of long-standing debate. Here, we systematically test different hypotheses involving the social and ecological factors that may ultimately promote or inhibit the formation of large, complex human societies. We employ spatially explicit statistical analyses using data on the geographical and temporal distribution of the largest human groups over a 3000-year period of history. The results support the predictions of two complementary hypotheses, indicating that large-scale societies developed more commonly in regions where (i) agriculture has been practiced for longer (thus providing more time for the norms and institutions that facilitate large-scale organisation to emerge), and (ii) warfare was more intense (as proxied by distance from the Eurasian steppe), thus creating a stronger selection pressure for societies to scale up. We found no support for the influential idea that large-scale societies were more common in those regions naturally endowed with a higher potential for productive agriculture. Our study highlights how modern cultural evolutionary theory can be used to organise and synthesise alternative hypotheses and shed light on the ways ecological and social processes have interacted to shape the complex social world we live in today.


Introduction
The size and complexity of modern human societies is on a scale unmatched in other species. Yet for much of our evolution humans lived in small-scale, internally undifferentiated groups, and it is only in the last several thousand years that larger-scale societies with more complex forms of organisation began to develop, resulting in what we can label "macrostates" or "empires" involving millions of individuals. Anecdotal and empirical research indicates that historically the largest human societies tended to be situated in a relatively narrow band of the Afro-Eurasian landmass, stretching from Western and Central Europe and the Mediterranean in the West, through to China in the East (Fig. 1) (Diamond, 1997; Turchin et al., 2013). Understanding how and why humans are able to form functioning societies on such a scale, and why large, complex societies have tended to form more readily in certain places, are questions of long-standing interest across a range of disciplines (Carneiro, 2003; Flannery and Marcus, 2012; Sanderson, 2015). The strong geographic patterns noted above suggest that ecology may play an important role, yet a number of other factors have been proposed as drivers of the evolution of social complexity, such as the development and productivity of agriculture (Diamond, 1997; Nielsen, 2004), information processing (Morris, 2013), warfare (Turchin et al., 2013), the geography of continental land masses (Diamond, 1997), technology (Morris, 2013), and religion (Norenzayan et al., 2014). However, there have been relatively few empirical tests of these competing ideas within a common theoretical framework. Here, we employ modern cultural evolutionary theory to systematically develop and empirically test a range of alternative hypotheses involving the socio-ecological factors that may ultimately promote or inhibit the formation of large, complex human societies.
Deriving cultural evolutionary hypotheses. Cultural evolutionary theory (CET) is a conceptual framework concerned with understanding how and why socio-cultural traits emerge and spread (Boyd and Richerson, 1985; Henrich, 2015; Mesoudi, 2011). In CET, cultural traits are seen to exhibit variation that is inherited from one individual or group to another, and when there is competition, selection and adaptation can occur, mirroring the key processes of biological evolution (Futuyma, 2013). For groups to remain politically unified as they expand their territory, either through the physical movement of people or through the joining together or annexation of other groups, cultural norms and institutions must be developed that structure social interactions and enable social cohesion (Fukuyama, 2011; North, 1990; Turchin, 2016; Turchin et al., 2018). For example, the establishment of formal leaders with the authority to punish free-riders can solve collective action problems (Smith et al., 2015), while more hierarchical organisation and specialised, bureaucratic forms of political organisation can improve coordination over larger distances (Carneiro, 1981; Spencer, 2010; Turchin and Gavrilets, 2009). While a large number of processes may be involved in the evolution of large-scale societies, our focus here is on factors that have systematically affected the geographic and temporal distribution of such groups. From a CET perspective, the variation seen in the occurrence of large-scale societies could be due to differences between parts of the world relating to (i) the benefits and costs of large-scale organisation (selection), (ii) the generation of different socio-cultural traits involved in large-scale organisation (variation), and (iii) the transmission of these traits across time and space (inheritance). Here, we develop specific hypotheses about the real-world factors relating to these processes of selection, variation, and inheritance.
In humans, competition between groups is potentially a strong selective force. While warfare has probably occurred throughout history, the intensity of between-group conflict has varied across space and time. A major historically attested factor intensifying warfare was the development of horse-based military technologies such as chariots and cavalry (Turchin et al., 2013). These first developed in the pastoralist societies of the Eurasian steppe and enabled such groups to raid settled agricultural societies in regions neighbouring the steppe, sometimes inflicting severe casualties (Turchin, 2010). It is hypothesised that pressure from the steppe selected for the unification and scaling-up of agricultural societies into larger groups to more effectively counteract these incursions. This in turn would select for greater size in pastoralist communities, and also in other neighbouring agricultural groups that were now relatively smaller and at a competitive disadvantage with their neighbours. This effect would be amplified by the diffusion of such military technology from the steppe. Under this hypothesis we would predict that there is a relationship between the occurrence of large-scale societies and distance from the Eurasian steppe (relating to the presence of intensive forms of horse-based warfare), with large-scale societies occurring more frequently nearer the steppe (the "steppe warfare" hypothesis).
A number of factors may affect the probability of developing the kinds of norms and institutions that underpin larger-scale societies (i.e., the generation of variation may be greater in some regions than others). While the rate of cultural evolution is generally faster than that of biological evolution, the development of social norms and institutions for collective action is not straightforward and may require long periods of cultural experimentation (Richerson and Boyd, 2001; Wright, 2006). Furthermore, norms and institutions may need to accumulate over generations and build on preceding innovations (Currie et al., 2016; Flitton and Currie, 2018). Differences in the time that has been available to societies to develop the institutions that underpin stable large-scale organisation may therefore play an important role in explaining the distribution of such societies. Related to this, agriculture is often cited as being a necessary condition for large, centralised societies as it enables societies to develop and finance institutions of coordination and control involving political specialists (leaders, bureaucrats, priests etc.) who do not produce their own food but are supported by the rest of the population due to the productive nature of the resource base (Johnson and Earle, 2000; Mayshar et al., 2015). Both the duration (Diamond, 1997; Morris, 2013) and productivity (Johnson and Earle, 2000; Nielsen, 2004) of agriculture have featured prominently in debates on the evolution of sociopolitical complexity. For this paper we therefore predict that large-scale societies would occur more in places where agriculture has been practiced for longer (the "duration of agriculture" hypothesis), or where the productivity of agriculture is higher (the "agricultural productivity" hypothesis). For the second of these ideas we focus on the hypothesis that some regions may have had more favourable ecological conditions and examine potential agricultural productivity (rather than achieved productivity), representing a kind of productivity endowment.
The above hypotheses capture the roles of selection and variation (which is inherited and accumulated over time) in potentially affecting the spatio-temporal distribution of large-scale societies. However, these processes can also interact: in order for selection to occur, variation in cultural traits needs to be generated. We therefore also assess whether the effect of selection is stronger in regions that have had longer to develop some of the norms and institutions that underpin larger-scale organisation (i.e., we predict a positive interaction between proximity to the steppe and duration of agriculture).
ARTICLE HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | https://doi.org/10.1057/s41599-020-0516-2
Finally, we also control for two factors in assessing support for the above ideas. We assess whether more rugged regions are more difficult for an outside force to conquer and bring under control (the "terrain ruggedness" hypothesis) (Scott, 2014). We also control for the possibility that the spread of large-scale societies and their norms and institutions was not due to any of the processes outlined above but may simply have been contingent on where the first such societies initially arose (the "first empires" hypothesis): warfare may have been more intense initially near these regions, leading to scaling up of societies in neighbouring regions, and/or technological or social innovations may have diffused out from these regions.
In previous work (Turchin et al., 2013), we tested the logic and plausibility of the "steppe warfare" hypothesis through developing an agent-based model of this process, running the model in a geographically explicit framework, and testing how well the output from the simulation matched the observed historical distribution of large-scale societies. However, this hypothesis was not systematically tested against other plausible ideas, so it was not possible to evaluate the importance of such a process in shaping the evolution of human societies. Importantly, there are a number of other processes that lead to similar spatial predictions as the "steppe warfare" hypothesis. Here, we use the same outcome variable as in our previous work (the extent to which different parts of the world have been inhabited by large-scale societies-see below), and assess the relative importance of different predictors representing different hypotheses and derived from newly collated data in a geographically explicit statistical modelling framework.

Data.
In order to test the predictions of these hypotheses we statistically assess how well these factors explain real-world data on the historical and geographical distribution of large-scale societies. A spatially explicit dataset was created using Geographic Information Systems. Data were organised in grid cells under an equal-area projection so as to maintain a constant cell size of 10,000 km² (~100 km by 100 km).
Dependent variable. We examine data on the spatial and temporal distribution of large states and empires in Afro-Eurasia from the period 1500 BCE-1500 CE (Turchin et al., 2013). Our data consist of maps of the extent of politically unified states and empires ("polities") compiled from historical atlases and other sources and sampled at 100-year intervals. Polygons of polities at each time step were created and areas were calculated under an equal-area projection. For the main analyses polygons <100,000 km² were not included in the final dataset (we explore the effects of changing this threshold to 80,000 and 120,000 km² in the SI). For each time step we assessed whether a grid cell was occupied by a polity meeting the size threshold. Data from different time slices are combined to assess how frequently different cells have been occupied by polities over the entire 3000-year time-span to create our main dependent variable, "imperial density" (Fig. 1). To focus our analyses on regions in which agriculture was the main form of subsistence, analyses were conducted using cells in which agriculture was practiced by 1500 CE, and cells in which agricultural production was at least potentially possible (i.e., values for potential agricultural productivity were greater than zero; see below). This means that our analyses are focussed on areas where agricultural populations can actually live and we avoid including cells that are not inhabited, such as desert areas.
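The construction of imperial density described above can be illustrated with a minimal Python sketch. It is not the authors' GIS workflow: the function name `imperial_density`, the dictionary-based input format, and the toy cell identifiers and areas are all assumptions made for illustration. The only elements taken from the text are the 100,000 km² threshold and the idea of counting, for each cell, the time slices in which it was occupied by a sufficiently large polity.

```python
def imperial_density(occupancy_by_slice, min_area_km2=100_000):
    """Count, per grid cell, the time slices in which it lay inside a large polity.

    occupancy_by_slice: one dict per time slice, mapping a cell id to the
    area (km^2) of the polity covering that cell in that slice. Cells not
    covered by any polity are simply absent from the dict (hypothetical format).
    """
    density = {}
    for slice_cells in occupancy_by_slice:
        for cell_id, polity_area in slice_cells.items():
            # Polities below the size threshold do not count towards the total.
            if polity_area >= min_area_km2:
                density[cell_id] = density.get(cell_id, 0) + 1
    return density


# Toy data: three time slices and three cells (all values invented).
slices = [
    {"A": 250_000, "B": 90_000},
    {"A": 300_000, "B": 150_000, "C": 120_000},
    {"B": 150_000},
]
density = imperial_density(slices)
print(density)
```

Re-running the function with `min_area_km2=80_000` or `min_area_km2=120_000` mirrors the kind of threshold sensitivity check mentioned in the text.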
Predictor variables. In order to test the different hypotheses outlined above we used historical and ecological sources of information to create the following predictor variables: (i) distance from the Eurasian steppe (as a proxy for the intensity of warfare), (ii) time since development of agriculture, (iii) potential agricultural productivity, (iv) terrain ruggedness, and (v) distance from the first empires present at 1500 BCE. To assess whether our results are due to assumptions made in the construction of predictor variables we created two versions of the distance-from-steppe variable (a more inclusive classification, Distance from Steppe (Max), and a more restrictive one, Distance from Steppe (Min)), and three versions of the duration-of-agriculture variable. This enables us to assess whether our results are robust to the kind of uncertainty that is present when conducting these kinds of analyses.
i. Distance from the Eurasian steppe: The steppe region was defined according to the World Wide Fund for Nature terrestrial ecoregions of the world map. We explore the effects of different assumptions about what should be classified as the steppe. One assumption has steppe areas extending into the Levant region of the Middle East, and into Kyrgyzstan in Central Asia (maximal extent: Distance from Steppe (Max)), while the other stops in the Caucasus, and does not extend into Kyrgyzstan (minimal extent: Distance from Steppe (Min)) (Table 1 and SI). Great-circle distances from the steppe were calculated in R using the package geosphere (Hijmans, 2019).
ii. Duration of agriculture: An estimate of the time since agriculture was practiced in various parts of the world was taken from a variety of sources reflecting the latest archaeological information (see SI for definitions and literature). For testing this hypothesis we are primarily concerned with evidence about when societies began to cultivate food as a major part of their diet (see SI). Uncertainty in these dates was incorporated by specifying minimum and maximum dates and creating two additional maps. We calculated the average duration of agriculture for each cell from these maps, thereby creating three variables: a "best" estimate, and "minimum" and "maximum" estimates.
iii. Potential agricultural productivity: The agricultural productivity hypothesis reflects the natural endowment of conditions conducive to productive agriculture (e.g., climate, available crops, soil) rather than achieved agricultural production that may rely on technological or cultural innovations that raise productivity (cf. Currie et al. (2015)). Data on estimated potential crop productivities were taken from the Food and Agriculture Organisation of the UN's Global Agro-Ecological Zones (FAO GAEZ) methodology v.3 (FAO, 2012). Different crop types were used in different regions to reflect the kinds of crops that were grown historically. We make the simplifying assumption that overall potential agricultural productivity can be reasonably proxied by the maximum potential yield from the main carbohydrate staple crop. Climatic effects were based on the baseline average climatic conditions from 1961 to 1990. See discussion and SI for evaluation of these assumptions with respect to this paper.
iv. Terrain ruggedness (elevation): Elevation data were taken from the GTOPO30 digital elevation model of the world (resolution: 30 arc-seconds) (USGS, 1993). The measure of the unevenness of terrain was calculated as the standard deviation of altitude across each grid cell.
v. Distance from first empires: The first empires are defined as those regions that had empires >100,000 km² at 1500 BCE. Great-circle distances from the first empires were calculated in R using the package geosphere (Hijmans, 2019).
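Two of the predictors listed above, great-circle distance and terrain ruggedness, lend themselves to a short illustration. The Python sketch below is not the R/geosphere code used in the study. It assumes the haversine formula on a sphere of radius 6371 km (the text does not state which great-circle method was used), takes distance to the nearest of a set of reference points, and uses the population standard deviation for ruggedness (the text says only "standard deviation"). All function names and example coordinates are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_to_nearest(cell, targets):
    """Distance from a (lat, lon) cell centre to the closest (lat, lon) target point."""
    return min(great_circle_km(cell[0], cell[1], t[0], t[1]) for t in targets)


def ruggedness(elevations_m):
    """Standard deviation of the elevation samples falling within one grid cell."""
    return statistics.pstdev(elevations_m)


# Toy example: one cell centre, two hypothetical steppe boundary points,
# and a handful of invented elevation samples for that cell.
cell_centre = (40.0, 30.0)
steppe_points = [(48.0, 40.0), (45.0, 60.0)]
print(round(distance_to_nearest(cell_centre, steppe_points), 1))
print(ruggedness([120.0, 340.0, 560.0, 410.0]))
```

The same `distance_to_nearest` helper would serve for the distance-from-first-empires variable by swapping in a different set of target points.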
Statistical analyses. Analyses were conducted in R version 3.6.3 (R Core Team, 2015). Simple Pearson and Spearman correlational analyses were carried out using the base package in R. To systematically test between the hypotheses, we conducted statistical analyses using a generalised least squares (GLS) framework, which allows incorporation of the spatial structure and autocorrelation contained in the data (see Supplementary Information). The main spatially explicit GLS analyses were carried out using a modified version of the nlme (linear and non-linear mixed-effects models) package (Pinheiro et al., 2015), using longitude and latitude as random control variables (following Pinheiro and Bates (2009) but with distances calculated based on great-circle distances (see SI)). Including the spatial structure of the data in nlme using GLS is extremely memory intensive. We therefore ran analyses over a number of random sub-samples of the data, each involving 1000 cells, in order to ensure that results were not dependent on any particular sample (see below). All dependent and predictor variables were scaled in order to produce standardised parameter estimates.
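The scaling and sub-sampling steps mentioned above can be sketched as follows. This is an illustrative Python outline, not the study's R code: it assumes that "scaled" means conversion to z-scores using the sample standard deviation, and that sub-samples are drawn without replacement within each sample. The function names, the seed, and the pool of 5000 cell identifiers are hypothetical.

```python
import random
import statistics


def standardise(values):
    """Convert a variable to z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def random_subsamples(cell_ids, n_samples, sample_size, seed=0):
    """Draw n_samples independent random subsets of cells, each without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.sample(cell_ids, sample_size) for _ in range(n_samples)]


# Toy usage: standardise one variable, then draw 20 sub-samples of 1000 cells
# from a hypothetical pool of 5000 cell identifiers.
z = standardise([1.0, 2.0, 3.0])
samples = random_subsamples(list(range(5000)), n_samples=20, sample_size=1000)
print(z, len(samples), len(samples[0]))
```

Because every variable is on the same scale after standardisation, the fitted coefficients can be compared directly as measures of relative effect size, which is the purpose the text gives for scaling.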
Testing alternative models and model selection. In order to test between the alternative hypotheses we specified different statistical models containing different predictors of imperial density. We ran models containing the predictor variables of duration of agriculture, productivity of agriculture, terrain ruggedness, and distance from the steppe as main effects. We also assessed to what extent model fit and parameter estimates were affected by including distance from first empires as a variable in these models, and the impact of including an interaction term involving the duration of agriculture and distance from the steppe variables. We compared models in a model selection framework (Burnham and Anderson, 2002) based on the Akaike Information Criterion (AIC) (Akaike, 1974). Each model was run on 20 random samples of 1000 cells to assess variation in the parameter estimates. As likelihoods are not directly comparable across samples we used change in AIC (ΔAIC) to compare models across samples (see SI results).
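One way to read the ΔAIC comparison described above is sketched below in Python. It is an assumption of this sketch, not something stated in the text, that ΔAIC is computed within each sub-sample relative to the lowest-AIC model in that sub-sample and then averaged across sub-samples. The function names, model labels, and AIC values are invented for illustration; the only standard ingredient is the definition AIC = 2k − 2 ln L.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def mean_delta_aic(aic_by_sample):
    """Average each model's AIC difference from the best model in each sub-sample.

    aic_by_sample: one dict per sub-sample mapping model name -> AIC.
    Every model is assumed to have been fitted on every sub-sample.
    """
    totals = {}
    for sample in aic_by_sample:
        best = min(sample.values())  # lowest AIC within this sub-sample
        for name, value in sample.items():
            totals[name] = totals.get(name, 0.0) + (value - best)
    n = len(aic_by_sample)
    return {name: total / n for name, total in totals.items()}


# Toy example with two hypothetical models fitted on two sub-samples.
fits = [
    {"with_interaction": 100.0, "main_effects_only": 104.0},
    {"with_interaction": 210.0, "main_effects_only": 211.0},
]
print(mean_delta_aic(fits))
```

Differencing within each sub-sample removes the sample-specific offset in the likelihood, which is why raw AIC values from different sub-samples are never compared directly here.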
Model parameters over time. In order to assess whether the strength of our main factors as predictors of imperial density changed over time, we ran analyses over a sliding time-frame of 1000 years. In other words, we first analyse a model with all the predictors of imperial density for the period 1500 BCE-600 BCE, we then move on to analyse the period 1400 BCE-500 BCE, and so on, until the final period of 600 CE-1500 CE. Within these time frames we remove any cells to which agriculture had not spread by the end of that time period.
To assess the effect of distance from first empires we ran these models again including this variable. Each model was run on 10 random samples of 700 cells to assess variation in the parameter estimates.
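The sliding time-frame can be sketched in Python as below. The sketch reads each 1000-year window as ten consecutive 100-year time slices (so the first window runs from the 1500 BCE slice to the 600 BCE slice), which is an interpretation of the text rather than something it states. Years are encoded as negative numbers for BCE, the value 0 stands in for the slice at the BCE/CE boundary, and the function names and agriculture start dates are hypothetical.

```python
def sliding_windows(slice_years, window_len=10):
    """Return every run of window_len consecutive time slices, stepping by one slice."""
    return [
        slice_years[i:i + window_len]
        for i in range(len(slice_years) - window_len + 1)
    ]


def cells_with_agriculture(agriculture_start_by_cell, window_end_year):
    """Keep only cells that agriculture had reached by the end of the window."""
    return sorted(
        cell for cell, start in agriculture_start_by_cell.items()
        if start <= window_end_year
    )


# Hypothetical slice years: negative = BCE, positive = CE.
years = list(range(-1500, 1501, 100))
windows = sliding_windows(years)
print(windows[0][0], windows[0][-1])    # first window: -1500 to -600
print(windows[-1][0], windows[-1][-1])  # last window: 600 to 1500

# Invented agriculture start dates for three cells.
starts = {"A": -8000, "B": -500, "C": 200}
print(cells_with_agriculture(starts, windows[0][-1]))
```

Each window would then be passed through the same model-fitting procedure as the full dataset, with imperial density recomputed from only the slices inside that window.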
We also ran several confirmatory analyses to assess the effect of uncertainty in our measures, and to confirm the robustness of the GLS methods we have used. We also ran exploratory analyses assessing whether ecological or cultural similarity may have shaped the occurrence of empires. Details are provided in the SI.

Results
Correlational analyses and main empirical patterns. Correlational analyses and visualisation of the distributions of the data indicate that distance from the steppe, duration of agriculture, and distance from first empire all show patterns in line with the predictions of the relevant hypotheses (Table 1 and Supplementary Fig S3). However, potential agricultural productivity does not appear to predict imperial density well; estimated productivity is high in many places where empires did not tend to form for example. Ruggedness of terrain also does not show a strong relationship with imperial density and indicates slightly higher imperial density in more rugged areas (which is in the opposite direction to that predicted). These analyses also reveal substantial correlations between the main predictor variables. This highlights the need to assess these hypotheses within the same model rather than examining them purely independently.
Spatially explicit GLS analyses. Comparing the fits of different models (Table 2) involving different combinations of the main predictor variables and ruggedness of terrain shows that the best-fitting models include the variables duration of agriculture, potential agricultural productivity, terrain ruggedness, and distance from the steppe. The interaction between duration of agriculture and distance from the steppe is also included in the best-supported models. The standardised parameter estimates (β) indicate the strength of the association between our predictor variables and imperial density. The results (Table 2) indicate that duration of agriculture and distance from the steppe are the strongest predictors of imperial density, with the interaction term also being important. The parameter estimates for ruggedness and productivity are indistinguishable from zero. Overall, the best-fitting model explains 58% of the variation in imperial density in this dataset (calculated as a pseudo-R² value where imperial density is regressed on the predicted values from the GLS model). Including distance from first empires as a control variable (see SI results) increases the AIC scores of the best models and this variable has the largest standardised parameter estimate (β = 0.34). The variables duration of agriculture (β = 0.21) and the interaction term (β = 0.13) show very small decreases in their parameter estimates, while distance from the steppe is more heavily affected (β = 0.16), but still remains substantial. Figure 1 compares the original imperial density data with the distribution predicted by the full GLS model.
Table 1 (notes). Different measures of duration of agriculture and distance from the steppe are given to capture the uncertainty in these variables. Imperial density is generally greater closer to the steppe and with increasing duration of agriculture. There is not a strong relationship between imperial density and potential productivity. There is a strong relationship between imperial density and the control variable of distance from first empires, but the relationship with elevation (terrain ruggedness) is fairly weak and in the opposite direction to that predicted. There are also substantial positive relationships between the measures of duration of agriculture, the measures of distance from the steppe, and the distance from first empires.
Table 2 (notes). Standardised coefficients (β) are presented to indicate the relative strength of each predictor. The best-fitting model contains duration of agriculture and distance from the steppe, and the interaction between these two variables. Only one other model falls within 2 mean AIC units of the best-fitting model and this also includes small effects of agricultural productivity and elevation.
[Caption fragment] ... Table 2 as it contains all the tested parameters.
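The pseudo-R² reported for the best-fitting model, described as the result of regressing imperial density on the GLS model's predicted values, can be illustrated with a small Python sketch. For a simple linear regression with one predictor, R² equals the squared Pearson correlation between the observed and predicted values, and that identity is what the code computes. The function name and the toy numbers are hypothetical, and this is a sketch of the calculation as described, not the authors' code.

```python
def pseudo_r2(observed, predicted):
    """R^2 from regressing observed values on model predictions.

    For a one-predictor linear regression this is the squared Pearson
    correlation between the two series.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov * cov / (var_obs * var_pred)


# Toy values: predictions that track the observations imperfectly.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 3.0, 2.0]
print(pseudo_r2(observed, predicted))
```

Note that this measure is insensitive to any linear rescaling of the predictions, so it captures how well the model orders and spaces the cells rather than whether the predicted values match the observed ones exactly.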
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | (2020) 7:34 | https://doi.org/10.1057/s41599-020-0516-2
Model parameters over time. We also assessed whether the predictive strength of different factors varies through time. Figure 2 shows how well the variables predict imperial density over a sliding 1000-year window. In all analyses the distance from the steppe shows a substantial uptick towards the end of the time period considered. This could reflect the increasing intensity of warfare as time went on and the incursions into Europe by groups such as the Mongols. It is also important to note that this may also be due to the spread of European large state societies into Eastern Europe up to the steppe and the Ural mountains during these later periods. Duration of agriculture is a good predictor of imperial density and shows a slight increase in predictive strength before slightly decreasing. This reflects the fact that initially the spread of empires broadly follows the routes taken in the previous spread of agriculture, but then in later periods discrepancies begin to emerge (see Supplementary Discussion). Interestingly, the interaction term between duration of agriculture and distance from steppe decreases steadily over time. The potential productivity of agriculture and elevation are poor predictors through all time periods. Further analyses (see SI Results) reveal that the effect of distance from the first empire declines steadily over time. This suggests that in early time periods the location of these early empires may have been an important factor increasing the probability of large-scale societies inhabiting surrounding areas. However, this did not have a strong deterministic effect on the development of such societies as time went on and other historical or ecological processes increased in importance.

Discussion
In this paper, we have tested a number of influential hypotheses about the evolution of large, complex human societies. Overall, the results show strongest support for the duration of agriculture hypothesis and the steppe warfare intensity hypothesis. These two ideas are complementary and we find support for an interaction between our measures of duration of agriculture and intensity of warfare, indicating that selection for large-scale societies has had more chance to act when the required variation in norms and institutions has been generated. Importantly, these effects persist even when distance from first empires is introduced as a control variable. The fact that the distance from first empires variable is generated partly from the same data as the imperial density variable makes this quite a conservative test of the other hypotheses. This suggests that duration of agriculture and intensity of warfare really are explaining variation that would not be predicted if the patterns we see were simply due to the historically contingent locations of the initial areas of high social complexity. It is also not surprising that the importance of distance from the steppe is more strongly affected by the inclusion of distance from first empires, as both hypotheses involve similar processes (intensity of warfare, and diffusion of innovations). The fact that duration of agriculture and the interaction effect are less affected also lends support to the idea that general processes other than historical contingency have been important. The study is also revealing in indicating the hypotheses that are poor explanations of the data. Neither the productivity endowment hypothesis nor the terrain ruggedness hypothesis is supported by our analyses.
Our confidence in these results is supported by the fact that different measures of the variables of interest produce broadly similar results (see Table 1 and Supplementary Analyses). The causal interpretation of our analyses is also supported by the fact that the variables chosen are exogenous in that their variation is not dependent on or shaped by the presence of complex societies. Agriculture was first practiced in societies at a scale much smaller than those that are the focus of our analyses in this paper, and in this dataset agriculture was generally present in regions before large-scale societies are recorded as being present (only in the last few time slices do large societies and agriculture spread together in a small area of eastern Europe). We have used distance from the steppe as a proxy for warfare intensity rather than actual measures of warfare intensity partly because warfare may itself be a function of social scale and related increases in warfare technology. Similarly focusing on potential agricultural productivity (rather than actual or achieved productivity) avoids the problem that larger societies may develop more intense forms of agriculture (see below).
Our results indicate the importance of ecology and geography in shaping the evolution of complex societies. The hypotheses relating to the two best predictors both involve the emergence of certain practices that were more likely to occur in certain places rather than others (i.e., the domestication of crops in suitable environments, and the domestication of horses and use in warfare in the Eurasian steppe). These practices then subsequently spread geographically either through population movements or adoption by neighbouring groups. As the lack of support for the productivity endowment hypothesis also suggests, these results do not point to a strong form of environmental determinism (in which external environmental factors dictate that human history would unfold unerringly in a certain way)(Painter and Jeffrey, 2009) but rather stress the importance of the ways in which humans interact and shape their social and natural environments.
More generally, it is important to note the probabilistic nature of the ideas we are testing. Even though we have shown that some variables are good predictors of imperial density, it is not the case that regions that experienced increased warfare, or places with long histories of agriculture, will necessarily develop large, stable polities. Our best-fitting statistical model explains more than half the observed variation in imperial density, yet this leaves a substantial proportion of variation unexplained. Some of this may be due to measurement error and the necessarily coarse-grained nature of our proxy variables. However, it also indicates that factors and processes other than those examined in this study may be important in determining the geographic and temporal distribution of larger-scale societies. Some insight into factors that could be assessed in future analyses comes from examining the residuals of our best-fitting model (Fig. 1). It can be seen that our model over-predicts the occurrence of large-scale societies in parts of eastern Africa south of the Sahara, some pockets within the Sahara and Arabian desert zones, and eastern Europe. On the other hand, imperial density in areas of China, southeast Asia, and Iran is under-predicted. More fine-grained or better measures of the factors we have addressed in this paper (e.g., other proxies that capture increases in the strength of competition between groups, or more localised data on the adoption and importance of crop-based agriculture) may help reduce some of these discrepancies. But it is also possible that other factors are important too, such as the nature and distribution of resources and the ease with which they can be controlled (Carneiro, 1970; Mattison et al., 2016; Summers, 2005), or the degree of connectivity between societies such as in the form of trade (which may enable institutions to spread more easily, or may provide access to resources that are important in creating and stabilising large-scale societies).
This discussion highlights the fact that for selection to act and drive the evolution of larger societies, cultural traits and innovations need to persist and be inherited from one generation to the next. The ability of cultural traits to be transmitted between societies also means that societies may not have to develop solutions to collective action problems independently but can borrow such innovations from other societies. Traits may be more likely to be transmitted between societies that are similar ecologically and culturally. Furthermore, just as biological species can disperse most easily into regions to which they are genetically or behaviourally pre-adapted (Wiens and John, 2011), human groups may find it easier to spread and expand their control over regions that are similar ecologically or culturally, as the institutions they possess are suited to such conditions. These kinds of ecological or cultural barriers may be another potential explanation of variation in the occurrence of large-scale societies. This may explain, for example, the seemingly slower spread of large-scale societies into sub-Saharan Africa. More generally, it has been argued that as Eurasia extends predominantly along lines of latitude, where ecological conditions are more similar, the traits involved in creating large-scale societies and the societies themselves could spread more easily here than elsewhere (Diamond, 1997;Turchin et al., 2006). It has also been proposed that new institutions may be more effective when they are adopted by or imposed on societies that share a common cultural history, and therefore possess more similar cultural traits and institutions (Currie et al., 2010;Currie et al., 2016;Spolaore and Wacziarg, 2013). In supplementary analyses (see SI) we have conducted some initial exploratory tests of these ideas, but we do not find support for the predictions of these generalised hypotheses.
However, further tests are required before we can be confident about rejecting such processes as important in shaping the distribution of large-scale societies.
Our study has focused on Eurasia and Africa during a certain historical time-span, so a further consideration is the extent to which the processes identified in this study have also been important in the Americas and in later time periods. Horse-based warfare was not present in the Americas during the timeframe considered in this study, and declined in importance in Afro-Eurasia after 1500 CE due to the subsequent development of firearms and the increasing importance of naval warfare in Eurasia. In these cases, if we want to test selection-based hypotheses it will be important to identify other variables that may be better proxies for these systems. It is also worth noting that regions where some large-scale states such as the Incas and Aztecs did emerge are also known to be important early centres of plant domestication (Larson et al., 2014). Therefore, the duration of agriculture hypothesis may also be relevant in this part of the world. Such tests would provide "natural experiments" in which to assess the generality of these ideas.
Although we find no support for the agricultural endowment hypothesis, it should be noted that our analyses do incorporate the assumption that some non-zero degree of agricultural productivity is essential. What our analyses indicate, however, is that beyond this baseline, regions that had the potential to be more productive are no more likely to have been occupied by large-scale societies than regions of lower productive potential. It is important to note that achieved, rather than potential, agricultural production is likely to be an important factor in that it supports and enables complex societies to function. Achieved production is a function not just of ecological endowment but also of the technologies that societies develop, which may be developed in response to increasing population pressure and may be facilitated by more complex forms of organisation. In the future, comparative time series data on changing agricultural practices and technologies may enable us to further assess the relationship between agricultural productivity and the evolution of complex societies (Currie et al., 2015), taking into account the potential reciprocal causal pathways involving these factors and changes in productivity over time due to historical climate change (see Supplementary Discussion).
In contrast to this study, much previous historically informed research has focused on the particular cultural traits, social features or even individual characteristics of leaders that may have led to the success of certain societies over others (Acemoglu and Robinson, 2012;McAdam et al., 2001). Here, our focus is on hypotheses that make tractable predictions about geographic patterns of variation, yet other explanations for the origin and maintenance of complex societies are also possible (see Supplementary Discussion). One open question, for example, is why some complex societies exhibit high degrees of inequality, with elites potentially benefitting more than the masses from such forms of organisation, while other societies have developed more inclusive forms of institutions (Diehl, 2000). Future work will examine the different costs and benefits for individuals within societies, the changing levels of inequality over human history, and the social and ecological conditions under which these changes occur.
More generally, the study also highlights the importance of setting out and testing alternative hypotheses about human cultural evolution. The kind of cultural evolutionary approach taken in this paper enables us to integrate insights and findings from different disciplines and to develop hypotheses with a clear understanding of how those ideas might fit together. This provides a framework for understanding whether hypotheses are competing or complementary, and allows us to quantitatively assess how well they explain the data and whether some ideas can be rejected. In doing so we can develop a better understanding of the historical and ecological processes that have shaped the world of large, complex societies that we live in today.

Data availability
Data, R code and sources used in these analyses are openly available at Harvard Dataverse: https://doi.org/10.7910/DVN/8TP2S7.