Evidence of the interplay of genetics and culture in Ethiopia

The rich linguistic, ethnic and cultural diversity of Ethiopia provides an unprecedented opportunity to understand the level to which cultural factors correlate with–and shape–genetic structure in human populations. Using primarily new genetic variation data covering 1,214 Ethiopians representing 68 different ethnic groups, together with information on individuals’ birthplaces, linguistic/religious practices and 31 cultural practices, we disentangle the effects of geographic distance, elevation, and social factors on the genetic structure of Ethiopians today. We provide evidence of associations between social behaviours and genetic differences among present-day peoples. We show that genetic similarity is broadly associated with linguistic affiliation, but also identify pronounced genetic similarity among groups from disparate language classifications that may in part be attributable to recent intermixing. We also illustrate how groups reporting the same culture traits are more genetically similar on average and show evidence of recent intermixing, suggesting that shared cultural traits may promote admixture. In addition to providing insights into the genetic structure and history of Ethiopia, we identify the most important cultural and geographic predictors of genetic differentiation and provide a resource for designing sampling protocols for future genetic studies involving Ethiopians.

likely that there was wider involvement in the Red Sea trade involving the Arabian Peninsula and the Eastern Mediterranean regions. Contacts with South Arabia have been better attested archaeologically than those with Nubia and Egypt. 3

Supplementary Note 1C. Ethiopian Christianity and contacts with Egypt, the Levant and Europe
Sometime during the second quarter of the fourth century CE, Aksum adopted Christianity (Sergew 1972). Unlike corresponding events in the Roman Empire, the first to be converted to the new creed in Ethiopia were the ruling class. Palaeographic and documentary sources show that the first Christian ruler of Aksum was King Ezana. 4 The conversion of the ruling class to the new faith seems to have resulted in important transformations that provided politico-ideological legitimacy to the monarchy and also reshaped the wider highland Ethiopian popular culture. The celebrated document known as the Kibra Nagast, historic Ethiopia's national epic, represents an epitome of this event of revolutionary proportions (Budge, 1922; Shahid, 1976). It took the new religion over a century to reach the broad masses through the evangelical activities of two groups of foreign missionaries, known as the Tsadqan and the "Nine Saints" (Sergew 1972). These missionaries were monks who came to Ethiopia from the Eastern Mediterranean region around the end of the fifth century CE. As sufficiently trained clergy, the missionaries appear to have completed the task of translating the Bible into the local vernacular, Ge'ez, mainly from the Old Greek version (the Septuagint), which was itself translated from the older Hebrew/Aramaic versions (Polotsky 1964; Ullendorff 1968, 1980, 1987; Knibb 1988). It is claimed that this literary activity led to the introduction of some foreign terms into the local language, for example the Armenian word adja (adcha or adjar) for 'emmer wheat', which has long been used in Ethiopia (Harlan, 1969), and the Syriac term haymanot for 'religion' (Sergew, 1972). The missionaries appear to have established churches and monasteries, both built and rock-hewn, and generally helped propagate the new faith among the people.
The Alexandrine See came to have the status of spiritual suzerainty and guidance over the Ethiopian Orthodox Church (Taddesse, 1972). Until 1959, when Ethiopian bishops began to serve as the heads of their own Church, every head of the Ethiopian Church (known in Ge'ez as Abun) was an Egyptian bishop, duly consecrated and appointed by the Alexandrine See. From the outset, a special relationship of mutual help and interdependence appears to have been established between Church and State. The Church seems to have acted as the most prominent ideological arm of the State. In return, the State endowed the Church with massive material subsidies, which were used to establish new churches and monasteries and to proselytise, first among believers of traditional religions and later among Muslims. This arrangement was most marked during the fourteenth and fifteenth centuries (Taddesse, 1972). Throughout the medieval period, Ethiopian Orthodox Christian monks and devout lay pilgrims are known to have flocked, in comparatively large numbers, to Jerusalem, where they made contact with people from other countries. An Ethiopian monk named Father Gregory provided the famous German scholar Hiob Ludolf with reliable material to write his A New History of Ethiopia (published in 1682). This book was the second work to provide generally correct information about the country to western Europe, following the account of the Portuguese Embassy of 1520-1526 referred to below.
Notwithstanding the contacts described above, Ethiopia was relatively isolated from the early seventh century to the late nineteenth century. However, limited but more or less regular contacts were maintained with the outside world, chiefly Egypt, the Holy Land and, to a lesser extent, the Vatican (Taddesse, 1972; Sergew, 1972). Ideas were exchanged and knowledge filtered through in what may arguably have been a two-way traffic. Since the mid-fourteenth century, Ethiopia also maintained some communication with western Europe through visiting travellers and explorers, diplomats, Christian missionaries and scholars (Crawford, 1958; Ullendorff, 1960). In 1520 a fourteen-man Portuguese Embassy, which included a chaplain-chronicler, Father Francesco Alvares, visited the country, establishing a well-documented official link between Ethiopia and a European country (Alvarez, 1961; Castanhoso, 1902). The Embassy remained until 1526, when it returned to Lisbon (Alvarez, 1961). Detailed accounts of the venture and observations were published in 1540. It is thought that the Portuguese were instrumental in introducing New World flora including pepper and perhaps also corn, cotton, and beans. Later Jesuit missionaries, led by the Spaniard Father Pedro Paez, succeeded (after years of persuasion) in converting Emperor Susneyos (r. 1607-1632) and some of his most important courtiers to Catholicism in 1622.
Jesuit involvement in the country led to religious wars. Emperor Susneyos abdicated in favour of his son, Emperor Fasil(adas) (r. 1632-1667), who founded Gondar as Imperial Capital in 1634/5 and pursued a closed-door policy banning all European visitors to the country (Merid, 1971; Berry, 1976). Despite this edict, some European travellers did succeed in entering the country.

Notes
1 The earliest African attestation of cattle has been dated to ca. 9000 BCE.
3 Phillipson (1998: 24) has the following to say on this subject: "It is remarkable how few are the artefacts of demonstrably Egyptian
4 Legend has it that there were two Greek-speaking Christians from Egypt, named Frumentius and Aedesius, who had been taken captive from the Red Sea littoral and brought to the royal court in Aksum where a deceased king was survived by his minor son and the Queen Mother (Sergew 1972: pages 95-100). Frumentius and Aedesius reportedly succeeded in gradually converting the young monarch and his mother to embrace the new religion.

Supplementary Note 2. Sampled Ethiopian groups
Today Ethiopia is one of the most populous countries in the world ("Ethiopia", The World Factbook, CIA, 2017), home to over 90 ethnic groups and 86 living languages. The largest groups are the Oromo (34.4% of the total population; source: Ethiopia People 2017, theodora.com), the Amhara (27%), the Somali (6.2%) and the Tigraway (6.1%), which together make up around three-quarters of the population. The remaining groups each represent a small percentage of the total population, and some are minorities with fewer than 10,000 members. Christianity is the most practiced religion in Ethiopia (62.8%), followed by Islam (33.9%), traditional faiths (2.6%) and other religions (0.6%) (Ethiopian Central Statistical Agency, 2009). Ethiopia is administratively divided into nine regions (Afar, Amhara, Benishangul-Gumuz, Gambela, Harari, Oromia, Somali, Southern Nations, Nationalities and Peoples', and Tigray) and two chartered cities (Addis Ababa and Dire Dawa). The regions are subdivided into 68 zones, which in turn are subdivided into districts (woredas).
Languages spoken in Ethiopia can be classified into two language families: Afroasiatic and Nilo-Saharan. Branches of the Afroasiatic languages represented in Ethiopia are Cushitic, Omotic and Semitic (www.ethnologue.com), while Nilo-Saharan languages spoken in Ethiopia have been classified as members of two groups: Chari-Nile and Koman (Bender, 1976). Previous work has suggested that linguistic affiliation is the main factor driving genetic structure in Ethiopians (Pagani et al., 2012). Supplementary Data 1 and Figure 1a show the languages associated with the samples included for this work.
For the Ethiopians with genetic data newly released in this study, we genotyped those individuals whose grandparents' birthplaces were coincident, with the exception of a few ethnic groups (Meinit, Qimant, Suri, Negede-Woyto, Shinasha, Bana) where we did not find any sampled individual fulfilling this condition. In these cases, the geographical location was calculated as the average point between the birth locations of the paternal grandfather and the maternal grandmother.
The latitude and longitude coordinates of birthplaces of donors were recorded as the roughly central place in the locality of their birth, which was obtained by one or other of the following means: onsite use of a GARMIN GPS unit during the data collection, information provided by a local service (Information Systems Services (ISS)) and manual searches using Google Maps, OpenStreetMap and, in a few cases, other programs. We did not have geographic or birthplace information for Beta Israel individuals whose genetic variation data is newly released in this study. Information about elevation was obtained using the geographic coordinates of each individual in the dataset with the "Googleway" package.
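As an illustration of the "average point" used for the six groups named above, the following minimal Python sketch takes the arithmetic mean of two latitude/longitude pairs. The text does not state how the average was computed, so the simple mean (a reasonable approximation to the geographic midpoint over short distances) is an assumption, as are the function name and the example coordinates.

```python
def average_point(coord_a, coord_b):
    """Arithmetic mean of two (latitude, longitude) pairs in decimal degrees.

    Assumption: a plain mean of the coordinates; the source only says
    "average point" and does not specify the calculation.
    """
    lat = (coord_a[0] + coord_b[0]) / 2.0
    lon = (coord_a[1] + coord_b[1]) / 2.0
    return (lat, lon)


# Hypothetical birthplaces, not taken from the study data.
paternal_grandfather = (7.0, 36.0)
maternal_grandmother = (8.0, 37.0)
location = average_point(paternal_grandfather, maternal_grandmother)
```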
Samples were collected by co-author AT in a programme that consisted of two collection expeditions a year taking place from 1998 to 2011 and which frequently involved communication through local translators in remote locations. Consistency is difficult to achieve in Ethiopian linguistic nomenclature. Neither the Ethnologue (www.ethnologue.com) nor many linguists consistently give primacy to the opinions of native speakers in ascribing a name to a language. Sometimes a group may declare that they speak a language that linguists may designate a dialect. Consequently the Ethnologue may use a common name for a cluster of closely related tongues. AT recorded the name of a language as declared by the speakers who were sampled. Similarly, AT's practice was to use the self-adopted names of ethnic groups rather than names used to refer to them by their neighbours or other outsiders, all of which they are likely to consider derogatory. All information recorded with respect to donors was provided either by the individual donors themselves or by local informants. Information about linguistic classifications used in this study is provided in Supplementary Data 1-2.
As examples of some of the complications, the labeled group "Manja" refers to a hunter-gatherer group claiming Kafa as their original language, but who presently live in the Dawro Zone and report Manja as their first language and Dawro as their second language. They claim to be descendants of a gift of slaves by the king of Kafa to the king of Dawro.
The labeled group "Manjo" refers to a hunter-gatherer group, locally called Manjo, who live amongst the Kafa and the Shekacho in the Kafa and Sheka administrative Zones (previously the Kafa-Sheka Zone). Manjo study participants living in the Sheka Zone reported their first language as Shekacho and their second language as Amharic. More generally, it is believed that Manjo living in Sheka Zone speak Shekacho as their first language while Manjo living in Kafa Zone speak Kafa as their first language.

Supplementary Note 3. Simulations
We performed simulations to illustrate how inference can differ under the "Ethiopia-internal" versus "Ethiopia-external" analyses in the presence of population bottlenecks, differing levels of admixture from outside sources, and the intermixing of groups that contain prior admixture events.
For example, we demonstrate how two populations splitting 45 generations (~1200-1300 years) ago, with one of the populations subsequently experiencing a strong bottleneck, can lead to high genetic differentiation under the "Ethiopia-internal" analysis but relatively little differentiation under the "Ethiopia-external" analysis. In contrast, two populations that have experienced admixture from the same sources, but at proportions differing by 20%, look genetically different under both analyses. We also show how the detection of admixture events when running GLOBETROTTER can depend on the surrogate populations used, in a manner consistent with our observation of how inferred admixture dates differ between the "Ethiopia-internal" and "Ethiopia-external" analyses in the real Ethiopian data.
Specifically, we generated simulated admixed populations by mixing two populations A and B that consisted of the following individuals: For the former, we generated each admixed haploid as a sequence of tracts, with tract sizes in centimorgans sampled from an exponential distribution with rate equal to the time (in generations ago) that admixture occurred. Each tract was copied intact from a single source population haploid randomly selected according to the simulated admixture proportions. For the latter, in each generation we simulated each haploid genome as a mosaic of tracts from two randomly selected parent haploids from the previous generation, with tract sizes based on the build 37 recombination map used for the real data analyses. Here the number of tracts per chromosome is equal to B1 + B2 + 1, where B1 takes the value 0 or 1 with probability 0.5 each, and B2 is a random sample from a Poisson distribution with rate equal to the total Morgan length of the chromosome minus 0.5. B1 models the expected obligate crossover per generation on a chromosome, and B2 models the remaining crossovers.
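The two tract-generating schemes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function names are hypothetical, the sketch records only the source population label of each tract rather than copying from a specific source haploid, and it draws tract lengths with rate g per Morgan (mean 100/g cM), which is an assumed reading of the unit convention.

```python
import random


def simulate_admixed_haploid(g, p, chrom_length_cm, rng):
    """One chromosome as a list of (start_cM, end_cM, source) tracts."""
    tracts, pos = [], 0.0
    while pos < chrom_length_cm:
        # Assumption: rate g per Morgan, so the mean tract length is 100/g cM.
        length_cm = 100.0 * rng.expovariate(g)
        end = min(pos + length_cm, chrom_length_cm)
        # Each tract comes from population A with probability p, else from B.
        source = "A" if rng.random() < p else "B"
        tracts.append((pos, end, source))
        pos = end
    return tracts


def poisson_sample(lam, rng):
    """Poisson(lam) draw: count unit-rate arrivals that fall before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def n_tracts(morgan_length, rng):
    """Number of tracts on one chromosome in one generation: B1 + B2 + 1."""
    if morgan_length < 0.5:
        raise ValueError("sketch assumes a chromosome of at least 0.5 Morgans")
    b1 = rng.randrange(2)                           # 0 or 1, probability 0.5 each
    b2 = poisson_sample(morgan_length - 0.5, rng)   # remaining crossovers
    return b1 + b2 + 1


rng = random.Random(1)
tracts = simulate_admixed_haploid(g=75, p=0.4, chrom_length_cm=150.0, rng=rng)
# Average number of crossovers (tracts minus one) for a 1.5 Morgan chromosome.
mean_crossovers = sum(n_tracts(1.5, rng) - 1 for _ in range(20000)) / 20000
```

Under this parameterisation the expected number of crossovers is E[B1] + E[B2] = 0.5 + (L - 0.5) = L for a chromosome of L Morgans, i.e. one crossover per Morgan on average, which the simulated mean approximates.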
We generated four populations of admixed individuals, where g below refers to the admixture time in generations ago and p refers to the proportion of DNA inherited from population A: (1) "40% (Exp)": Admixture with g=75 and p=0.4, with exponential growth. To do so, we first
For analyses of these simulations, for each of (1)-(4) we made 20 diploid simulated individuals that consisted of 40 randomly selected (without replacement) simulated haploid genomes that were randomly paired. We performed analogous CHROMOPAINTER, SOURCEFIND and GLOBETROTTER analyses to those performed in the "Ethiopia-internal" and "Ethiopia-external" analyses of the real data. In particular, for the simulations' "Ethiopia-external" analogue, we used CHROMOPAINTER to paint each simulated individual against 259 donor populations that included all of those used in the real "Ethiopia-external" analysis but excluding the five populations in A and B (i.e. K=259 here). For the simulations' "Ethiopia-internal" analogue, we painted each simulated individual against these same 259 donor populations plus the four simulated groups comprising the 80 sampled individuals from (1)-(4) (i.e. K=263 here). (We note that the sampled individuals for (1) and (3) did not include any of the 320 haplotypes used to simulate (4).) Mimicking our real data "Ethiopia-internal" analysis, we excluded matching to individuals from the same labeled group when generating painting samples files for each simulated population (1)-(4) for the simulations' "Ethiopia-internal" GLOBETROTTER analogue. For all CHROMOPAINTER analyses of simulated data, we used E-M estimated values of the global mutation (-M) and switch rates (-n) from the "Ethiopia-external" real data analysis.
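The construction of 20 diploid individuals from 40 randomly selected and randomly paired haploid genomes can be illustrated as follows. This is a sketch only: the function name, the pool of 200 haploid identifiers and the random seed are hypothetical.

```python
import random


def make_diploids(haploids, n_diploids, rng):
    """Select 2*n_diploids haploids without replacement and pair them at random."""
    # random.sample draws without replacement and returns the selection in
    # random order, so pairing consecutive entries gives a random pairing.
    chosen = rng.sample(haploids, 2 * n_diploids)
    return [(chosen[2 * i], chosen[2 * i + 1]) for i in range(n_diploids)]


rng = random.Random(42)
haploid_pool = [f"hap_{i}" for i in range(200)]   # hypothetical simulated haploids
diploids = make_diploids(haploid_pool, n_diploids=20, rng=rng)
```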
To ease computational burden, all surrogate individuals used in our GLOBETROTTER/SOURCEFIND simulation analyses were represented by slightly modifying the "Ethiopia-external" paintings we had already generated for the real data analyses. In particular, as this real data analysis painting had allowed each surrogate to match to the five populations from A and B, we set any such matching to 0 (rather than re-painting). Furthermore, for the simulations' "Ethiopia-internal" analogue, we assumed each surrogate population matched 0 to the simulated individuals from (1)-(4). This could potentially lead to a reduction in power for our SOURCEFIND and GLOBETROTTER analyses, but we show that it seems to have made little difference here.
For the "Ethiopia-external" GLOBETROTTER analogue, each of the four populations used 116 surrogate present-day groups, mimicking our real data version of the "Ethiopia-external" GLOBETROTTER analysis but excluding 11 ancient groups and the present-day groups used to simulate. For the "Ethiopia-internal" GLOBETROTTER analogue, each of the four populations used 119 surrogate groups, consisting of these 116 groups plus the three other simulated populations. For the "Ethiopia-external" analogue, we also applied SOURCEFIND to each simulated population, again mimicking our real data analysis but removing the five groups used to simulate and hence using 271 surrogates in total.
This indicates how the "Ethiopia-external" analysis can be used to uncover recent shared ancestry that has been masked by recent endogamy effects. On the other hand, under both analyses, simulations (1) and (2) are genetically differentiated from simulations (3) and (4). Simulations (1) and (2) show very similar inferred proportions, sources and dates, reflecting their common recent ancestry and in particular consistent with them having split more recently than their common inferred admixture date of 65-85 generations ago.
Under the "Ethiopia-internal" analogue, inferred dates for simulations (2) and (3) closely match the truth, while GLOBETROTTER failed to detect admixture in simulation (1), presumably due to masking since its ancestry patterns are similar to those in simulation (4).
For simulation (4), the inferred admixture date for the "Ethiopia-external" analogue reflects the admixture inherent in simulated population (3), which contributed 70% of the ancestry for simulation (4) (Supplementary Figure 5d). In contrast, the inferred date for simulation (4) under the "Ethiopia-internal" analogue captures the admixture between simulated populations (1) and (3) (Supplementary Figure 5d). This reflects our observations in the real data of more recent inferred dates under the "Ethiopia-internal" analysis relative to the "Ethiopia-external" analysis (Figure 4a), indicating how the former can capture intermixing among Ethiopian groups that is missed by the latter, because only the "Ethiopia-internal" analysis includes Ethiopian surrogate groups. The oldest admixture date of 85 generations in simulated population (4), i.e. the admixture inherent in simulated population (1) that contributed only 30% of the ancestry to (4), is missed under both analyses, indicating that older events may be masked by more recent ones in our analyses.

Supplementary Note 4. The association of genetic similarity and language classifications
This section provides further insights into the association between genetics and linguistic classifications. Our study contained Ethiopian individuals from ethnic groups classified as belonging to the Nilo-Saharan (NS) and Afroasiatic (AA) language families. These are classified into four different within-family branches (www.ethnologue.com): the NS Satellite-Core (179 individuals after quality control), AA Cushitic (383 individuals), AA Omotic (536 individuals) and AA Semitic (96 individuals) branches. In addition, our study included 20 individuals from two linguistic isolates (Negede-Woyto, Chabu) not classified into these families (www.ethnologue.com).
Genetic differences among individuals from these different categories are summarized in Figure
We had individuals representing two distinct sub-branches within each of the AA Cushitic, AA Omotic and NS Satellite-Core branches, as well as additional classifications within most of these sub-branches (see Supplementary Data 2). Sub-branches within each of the above are genetically differentiable (p-val < 0.001) under both the "Ethiopia-internal" and "Ethiopia-external" analyses (Supplementary Figure 9). Therefore, on average, people from the same language sub-branch classification (i.e. to the third tier of classification provided in www.ethnologue.com) share more recent ancestry with each other than they do with people from other language classifications, with these effects not solely resulting from recent isolation.
We also find that individuals from different linguistic classifications within each sub-branch can be significantly more genetically similar, though some genetic patterns are not consistent with linguistic classifications (Supplementary Figure 9). For example, within the AA Cushitic East sub-branch, it has been suggested that individuals from Highland and Lowland linguistic categories diverged before 3,000 BCE, and that Werizoid (or Dullay) languages diverged from other Lowland languages between this time and 1,000 BCE (Ehret, 1976). Conflicting with this, on average Highland speakers are more genetically similar to individuals from particular Lowland groups than individuals from different Lowland groups are to each other, and Lowland speakers from the Dullay and Konso-Gidolo groups are not differentiable from each other (p-val > 0.05) while each being significantly different from other Lowland groups (Supplementary Figure 9). However, if linguistic trees correlated with the order in which groups became isolated from one another, these genetic discrepancies could be driven by subsequent factors, such as more recent admixture events shared among Dullay and Konso-Gidolo speakers that did not affect other Lowland speakers, as suggested by Black (1975). Similarly, all three language classifications (Surmic, Nilotic, Koman) within the NS Core sub-branch are genetically distinguishable under the "Ethiopia-internal" analysis (Supplementary Figure 9). The Gumuz have a disputed language classification, B'aga in Ethnologue, but in Bender (1976) it is suggested that the Gumuz language may be classified as Koman. Genetically, the Gumuz are significantly most similar to Komo speakers under the "Ethiopia-external" analysis (Supplementary

Supplementary Note 5. Inferring recent ancestry/admixture in Ethiopian groups
In this section we provide further details of the SOURCEFIND analysis used to infer ancestry and GLOBETROTTER analyses used to identify and date admixture events in the Ethiopian groups.
Inferring recent ancestry and dating admixture by comparing Ethiopian clusters to non-Ethiopian groups ("Ethiopia-external" analysis)
We caution that this ancestry composition is not implying that each Ethiopian group is a mixture of these reference populations, or that the total proportions contributed by the 2-3 sources per cluster outlined in Figure 3 and Supplementary Figure 12 accurately reflect the proportion of DNA inherited from those sources (though they can do so; see simulations in Supplementary Note 3).
Instead Ethiopian groups carry haplotype patterns that match those carried in these reference populations, suggesting more recent shared ancestry with those reference populations relative to the other reference populations. There are two important caveats/limitations of our approach. The first is that comparing to different reference populations potentially can give quite different results, and it is unclear what the "best" reference population set is. Our choice of reference populations is
A second caveat of our SOURCEFIND analysis is that five of the 12 present-day groups Tanzania_Iraqw) had only two samples. When painting an individual's genome using CHROMOPAINTER, an individual cannot match to itself. Thus each of these four populations matched to only one sample from their own population via CHROMOPAINTER, which may mitigate signals of isolation (e.g. due to endogamy) in that population, relative to groups that can match to a greater number of individuals from their own labeled group. By mitigating signals of endogamy effects, such reference populations can potentially be favored as an ancestral source in SOURCEFIND analysis, which aims to find the reference populations with painting patterns that most closely match those of the target (in this case Ethiopian) cluster. This may also explain why Mota is favored, as it has only a single sample and hence no means of measuring endogamy under this approach. Nonetheless, comparisons among Ethiopian clusters are still meaningful when conditioning on this set of references, as each cluster was analysed in the same way. In general, we note that surrogate populations with high degrees of isolation (e.g. due to endogamy) may be less likely to be selected as representative of an ancestral source, which is one way SOURCEFIND likely differs from e.g. an f3 outgroup test (Patterson et al 2012).
But arguably such surrogates should be downweighted, as, due to recent isolation, the genetic make-up of such surrogates likely no longer reflects the ancestral source population well.
We also applied GLOBETROTTER (Hellenthal et al 2014) as described in the main text separately to each Ethiopian cluster in order to identify and date admixture events, under a model that assumes one or two pulses of admixture where two or more sources intermixed. Analogously to SOURCEFIND, GLOBETROTTER uses surrogates to the putative admixing sources. Briefly, first CHROMOPAINTER is used to match haplotype patterns within individuals from the target (i.e. putatively admixed) group to those in a set of reference individuals. In doing so, each target individual's genome is composed of a series of DNA segments, with each segment matched to (i.e. inferred to share a most recent common ancestor with) a specific reference population.
GLOBETROTTER then infers admixture in the target group by modelling the decay in linkage disequilibrium among segments within each target individual that match to different surrogate populations using the CHROMOPAINTER results. If two surrogate groups represent the same admixing source, the inferred probability for that pair will decrease exponentially with increasing genetic distance. In contrast, if two surrogate groups represent different admixing sources, their inferred probability will increase exponentially with increasing genetic distance (Hellenthal et al., 2014). Therefore, by studying the probability patterns among all pairs of surrogates, GLOBETROTTER can automatically infer the number of admixture events (though attempts only to characterize up to two events in practice; see below), as well as which surrogates best match genetically to each putative source involved in each event.
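To make the link between the exponential pattern and the admixture date concrete, the Python sketch below fits a single exponential to one synthetic probability curve and recovers the rate, which under a single-pulse model corresponds to the number of generations since admixture. It is not GLOBETROTTER's fitting procedure, which works jointly across surrogate pairs; the function name, the assumption that the curve approaches a known asymptote of 1, and the log-linear least-squares fit are all simplifications introduced here for illustration.

```python
import math


def fit_decay_rate(distances_cm, curve, asymptote=1.0):
    """Estimate g in  curve(d) = asymptote + a * exp(-g * d / 100).

    Fits a straight line to log|curve - asymptote| against distance in Morgans
    by ordinary least squares; the negated slope is the rate g.
    Returns (g, |a|). Assumes the asymptote is known and the curve is noise-free
    enough that curve != asymptote at every point.
    """
    xs = [d / 100.0 for d in distances_cm]                  # cM -> Morgans
    ys = [math.log(abs(y - asymptote)) for y in curve]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic curves for an admixture event 45 generations ago (illustrative values).
distances = [float(d) for d in range(1, 21)]                # 1 to 20 cM
same_source = [1.0 + 0.3 * math.exp(-45 * d / 100.0) for d in distances]   # decreases
diff_source = [1.0 - 0.3 * math.exp(-45 * d / 100.0) for d in distances]   # increases
g_same, amp_same = fit_decay_rate(distances, same_source)
g_diff, amp_diff = fit_decay_rate(distances, diff_source)
```

Both the decreasing curve (surrogates for the same source) and the increasing curve (surrogates for different sources) yield the same rate, which is why pairs of either kind carry information about the date.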
For each Ethiopian cluster, GLOBETROTTER assigns any inferred admixture to one of four categories: (i) "one-date" involving a single pulse of admixture between two sources, (ii) "one-date-multiway" involving a single pulse of admixture between greater than two sources, (iii) "multiple-dates" involving more than a single pulse of admixture at different times, potentially between greater than two sources, and (iv) "uncertain" where the probability curves described above are challenging to categorize into (i)-(iii). For (iii), GLOBETROTTER attempts to only date and describe two distinct pulses of admixture, though we note these signals can be consistent with more than two pulses of admixture or continuous admixture (Hellenthal et al., 2014). Furthermore, signals concluding (i),(ii),(iv) may reflect a failure of GLOBETROTTER to identify genuine older admixture events and/or have inferred dates biased towards more recent admixing in the case of continuous or multiple admixture events (Hellenthal et al., 2014), as illustrated in our simulation results (Supplementary Figure 5).
We also visually inspected the probability curves (e.g. Supplementary Figure 15) to assess whether the conclusions (i)-(iv) that GLOBETROTTER reports fit the data. Based on this visual inspection, and using the parameters highlighted below from the GLOBETROTTER *main.txt output files, we made some slight alterations to GLOBETROTTER's reported conclusions. In particular, to be conservative we do not report GLOBETROTTER results for clusters where "r2.oneevent", which assesses the overall evidence of admixture (on a 0-1 scale) by measuring the fit of an exponential distribution to the probability curves, was <0.34, as such clusters had noisy probability curves. In addition to these omissions, we slightly altered GLOBETROTTER's default threshold for concluding "one-date, multiway" over "one-date" from "fit.1event < 0.975" to "fit.1event < 0.98", which changed the conclusion from "one-date" to "one-date, multiway" for four clusters (Eth_ab, Eth_ap, Eth_ar, Eth_bh). Finally, we visually inspected clusters for which "maxScores.2events" ∈ (0.3, 0.35], which is indicative of multiple dates of admixture but does not meet GLOBETROTTER's default criterion of "maxScores.2events" > 0.35 for concluding "multiple-dates". In some of these cases, two admixture dates appeared to fit the data notably better than one date; i.e. the red line in the GLOBETROTTER *pdf file output was a better fit to many probability curves relative to the green line (see examples of these red and green lines in Supplementary Figure 15). Thus we changed the conclusion from "one-date" of admixture to "multiple-dates" of admixture for three clusters (Eth_ag, Eth_ak, Eth_as). We note that other clusters may have multiple dates of admixture that we miss here, and that more data from Ethiopians will help to clarify these admixture signals in the future.
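The thresholds described above can be summarised as a small decision function. This Python sketch encodes only the cut-offs stated in the text; the order in which the checks are applied, the function name and the example values are assumptions, and the criteria for the "uncertain" category are not given here and so are not modelled.

```python
def classify_cluster(r2_oneevent, fit_1event, max_scores_2events,
                     r2_min=0.34, multiway_cutoff=0.98):
    """Return (conclusion, needs_visual_check) from three *main.txt statistics.

    Assumptions: the order of the checks and the omission of the "uncertain"
    category. multiway_cutoff=0.975 reproduces the default threshold quoted in
    the text; 0.98 is the altered value.
    """
    if r2_oneevent < r2_min:
        return ("not reported", False)          # noisy probability curves
    if max_scores_2events > 0.35:
        return ("multiple-dates", False)        # default two-date criterion
    # Borderline two-date evidence: the conclusion stands unless visual
    # inspection of the curves favours two dates.
    borderline = 0.3 < max_scores_2events <= 0.35
    if fit_1event < multiway_cutoff:
        return ("one-date-multiway", borderline)
    return ("one-date", borderline)


# Hypothetical statistics for a single cluster.
altered = classify_cluster(0.90, 0.977, 0.10)
default = classify_cluster(0.90, 0.977, 0.10, multiway_cutoff=0.975)
```

With these illustrative values the cluster is "one-date" under the default cut-off and "one-date-multiway" under the altered one, which is the kind of change reported for the four clusters above.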
For each Ethiopian cluster for which GLOBETROTTER infers admixture, we use 100 bootstrap re-samples of individuals to infer confidence intervals around inferred dates.
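A generic bootstrap over individuals can be sketched as below. This is an illustration of the resampling idea only: the percentile interval and the 95% level are assumptions (the text does not state how intervals were formed from the 100 re-samples), and `estimate_date` is a hypothetical stand-in for re-running the date inference on each re-sample.

```python
import random


def bootstrap_interval(individuals, estimate_date, n_boot=100, level=0.95, seed=0):
    """Percentile interval for a date estimate from re-sampled individuals."""
    rng = random.Random(seed)
    dates = sorted(
        # Re-sample individuals with replacement, keeping the sample size fixed.
        estimate_date(rng.choices(individuals, k=len(individuals)))
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return dates[lo_idx], dates[hi_idx]


def toy_estimate(sample):
    """Hypothetical estimator: mean of per-individual values (generations)."""
    return sum(sample) / len(sample)


# Toy per-individual values, not taken from the study.
toy_individuals = [18.0, 22.0, 25.0, 19.0, 30.0, 27.0, 21.0, 24.0]
lower, upper = bootstrap_interval(toy_individuals, toy_estimate)
```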
GLOBETROTTER infers admixture events in 68 of the 78 Ethiopian clusters, with dates ranging from ~100 to 4200 years ago. The type of admixture events inferred among these 68 include 31 "one-date", 27 "one-date-multiway" and ten "multiple-dates". Based on visual inspection of each cluster's GLOBETROTTER probability curves, i.e. the probability that two DNA segments separated by increasing centimorgan distance are matched to two particular surrogate groups as described above (see Supplementary Figure 15), we determined that admixing sources broadly could be defined by contribution patterns from the following six reference groups: the NS Nilotic-speaking Dinka from Sudan (sometimes including the Bulala from Chad); the NS Nilotic-speaking Sengwer from Kenya, for which these groups were the highest contributors to inferred ancestry.
Various combinations of these different groups define the six numbered admixture signatures we report in Figure 3 and Supplementary Figure 12:
(1) A cluster containing the NS speaking Murle and Nyangatom shows multiway admixture at one date (10-16 gen ago) between three distinct sources best represented by the Muganda, Dinka and Sengwer.
(2) Seven clusters of NS speakers, plus separate clusters of the AA Omotic speaking Karo and linguistically-unclassified Chabu, show relatively recent admixture (typically < 30 gen ago) between two sources best represented by Mota and Dinka/Sengwer.
(3) Four clusters of AA speaking groups show admixture between two sources represented by
(cluster 2) have relatively more Dinka-like ancestry. Also consistent with their results, in general we find NS-speakers to have more Dinka-related ancestry than AA speakers. Furthermore, we infer mixture between Mota-like and Levant-like sources as old as around 4000 years ago among some AA speaking populations, though we note our study of present-day populations may miss some of the older admixture events reported in their study.
Exploring recent admixture among Ethiopian clusters ("Ethiopia-internal" analysis)
We next set out to determine whether there has been intermixing among Ethiopian groups. To do so, we applied GLOBETROTTER to each Ethiopian cluster, using 64 Ethiopian clusters as potential surrogates for the admixing sources in addition to the 130 surrogates used in the "Ethiopia-external" analysis. Fourteen of the 78 Ethiopian clusters (marked by asterisks in the first column of Supplementary Data 4) were not included as surrogates, because each contained small numbers of individuals from several ethnic groups, which would confuse interpretation of results. As the "Ethiopia-internal" analysis is picking up more subtle admixture between genetically similar groups (i.e. between Ethiopian groups), we did not analyse clusters with <=5 individuals, and we do not report results for two clusters (Eth_al, Eth_cr) with "r2.oneevent" < 0.5 that had noisy probability curves. We used the same criterion described above for the "Ethiopia-external" analysis for changing GLOBETROTTER's default conclusion of "one-date" to "one-date, multiway" (i.e. using "fit.1event" < 0.98 rather than < 0.975), which changed the conclusion from "one-date" to "one-date, multiway" for one cluster (Eth_ax). We also used the same criterion described above to change GLOBETROTTER's default conclusion of "one-date" of admixture to "multiple-dates" of admixture for three clusters (Eth_ag, Eth_ak, Eth_bi).
After these changes, we concluded admixture in 61 of the 78 Ethiopian clusters, with 32 "one-date", 19 "one-date-multiway", 6 "multiple-dates" and 4 "uncertain" events. The inferred dates of events were much more recent relative to the analysis that excluded Ethiopian surrogates (Figure 4a). Overall, 43 (84.3%) of the 51 clusters for which we concluded "one-date" or "one-date-multiway" events had inferred point-estimate dates <30 generations ago (~750-850 ya) under this analysis. This demonstrates that including Ethiopian surrogates allows the analysis to capture more recent intermixing, likely because different Ethiopian groups have been intermixing more recently.
To assess whether Ethiopian groups are intermixing with geographically nearby groups, we first arranged our clusters (and the 4.5kya Ethiopian Mota) along a line (represented by the circle in Figure 4b) based on the geographic distance between them. To do so, we ordered groups according to their order along the first component of a principal-components-analysis (PCA) of the geographic (Haversine) distance matrix between clusters, where the latitude/longitude of each cluster is the average of that among all individuals within that cluster. For each of the 57 clusters that did not infer "uncertain" admixture, we took the GLOBETROTTER inference of the most strongly (A) the surrogate group that GLOBETROTTER inferred as the best genetic match to the minority-contributing source and (B) the surrogate group that GLOBETROTTER inferred as the best genetic match to the majority-contributing source. For each of (A) and (B), we did not include clusters where the surrogate was a non-Ethiopian group, and we averaged scores across all included clusters.
This gave final values of 14.02 and 9.60 for (A) and (B), respectively. To be conservative, we took the higher of these two scores, i.e. that based on the ordinal distance between the cluster and the minority-contributing source. The lower score of (B) makes intuitive sense, as typically the majority-contributing source is more genetically similar and geographically closer to the target group than the minority-contributing source. Indeed, for this reason, the majority source is often presumed to reflect the ancestors who lived in the same region as the present-day target individuals, while the minority source is presumed to have admixed with these ancestors, e.g. after migrating into the region. We then permuted labels around the line 50,000 times, and recalculated our average proximity score (i.e. to the same minority-contributing source labels) for each permutation. The permutation-based p-value, calculated as the proportion of permutations whose average proximity score was less than or equal to the observed average proximity score, was highly significant (p-val < 0.0002). Overall these results suggest that geographically nearby groups in Ethiopia have intermixed with each other more recently than the W.Eurasian and W.African-source events inferred in our admixture analysis that excludes Ethiopians as surrogates.
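The permutation test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the proximity score is taken to be the absolute difference in rank along the line (with no wrap-around), the function names are hypothetical, and the labels and pairs are toy data. The number of permutations is reduced from 50,000 to 2,000 to keep the example fast.

```python
import random

def mean_ordinal_distance(order, pairs):
    """Average |rank(target) - rank(source)| over (target, source) pairs."""
    rank = {label: i for i, label in enumerate(order)}
    return sum(abs(rank[t] - rank[s]) for t, s in pairs) / len(pairs)

def permutation_p_value(order, pairs, n_perm=50_000, seed=0):
    """Proportion of label permutations scoring <= the observed average."""
    rng = random.Random(seed)
    observed = mean_ordinal_distance(order, pairs)
    shuffled = list(order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign labels to positions on the line
        if mean_ordinal_distance(shuffled, pairs) <= observed:
            hits += 1
    return observed, hits / n_perm

# Toy example: 20 clusters on a line, each paired with its immediate
# neighbour as the (hypothetical) minority-contributing source.
order = [f"c{i}" for i in range(20)]
pairs = [(order[i], order[i + 1]) for i in range(19)]
observed, p_value = permutation_p_value(order, pairs, n_perm=2000)
print(observed, p_value)
```

When no permutation scores as low as the observed value, the proportion is zero, and the result is better reported as an upper bound (p < 1/n_perm) than as p = 0.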
Alternative approach to explore recent admixture among Ethiopian clusters used in Figure 5
Very recent admixture may be challenging to detect via GLOBETROTTER, which e.g. has no power to see admixture occurring one generation ago. Therefore, for each of the six cultural practices shown in Figure 5, we used an alternative means of assessing whether there was evidence of recent intermixing among people from pairs of groups that both reported the given practice. To

Supplementary Note 6. Descriptions of cultural traits
The practices listed below are reported in (The Council of Nationalities, Southern Nations and Peoples Region, 2017), with groups' reports regarding them provided in Supplementary Data 12.
Arranged marriage or marriage arranged by parents: marriage arranged by the parents of the couple with little or no involvement of the couple.
Abduction: involves a man, often assisted by members of his family and/or friends, forcibly taking a young girl or a mature woman as a wife. No parental consent is obtained.
Swift/spontaneous unions: said to occur only occasionally in southwestern parts of Wollo province to the northeast of the south-westerly bend of the Blue Nile but also elsewhere including in the SNNPR (e.g. among the Halaba). Involves a girl well past normal marriageable age and a spur-of-the-moment agreement.
Sororate/cousin marriages: a widower marries a sister or cousin of his deceased wife.
Wife-replacement: a widower marries a sister or close female relative of his deceased wife.
Wife-inheritance: a married man "inherits" i.e. takes as an additional wife a widow of his deceased elder brother, cousin or close kinsman with the primary objective of providing trusteeship for children and assets left behind by his deceased relative.
Belt-giving: a form of marriage that involves offering the intended bride ladies' belts as a symbolic gesture of the young man's desire to marry her. If the girl does not wish to marry the man she refuses to accept the belt. The belt may be considered a token of love and may form a small portion of the bride-price. Parents may not be able to object to the marriage.
Bead-giving: a young man offers his future bride beads as a token of his love and affection. A variant of this practice involves a young man forcibly tying beadwork round a girl's neck despite her resistance (parents may not be able to object to marriage following such an event). The beads may be considered a small pre-marriage instalment of bride-price.
Beaded necklace snatching: a form of marriage that involves an earlier snatching of a young girl's beadwork necklace. (The act of snatching the beadwork is a symbolic gesture of the man's wish to marry the woman.) If adult male members of the girl's family apprehend the snatcher, he may be beaten and dispossessed of the bead necklace, in which case he cannot lay customary claim to the girl as his future wife. If the man escapes with the girl's bead necklace the girl's family will be forced to give the girl up to the man for marriage.
Note: many groups in the SNNPR accord ideological/symbolic significance to belts and beads that revolves around 'omens and a person's fate and fortunes'.
Men moving in to marry women: the married couple move into the bride's home after marriage. This is uncommon. A woman most commonly moves into a home built and owned by her husband.
Women moving in to marry men: suddenly (in circumstances in which a man and woman are in love), completely unannounced, a young woman enters the family home of a young man and, clinging to the central pole of the house, pleads with the boy's parents to allow their son to marry her.
Repeat marriages: marriages following divorce or annulment of a previous union.
Bride's butter anointment: an important part of a marriage ceremony that takes place at the groom's family home during a wedding. It involves the future mother-in-law anointing the hair of the bride with a generous amount of butter.
Male circumcision: removal of the foreskin. (May be performed on babies, young boys, teenagers and adult males, individually or in groups of similar age. May be part of initiation ceremonies.)
Female circumcision: cutting of the labia minora and/or the labia majora, with or without excision of the clitoris of young girls and women. In northern Ethiopia it is performed at an early age, while in the south-western parts of the country it takes place at a later age and is related to marriage.
Pre-marital sex: sexual intercourse with the opposite sex prior to marriage (considered unacceptable in most communities in Ethiopia but accepted as the norm by a few groups in the SNNPR).
Pre-marital pregnancy or birth: a woman becoming pregnant or having a baby prior to a marital union.
Endo/exogamy: marriage of a man or a woman to the opposite sex, respectively, within/outside their clan or lineage.
Poly/monogamy: marriage of a man to many wives or just one wife, respectively.
Marriage with bride's parental consent: marriage of a woman to a man with her parents' consent.
Marriage with brides encouraged by parents: marriage of a woman to a man whom the woman's parents prefer to other men.
Groom's parental choice marriage: marriage of a man to a woman whom the man's parents prefer to other women.
Groom's aunt arranged marriage: marriage of a man to a woman whom the man's aunt prefers to other women.
Marriage involving third-party agents: marriage between a man and a woman arranged by third-party agents (might be cousins, aunts, uncles, friends, acquaintances, or any other person, related or unrelated to the couple).
Marriage involving women intermediaries: marriage between a man and a woman arranged solely by female intermediaries (they may or may not be related to the couple).
Special unions: marriage unions between a man and a woman that do not conform to commonly accepted cultural practice followed by most members of a group (important examples of such marriages include community or religious/spiritual leaders, chiefs and kings).
Minors' marriage: marital union between underage children of the opposite sex arranged by their parents.
Mate-selection: marriage in which the couple themselves decide to marry (with little involvement of either set of parents in marriage negotiations).
Marriage with mutual spouse consent: as mate-selection.
Marriage with spouse and parental consent: marriage of a man and a woman with the consent of the couple and their parents.
Marriage by elopement/persuasive absconding: marriage of a man and woman usually without parental consent after the man persuades the woman to abscond with him.

Supplementary Tables
Supplementary Table 1. All ancient DNA (aDNA) samples included in this work, for both the "Ethiopia-internal" and "Ethiopia-external" analyses.

Supplementary Table 2. Mean and inner 95% empirical quantiles of genetic similarity (1-TVD) under the "Ethiopia-internal" analysis between all pairs of individuals, or restricting to individuals that are from the same self-reported group label, whose group labels belong to the same major language group, or who speak the same first language, second language or have the same religious affiliation. Results are also shown after first conditioning on geographic distance ("Geo") or elevation difference ("Elev"), and rescaling each to span the same range as the first column.

analysis. In the second row ("Geo distance/Elevation") and second column ("All"), values give the proportions of 1000 permutations that were more associated with genetic similarity than the unpermuted data, when testing for an association with (a,c) geographic distance or (b,d) elevation difference (see Methods). Column 2 ("All") in the subsequent rows gives analogous proportions when testing an additional factor (1st column) for association with genetic similarity after accounting for spatial distance: sharing a common group label ("Group label"), having ethnicities from the same language branch ("Lang group": AA Cushitic, AA Omotic, AA Semitic, NS Satellite-Core), sharing the same first language ("1st lang"), second language ("2nd lang") or religious affiliation ("religion"). Columns 3-7 depict results when permuting in a manner to account for each other factor. Figure 1b and Supplementary Figure 6 give the maximum across values in each row (*=ignored when determining this final p-value; see Methods). For the first row in Supplementary Table 5a-d, the p-value to the right of the "/" is after adjusting geographic distance and elevation for each other (see Methods).

Supplementary Table 6. Median and interquartile range (IQR) of spatial distance between individuals. These values are shown for all pairwise combinations of individuals ("All"), or as the median/IQR across median distances of all pairwise combinations of individuals within each group label ("group label"), major language group ("lang group"), first language ("1st lang"), second language ("2nd lang") or religious affiliation ("religion").

Average genetic similarity under the "Ethiopia-internal" analysis between two individuals with religious affiliation = Christian ("C"), Muslim ("M") or Traditional ("T"), within each of 16 Ethiopian groups with n >= 5 sampled individuals from each of at least two of these religious affiliations. Also given is the average similarity between two individuals from separate religions ("C vs M", "C vs T") within each group. Asterisks denote that, within the ethnicity, the average genetic similarity is significantly higher between two individuals from that same

Supplementary Table 8. Genetic association with cultural similarity. P-values from Mantel tests for association between genetic and cultural similarity among ethnicities ("All"), and from partial Mantel tests that account for one of geographic distance, elevation, or language branch (AA Cushitic, AA Omotic, AA Semitic, NS Satellite-Core) when testing for an association between genetic and cultural similarity (without adjusting for multiple comparisons). Within each analysis ("Ethiopia-internal", "Ethiopia-external"), the first row measures cultural similarity as the number of matching reported cultural practices across ethnicities, while the second row up-weights sharing of rare cultural practices among ethnicities (see Methods).

Columns include all Ethiopian groups, plus all non-Ethiopian groups with >= 7 sampled individuals that had relatively high genetic similarity to at least one Ethiopian group. Within each row X, dots denote groups (columns) for which the average genetic similarity between two individuals from group X is not significantly higher than that between an individual from X and an individual from the column group at Type I error rate = 0.001 (black) or 0.05 (green), unadjusted for multiple testing (see Methods). Also within each row X, colored rectangles enclose:

between the group with highest genetic similarity to X at Type I error rate = 0.001, unadjusted for multiple testing. Blue and pink rectangles in Supplementary Figure 8a and Supplementary Figure 8b, respectively, signify that there are no other groups (columns) enclosed in white rectangles for the given row X, while green rectangles signify that there are. Ethnic group labels on axes are coloured by language classification for Ethiopian groups (legend in Figure 1a) and by major geographic region for non-Ethiopian groups.

Ethiopian groups. fineSTRUCTURE's best-fitting tree relating its inferred clusters. Each leaf of the tree lists the number of individuals from each labeled group that were assigned to that cluster.

Contiguous clusters of the same color were merged into one of the 78 final clusters we used in analysis; we alternate colors here to assist visualisation. Labels for these 78 clusters are provided at right. Full details in Supplementary Data 4.
Supplementary Figure 11. Inferred clustering of Ethiopians using ADMIXTURE for K=2-15 clusters. Each bar is an individual, with colors representing clusters. Group labels along the x-axis are colored according to language group (see Figure 1a for key). Label colors at left denote the new color added in that row.

under the (a) "Ethiopia-internal" and (b) "Ethiopia-external" analyses. Circles are placed at the average of each group's individuals' locations, with Suri and Zilmamo slightly shifted (as they have the same such average). Note these three ethnic groups show a relatively high genetic similarity to each other under the "Ethiopia-internal" analysis, while their similarity is not notably higher than that between these three and other Nilo-Saharan speakers under the "Ethiopia-external" analysis (Supplementary Data 13). These observations are consistent with these three groups' relatively high genetic similarity to one another being primarily attributable to recently separating from one another and/or recently intermixing with each other.

between two sources, and red lines when assuming two pulses of admixture with distinct dates. In the middle plot, we provide the inferred date (in generations ago) and 95% CI. Curve patterns indicate the number of distinct admixing sources, in that any curves that increase with distance indicate the two reference populations are representing different admixing sources. In contrast, curves that decrease with distance indicate the two reference populations are representing the same admixing source. GLOBETROTTER infers a single admixture event (i.e. "one-date") between sources related to Egypt/Mota and Sudan_Dinka in the "Chabu" cluster (top row).
GLOBETROTTER infers a single admixture event between sources related to Mota/Kenya_Rendille and Egypt in the "Negede_Woyto" cluster (middle row). GLOBETROTTER infers admixture at one date between more than two sources (i.e. "one-date-multiway") related to Mota, Kenya_Rendille and Egypt in the "Gamo/Gofa/Dorze" cluster (bottom row).