Main

Models for the expression of human behavioural complexity during the Late Pleistocene (around 125–12 thousand years ago (ka)) often invoke demographic change1,2. By around 50 ka, technological innovations and symbolic behaviours (such as ornaments, bone tools, pigments and microliths) that were present earlier in the Middle Stone Age (MSA) become more consistently expressed across sub-Saharan Africa4,6,7. Archaeologists refer to this as the transition to the Later Stone Age (LSA)1,7,8,9. By around 20 ka, these material culture components were nearly ubiquitous, but regionally diverse. One explanation is that people began living in larger and/or more connected groups, with variations in population size and connectivity driving differences in material culture across space and time. Given the morphological variation among Late Pleistocene skeletons, interactions may have involved deeply structured populations2,10, consistent with some population history models based on genetics3.

The advent of genome-wide ancient DNA (aDNA) technology holds promise for better understanding major changes in material culture and hypothesized demographic shifts among ancient African foragers (Supplementary Notes 12). Compared to elsewhere, especially Europe, there has been little genomic investigation of ancient African peoples. Previously available aDNA sequences from sub-Saharan African foraging contexts11,12,13,14, despite being relatively recent (younger than about 9 ka), provide evidence of ancient genetic structure that has since been disrupted by demographic transformations (such as the spread of food production, as well as colonialism, imperialism, enslavement and modern sociopolitical reorganization). The structure of ancient populations cannot be robustly reconstructed based solely on genetic data from present-day people.

Here we present new genome-wide aDNA data and radiocarbon dates from three Late Pleistocene and three early to middle Holocene individuals associated with LSA technologies at five sites in eastern and south-central Africa: Kisese II and Mlambalasi Rockshelters in Tanzania, Fingira and Hora 1 Rockshelters in Malawi, and Kalemba Rockshelter in Zambia (Fig. 1a and Extended Data Table 1). Direct and indirect dates range from around 18 ka to 5 ka, doubling the time depth of aDNA reported from sub-Saharan Africa. We analyse these data together with the published sequences of 28 other ancient African individuals recovered from contexts spanning the past 8,000 years and largely associated with foraging at 17 sites in eastern, central and southern Africa. We also provide higher-coverage data for 15 of these individuals. Analysis of the ancient data together with sequences from present-day groups, aided by new statistical methods, enables a reconstruction of changes in regional- and continental-scale population structures among people who lived before the sweeping demographic changes of the past approximately 5,000 years. It also enables comparisons of Pleistocene forager population dynamics between the tropics and more temperate regions.

Fig. 1: Locations of the individuals analysed and PCA analysis.
figure 1

a, Locations of individuals analysed in this study. The shapes and colours of the symbols correspond to the PCA in b. 1, Shum Laka; 2, Mota Cave; 3, Kakapel RS (Rockshelter); 4, Nyarindi RS; 5, Jawuoyo RS; 6, White Rock Point; 7, Panga ya Saidi; 8, Makangale Cave; 9, Kuumbi Cave; 10, Gishimangeda Cave; 11, Kisese II RS; 12, Mlambalasi RS; 13, Fingira; 14, Hora 1; 15, Chencherere II; 16, Kalemba RS; 17, Ballito Bay; 18, Faraoskop RS; 19, St Helena. b, PCA results. Axes were computed using present-day groups from eastern (Dinka pastoralists), southern (Juǀ'hoansi foragers) and central Africa (Mbuti foragers). Small circles represent present-day individuals; other symbols represent ancient individuals (larger points corresponding to earlier individuals and black outlines to newly reported individuals). The lowest-coverage individual (from Mlambalasi), shown with an asterisk, has the most uncertain position. The base map in a is from Natural Earth (https://www.naturalearthdata.com). E., east.

The dataset

Of 31 samples (Supplementary Table 1), five petrous bones and one distal phalanx yielded aDNA sequences, which, after preparation of up to six libraries from each sample and enrichment for a panel of around 1.2 million single-nucleotide polymorphisms (SNPs), ranged in coverage from 0.001–3.2× (median, 0.06×) of targeted genome-wide SNP positions (Extended Data Table 1 and Supplementary Table 2). Additional archaeological and bioarchaeological information for these individuals is summarized in Supplementary Note 3. Direct 14C dates were attempted for the five petrous bones, but only two preserved sufficient collagen: Kalemba (I10726; 5,280–4,880 calibrated years before present (cal. bp), PSUAMS-4764) and Kisese II (I18821; 7,240–6,985 cal. bp, PSUAMS-4718) (Supplementary Table 3 and Supplementary Note 4). Moreover, a new date was generated on enamel carbonate for a published individual from Hora 1 (I2966; previously estimated around 8,100 bp, now directly dated to 9,090–8,770 cal. bp, PSUAMS-5145). Individuals from Mlambalasi (I13976; about 20–17 ka) and Hora 1 (I19528, I19529; 17–14 ka) are well constrained to the Late Pleistocene based on multiple indirect dates (Supplementary Table 4 and Supplementary Note 3). One individual from Fingira (I11019) is represented by a distal phalanx that was recovered in isolation near the surface during excavation. This sample was too small to be both dated and assessed for aDNA; its age is constrained to around 6,200–2,300 cal. bp by association with direct dates on other human remains from the site. The 15 previously published individuals11,13,15,16 (Supplementary Note 3) for which we increase sequence coverage include approximately 26× shotgun coverage for the individual from Mota Cave in Ethiopia15 (I5950), enabling reliable calling of diploid genotypes (Extended Data Table 1, Methods and Supplementary Table 2). The authenticity of the new aDNA data was assessed through a combination of several criteria; detectable contamination was observed for only two samples (Methods, Extended Data Fig. 1a, Supplementary Table 2 and Supplementary Note 5). In Supplementary Table 5 and Supplementary Note 5, we report genotypes at SNPs associated with lactase persistence, sickle cell trait and the Duffy antigen, with derived alleles observed only at the DARC (Duffy) locus (four published individuals from Cameroon).

Uniparental markers

All four newly reported males are similar to most published ancient foragers from this region of Africa in carrying the widely distributed Y chromosome haplogroup B2 (Extended Data Table 1). Among the 23 individuals in our dataset with known mtDNA haplogroups, up to 14—almost all from Kenya and Tanzania—have haplogroups that are today associated with eastern Africa (Extended Data Table 1 and Supplementary Table 6). Eight individuals—all from Malawi and Zambia—have haplogroups that are associated with some ancient and present-day southern African people, specifically groups for whom foraging is the main mode of subsistence17,18,19,20. Two individuals from Malawi (I19529 from Hora 1, dating to about 16 ka and carrying L5b, and I4426 from Fingira, dating to about 2.3 ka and carrying L0f/L0f3) have eastern-Africa-associated haplogroups, whereas a different individual from Malawi (I2967 from Hora 1, dating to about 8.2 ka with L0a2/L0a2b) and possibly one from Kenya (I8930 from White Rock Point with L2a4) belong to lineages that are characteristic of present-day central African foragers (such as Mbuti and Aka). These results show that eastern and south-central Africa was home to, and an area of interaction among, diverse ancient foraging groups, and also that several of these haplogroup lineages were formerly more widespread than they are today.

Three-way cline of genome-wide ancestry

For the bulk of our analyses, we used the genome-wide genotype data to gain insights into the ancestry of the ancient forager individuals and their connections to other groups. We performed a supervised principal component analysis (PCA) (Methods) in which we used three present-day groups—Juǀ'hoansi (San) from southern Africa, Mbuti from central Africa and Dinka from northeastern Africa—to define a two-dimensional plane of variation, and projected all other individuals (ancient and present day) onto this plane (Fig. 1b). Consistent with previous studies5,11,13,14, we observed an ancient latitudinal gradient of ancestry, represented at its northern extreme by an individual from around 4.5 ka from Mota Cave, Ethiopia, and its southern extreme by individuals from around 2 ka from South Africa. The newly reported individuals generally cluster with their geographical neighbours but extend documentation of the cline both geographically (southwest to Kalemba, at the corresponding extreme on PCA) and temporally (to a maximum of approximately 18–16 ka, with no apparent temporal subclusters). Furthermore, we found complexity in the cline in the form of deviations from a straight line: (1) the main direction of variation does not align with ancient southern African foragers; and (2) several individuals appear to shift in the direction of present-day and ancient central African foragers. Both observations may indicate that some of the ancient eastern and south-central African individuals sampled here trace part of their ancestry to groups that are related to foragers currently living in central Africa. Furthermore, (1) could indicate that the southern-African-related ancestry among the ancient individuals is only distantly related to present-day Juǀ'hoansi and ancient southern African foragers.

We used allele-sharing tests (f-statistics) (Methods) to further investigate which individuals differed in their degree of relatedness to ancient South African foragers (AncSA) (Extended Data Table 1), the Mota individual or present-day Mbuti. Consistent with the PCA, most pairs of individuals from the same region (including from different time points) were nearly symmetric in their ancestry (|Z| < 3) (Supplementary Table 7). The exceptions were (1) excess affinity between Mbuti and KPL001 (Kakapel; max Z = 5.1); (2) excess affinity between AncSA and I0589 (Kuumbi Cave; max Z = 4.1); and (3) modest differences within Malawi and Zambia (max Z = 3.8). By contrast, well-powered cross-region statistics were highly significantly non-zero, for example, f4(I8808 (Jawuoyo), I8821 (Kisese II); Mota, AncSA) > 0, Z = 7.8. We also used the qpWave program in ADMIXTOOLS to combine multiple f-statistic-based signals into a test for the number of distinct components of ancestry (relative to a specified outgroup set) present among the (sampled) ancient forager individuals (Methods). We found that at least three sources are necessary (P = 6.4 × 10−14 for rejecting a two-source model) but, interestingly (with our available statistical power), that three sources are also sufficient (P = 0.73; four versus three sources P = 0.15), even with Mota, San (here, both Juǀ'hoansi and ǂKhomani) and Mbuti among the outgroups. When we added the Mota individual to the test set, we found increased evidence for a fourth source, despite the less stringent outgroups (P = 0.07; four versus three sources P = 0.019) (Methods). This result could reflect a highly divergent ancestry component contributing to the Mota individual inferred in previous work16; additional lineages may also have been present among as-yet unsampled ancient individuals from these regions.

We attempted to estimate the dates of admixture (potentially involving any distinct sources of ancestry) for the ancient foragers using DATES21. With the caveat that our power is limited by data availability, we obtained only two robust estimates (Supplementary Table 8), both for previously published individuals, and both (given the additional results below) are probably connected to admixture from food producers: for I4421 (Chencherere II, no direct age, past approximately 5,000 years), a date of 10 ± 2 generations before the individual lived; and for I1048 (Makangale Cave, direct age, past approximately 1,500 years), 79 ± 24 generations before the individual lived.

Inter- and intraregional relationships

Next, we modelled the ancestry of the ancient foragers in an admixture graph framework to test additional hypotheses concerning their relationships, aided by a new methodology to increase available information from low-coverage data (Figs. 2 and 3, Methods, Supplementary Notes 6 and 7 and Extended Data Figs. 25). In model 1, along with other populations, we included three geographically and genetically diverse ancient eastern and south-central African individuals with high sequencing coverage: I4426 (Fingira, about 2.5 ka), I8821 (Kisese II) and I8808 (Jawuoyo). On the basis of the results in the previous section, we hypothesized that they could be fit with mixtures of three ancestry components: one related to the Mota individual (representing an ancient group of foragers from the northern part of eastern Africa), one related to central African foragers (represented by present-day Mbuti) and one related to southern African foragers (represented by four ancient individuals from South Africa). Indeed, we obtained a good fit to the data in model 1 (max residual Z = 2.0), even when specifying identical sources for all three individuals, and the relative ancestry proportions were as expected: Mota-related ancestry decreased from north to south, and Jawuoyo (I8808) had the highest ratio of central-African-related ancestry to southern-African-related ancestry. Omitting any of the three components for any of the individuals results in a poor fit (Z ≥ 4.0) (Supplementary Note 6). As in ref. 16, we also estimated around 30% of a separate and deeply diverged ‘ghost’ ancestry component in the Mota individual (replicated here using new higher-coverage diploid whole-genome data).

Fig. 2: Schematic of admixture graph results.
figure 2

Branch lengths are not to scale. The arrows denote admixture events, with the three primary components of ancestry shown as dashed arrows, and other inferred gene flow as small solid arrows (with colours corresponding to related groups). Subclusters of ancient eastern and south-central African foragers reflect the inferred instances of excess relatedness among individuals, with internal branch lengths shown in genetic-drift units. Mixture proportions are shown in Fig. 3 and Supplementary Table 9 and the full results are shown in Extended Data Fig. 4. Individual laboratory numbers are shown at the bottom (Extended Data Table 1). N., north; W., west.

Fig. 3: Distribution of main ancestry components.
figure 3

Kriged distribution of the proportions of each of the three main ancestry components (summing to 1) found in ancient eastern and south-central African foragers analysed in this study (details are provided in Supplementary Table 9). The approximate present-day Mbuti home region is from ref. 41. Individuals from the same site were included using locations that differed by 0.000001 decimal degrees latitude to ensure representation in the interpolation. Scale bars, 250 km. Topographical data are from the Shuttle Radar Topography Mission (SRTM)42. SA, southern African.

When we added more individuals to create models 2 and 3 (max residuals Z = 3.0 and Z = 3.7), we found that the overall inferred structure and parameters were similar to those of model 1 (Supplementary Tables 9 and 10; see below for specific individuals and regions). The Mota-related and southern-African-related ancestry sources are inferred to split deeply along their respective lineages, meaning that, in some sense, they represent ‘ghost’ populations (without closely related sampled representatives). The central-African-related component is inferred to be closer to Mbuti (including an ancestral admixture event; Supplementary Note 6) than to Aka, and therefore to not split as deeply relative to the initial divergence of the central African forager lineage. Almost all of the additional significant allele-sharing signals that we observed beyond those in model 1 can be attributed to one of the three following causes (Supplementary Table 11): (1) excess relatedness at short-distance scales (see below); (2) admixture from pastoralists and/or farmers more recent than our period of focus (four individuals); or (3) contamination (two individuals). In these cases, we adjusted our final model by (1) allowing shared history (that is, genetic drift) between the relevant individuals; (2) adding the inferred admixture events; or (3) incorporating extra admixture to represent the contamination source (Supplementary Note 6).

For sites in western Kenya, we found that all three individuals in model 3 have excess relatedness beyond the baseline expectation (Fig. 2). The individuals from Jawuoyo (I8808) and Nyarindi (NYA002/NYA003) are the closest, and they can be modelled with Mota-related, central-African-related and southern-African-related ancestry in respective proportions of about 62%, 19% and 19%, while the individual from Kakapel (KPL001) is inferred to have around 12% additional central-African-related ancestry (s.e. of approximately 2–4% with some assumptions) (Fig. 3 and Supplementary Note 6). For north-central Tanzanian sites, again all four individuals have signals of mutual excess allele sharing, with the three individuals from Gishimangeda (I13763, I13982 and I13983) being the closest. One of the three (I13763) shows excess relatedness to non-African individuals, which we interpret as evidence of a small proportion of contamination (Supplementary Notes 5 and 6); otherwise, all four can be fit as a clade with 54%, 12% and 34% Mota-related, central-African-related and southern-African-related ancestry, respectively. Similarly, the three island and coastal individuals (Makangale Cave I1048, Kuumbi Cave I10589, Panga ya Saidi I0595) display excess relatedness, with those from Kuumbi Cave and Panga ya Saidi closest to one another, and with 49% Mota-related, 12% central-African-related and 39% southern-African-related ancestry. These individuals also have ancestry admixed from populations that are associated with food production: Agaw-related for all three, plus western-African-related for Panga ya Saidi (I0595) (Supplementary Note 6).

In contrast to Kenya and Tanzania, we did not observe widespread signals of excess relatedness in Malawi and Zambia. After adjusting for ancestry proportions, most individuals within this geographical cluster are no more related to one another than they are to individuals from Kenya and Tanzania. The only notable exceptions that we found among those in model 3 (Supplementary Note 6) were as follows: (1) among individuals from Fingira (I4426, I4427 and I4468), in particular, two dating to about 6.1 ka; and (2) between the individuals from 9–8 ka from Hora 1 (I2966 and I2967). However, other individuals separated by as little as 100–150 km (Fingira-Hora 1 and Chencherere II-Kalemba) can be fit well with independent mixtures of the same ancestry sources used across the entire study region, including some individuals around 700–1,500 km away. At the same time, the inferred ancestry proportions for the individuals from Malawi and Zambia are quite similar (about 20–30% Mota-related, 5–10% central-African-related and 60–70% southern-African-related), with significant (but small) differences observed for I4426 from Fingira (approximately 11% additional central-African-related ancestry), I4421 from Chencherere (approximately 4% ancestry related to pastoralists), I10726 from Kalemba (approximately 5% less Mota-related ancestry than in Malawi) and I2966 from Hora 1 (a small amount of contamination). We also built an alternative version of our model in which we specified the Malawi individuals as forming a clade descended from a shared three-way admixture event (plus small proportions of additional admixture for the aforementioned individuals) that had only a slightly worsefit—confirming the very similar ancestry proportions among the individuals—but that featured zero shared drift at the base of the clade and almost none on the internal branches (Supplementary Note 6 and Extended Data Fig. 6).

We examined the relationship between geographical distance and genetic relatedness using a new approach based on the residuals of a model assuming that there is no excess shared genetic drift—that is, we observed the similarity of genotypes within pairs of individuals relative to that predicted solely by differential proportions of the three ancestry sources (Methods). Using pairs of individuals from either Kenya and Tanzania, or Malawi and Zambia, together with inter-region pairs to plot the residuals as a function of distance, we found greater relatedness at short distances, but with different length scales for the decay of the fitted curves (about 60 km and about 3 km, respectively) (Extended Data Fig. 7a). Similar patterns are also observed if we omit pairs of individuals that were buried at the same site (Extended Data Fig. 7b). Thus, with the caveats that our sampling is not uniform and that not all of the individuals lived contemporaneously, we found on average that (1) individuals from the same or nearby sites are more closely related than predicted solely on the basis of the broad regional genetic structure, but (2) this relatedness extends only over short distances, particularly within Malawi and Zambia.

For a comparative perspective from contemporaneous ancient foragers in temperate environments, where there are more extensive available data, we performed similar analyses for individuals from Mesolithic Europe (n = 36, about 12–7 ka) (Methods, Supplementary Table 12 and Extended Data Fig. 7c, d). Both western and eastern/northern Europe also show a pattern of greater relatedness at shorter distances; western Europe is similar to Malawi and Zambia in that almost all of the signal comes from same-site pairs, but eastern/northern Europe has a substantially longer geographical decay scale.

Finally, we compared the ancient individuals to the present-day Sandawe and Hadza groups in Tanzania, who historically or recently practiced foraging lifeways. Previous studies have shown that the Hadza and Sandawe have distinctive ancestry from their neighbours, with unusually high proportions of ancestry related to ancient African foragers11,13,14,22. We built an extended version of model 2 including both groups (Extended Data Fig. 8 and Supplementary Note 6). In contrast to the general pattern for ancient individuals, we could not fit Hadza and Sandawe into a simple regional clade, even after accounting for recent admixture that is probably related to incoming pastoralists and farmers (contributing a total of about 41% and about 62% ancestry for these Hadza and Sandawe individuals, respectively). In particular, both were inferred to share a lineage closest to ancient foragers from north-central Tanzania, but the Hadza had excess allele sharing with the Mota individual, while the Sandawe had excess allele sharing with southern African foragers.

Effective population sizes

We inferred recent (up to about 500 years before the individual’s birth) ancestral effective population sizes (Ne) for the higher-coverage ancient individuals by scanning for long runs of homozygosity (ROH), which are expected to be present in the genomes of individuals either from populations with small sizes or whose parents have familial relatedness (the latter resulting in especially long ROH) (Methods and Extended Data Fig. 9). The calculation of Ne depends on several factors in addition to census population size; in particular, Ne is a function of both population density and the distance scale of those social interactions that lead to reproduction. All of the ancient individuals are inferred to have at least one long ROH (> 4 centimorgans (cM)), consistent with broad worldwide trends towards smaller population sizes in more ancient societies23. However, the Ne estimates vary by an order of magnitude, from individuals with minimal ROH, suggesting relatively larger population sizes (I5950 (Mota): Ne = 5,470, 95%  confidence interval (CI) = 1,237 to unbounded; I8821 (Kisese II): Ne = 2,640, 95% CI = 881–16,424) to those with an ROH of longer than 100 cM, indicative of much smaller population sizes (for example, I8808 (Jawuoyo): Ne = 377, 95% CI = 229–678). Overall, the range is similar to many African forager groups today (Ne, around 500–1,500)24 and towards the low end when compared with present-day population sizes worldwide23.

Discussion

In contrast to previous studies, our results show that a two-way clinal model extending latitudinally from eastern to southern Africa is insufficient to explain observed patterns of genetic variation in ancient sub-Saharan African foragers. Here we demonstrate that central-African-related ancestry (closest to present-day Mbuti among sampled populations), along with Mota-related and southern African-related ancestry, was ubiquitous (in varying proportions) from southwestern Kenya to southeastern Zambia (Fig. 3), with all three components present by at least about 7 ka in Tanzania and about 16 ka in Malawi. Furthermore, when considering ancient African foragers from a wide range of time periods, ecological contexts and archaeological associations, geographical proximity remains the strongest predictor of genetic similarity5,11. Such a pattern may indicate that long-range migrations were rare in the terminal Pleistocene and Holocene, when these individuals lived. This hypothesis is supported by the signals in our admixture graphs of excess genetic relatedness at subregional scales but not at longer-distance scales. Although it is not possible at present to estimate when and how quickly this three-way cline emerged, it must have post-dated both the emergence of the Mota-related lineage around 80–60 ka12,16 and, with respect to the central-African-related ancestry, the split between Aka and Mbuti less than around 50 ka25,26.

Although the observed cline of ancestry remained stable for thousands of years, we propose that it initially arose closer to this split time than to the terminal Pleistocene, and under qualitatively different patterns of mobility and admixture than after it was established. Dispersals, interactions and extensive admixture across eastern and south-central Africa before around 16 ka are evidenced by substantial proportions of ancestry related to the Mota (Ethiopia) individual as far south as Zambia, and ancestry related to southern African foragers as far north as Kenya, in combination with a high degree of homogeneity of ancestry in each subregion after that time. If patterns of mobility and social interactions had remained consistent throughout the Late Pleistocene and Holocene, we would expect to find broad evidence of longer-range ancestry connections within eastern and south-central Africa and beyond, but we observed only two significant plausible instances among our sampled individuals (involving extra central-African-related ancestry in one individual each from Kenya and Malawi).

However, within the three-way population structure, we observed distinct regional trajectories. Individuals from Kenya and Tanzania form three clusters (western Kenya, north-central Tanzania and coastal/island), with individuals in the same cluster showing excess allele sharing even beyond what would be expected from having similar ancestry proportions. This suggests that there is elevated gene flow within each subregion, on a distance scale estimated as approximately 0–100 km. By contrast, the only signals of elevated relatedness detected for individuals from Malawi and Zambia involve those buried at the same site, and can span 1,000–3,600 years (for example, at Fingira). This pattern is best explained by low average human dispersal/interaction distances during much of the Late Pleistocene and Holocene, with the establishment of the broad-scale ancestry cline followed by, on average, more local interactions that differed by region. We observed a similar pattern in ancient foragers from western Europe, whereas those from northern and eastern Europe show longer distance scales of relatedness. This provides genetic evidence that the average distances between where people lived and where their ancestors lived (and therefore the average distances of human movement, especially with respect to reproductive partners) differed among foragers in different regions.

Our genetic findings offer new insights on demographic processes of the Late Pleistocene to Holocene that were previously studied using bioarchaeological, archaeological and linguistic evidence. Beginning approximately 300 ka, archaeological evidence attests to the long-distance movement of materials such as obsidian, presumably facilitated by social networks27. Exchange intensified through the Late Pleistocene to become a hallmark of the LSA, culminating in elaborate transport networks and shared material culture traditions by the Early Holocene1,4,28,29. However, the extent to which people were moving with objects remains an open question. Our genetic results support a scenario in which human mobility and longer-range gene flow occurred with the development and elaboration of long-distance networks approximately 80–20 ka, contributing to the formation of a population structure that persisted over tens of thousands of years during a period when people were living locally.

Genetic evidence also adds weight to arguments for changing Late Pleistocene interaction spheres, with limited gene flow accompanying changes in behaviour and possibly linguistic boundaries. However, at this juncture, we are unable to assess hypothesized population density shifts, based on heightened evidence for symbolic expression at LSA sites and the appearance and disappearance of specific artefact types8,9,30,31,32. Our genetic estimates of recent effective population size are consistent with those of at least some present-day African foragers24, but they are not good comparators due to demographic pressures recently placed on such groups33. Furthermore, small subpopulations with limited gene flow could result in low ancestral effective population sizes even if the region’s total population is high. Preservation of genetic diversity through the existence of many subpopulations over long time scales could also be a contributor to the high levels of genetic diversity observed in most present-day sub-Saharan African groups.

The LSA archaeological record testifies to the appearance of well-defined, temporally and spatially bounded material culture traditions34,35, a phenomenon that is sometimes referred to as regionalization. Faunal data indicate subsistence intensification after around 20 ka36,37, and linguistic data also suggest shifts toward local interactions, reflected in the fact that, today, communities that are presently or historically associated with foraging in central, eastern and southern Africa speak languages of different families (in central Africa, adopted from recent arrivals). At the same time, past regional connectivity and borrowing was such that linguists previously characterized ‘click’ languages as a single family, and the proposed grouping of Khoe–Kwadi–Sandawe strengthens evidence for longer-distance ties between eastern and southern Africa38,39. Our genetic results confirm that trends toward regionalization extended to human population structure, suggesting that decreasing gene flow accompanied changes in behaviour and possibly language.

Conclusions

Demographic transformations in the past approximately 5,000 years have fundamentally altered regional population structures and largely erased what was, by the Late Pleistocene, a well-established three-way cline of eastern-, southern- and central-African-related ancestry that extended across eastern and south-central Africa. Groups who historically forage have frequently been pushed to marginal environments and have experienced transformative demographic changes, making it difficult to learn about deep history from present-day DNA. Today, Africa houses the greatest human genetic diversity, but undersampling of both living and ancient individuals obscures the origins of this diversity40. We show that aDNA from tropical Africa can survive from the Pleistocene and reveal patterns that could not be inferred from populations that lived even a few millennia later, underscoring the breadth of African genetic diversity and the importance of eastern and south-central Africa as long-term reservoirs of human interaction and innovation.

Methods

Skeletal samples

The skeletal remains that were sampled in this study are curated at the National Museum of Kenya (Kisese II), the National Museum of Tanzania (Mlambalasi), the Malawi Department of Museums and Monuments (Hora 1 and Fingira) and the Livingstone Museum (Kalemba), and sampling permissions and protocols are described in Supplementary Note 3. Individuals were chosen based on their associated LSA archaeological contexts, and skeletal samples were selected to maximize the likelihood of yielding authentic aDNA and to minimize damage. The Fingira phalanx was an isolated find from a mixed excavation context, and too small to provide both aDNA and a direct date. A list of both successful and failing samples is provided in Supplementary Table 1. Direct radiocarbon dating was attempted on five of the six successful individuals at the Pennsylvania State University Radiocarbon Laboratory using established methods and quality control measures for collagen purification43,44 before accelerator mass spectrometry analysis (Supplementary Note 4). A list of direct date and stable isotopic results for the two successfully dated individuals, and indirect dates where available for the other individuals, is provided in Supplementary Tables 3 and 4. All dates were calibrated using OxCal (v.4.4)45, with a uniform prior (U(0,100)) to model a mixture of two curves: IntCal20 (ref. 46) and SHCal20 (ref. 47).

aDNA laboratory work

We successfully generated genome-wide aDNA data from a total of six human skeletal elements: five petrous bones and one phalanx. We processed an additional six petrous bones, eight teeth and 11 other bones in the same manner but did not obtain usable DNA (Supplementary Table 1). In clean room facilities at Harvard Medical School, we cleaned the outer surfaces of the samples and then sandblasted (petrous bones)48 or drilled (other bones and teeth) to obtain powder (additional information for the 15 previously published samples reported here with increased coverage can be found in refs. 11,13,15,16). We extracted DNA49,50,51 and prepared barcoded sequencing libraries (between one and six libraries for the six newly reported individuals, and between one and eight additional libraries for the previously reported individuals: from Mota Cave in Ethiopia15 (I5950); White Rock Point in Kenya13 (I8930); Gishimangeda Cave in Tanzania13 (I13763, I13982 and I13983); Chencherere II (I4421 and I4422), Fingira (I4426, I4427 and I4468) and Hora 1 (I2967) in Malawi11; and Shum Laka in Cameroon16 (I10871, I10872, I10873 and I10874), treating in almost all cases with uracil-DNA-glycosylase (UDG) to reduce aDNA damage artefacts52,53,54. We used two rounds of targeted in-solution hybridization to enrich the libraries for molecules from the mitochondrial genome and overlapping a set of around 1.2 million nuclear SNPs55,56,57,58 and sequenced in pools on the Illumina NextSeq 500 and HiSeqX10 machines with 76 bp or 101 bp paired-end reads. Further details on each library are provided in Supplementary Table 2. For the Mota individual (I5950), we also generated whole-genome shotgun sequencing data, using the same (pre-enrichment) library, with seven lanes with 101 bp paired-end reads (on Illumina HiSeq X Ten machines) yielding approximately 26× coverage (1,176,635 sites covered from the capture SNP set).

Bioinformatics procedures

From the raw sequencing data, we used barcode information to assign reads to the proper libraries (allowing at most one mismatch per read pair). We merged overlapping reads (at least 15 bases), trimmed barcode and adapter sequences from the ends, and mapped to the mtDNA reference genome RSRS59 and the human reference genome hg19 using BWA (v.0.6.1)60. After alignment, we removed duplicate reads and reads with mapping quality less than 10 (30 for shotgun data) or with length less than 30 bases. To prepare data for analysis, we disregarded terminal bases of the reads (2 for UDG-treated libraries and 5 for untreated, to eliminate most damage-induced errors), merged the .bam files for all libraries from each individual, and called pseudohaploid genotypes (one allele chosen at random from the reads aligning at each SNP). The high coverage for the Mota whole-genome shotgun data enabled us to call diploid genotypes; we used the procedure from ref. 26, including storing the genotypes in a fasta-style format that is easily accessible through the cascertain and cTools software. Code for bioinformatics tools and data workflows is provided at GitHub (https://github.com/DReichLab/ADNA-Tools and https://github.com/DReichLab/adna-workflow).

Uniparental markers and authentication

We determined the genetic sex of each individual according to the ratio of DNA fragments mapping to the X and Y chromosomes61. We called mtDNA haplogroups using HaploGrep2 (ref. 62), comparing informative positions to PhyloTree Build 17 (ref. 63) (Supplementary Table 6). For four individuals (I2967, I4422, I4426 and I19528) with evidence of haplogroups that split partially but not fully along more specific lineages, we use the notation [HaploGrep2 call]/[sub-clade direction] (for example, L0f/L0f3 for a split on the lineage leading to L0f3 but not within L0f3). For males, we called Y-chromosome haplogroups by comparing their derived mutations with the Y-chromosome phylogeny provided by YFull (https://yfull.com).

We evaluated the authenticity of the data first by measuring the rate of characteristic aDNA damage-induced errors at the ends of sequenced molecules. We next searched directly for possible contamination by examining (1) the X/Y ratio mentioned above (in case of contamination by sequences from the opposite sex), (2) the consistency of mtDNA-mapped sequences with the haplogroup call for each individual64 and (3) the heterozygosity rate at variable sites on the X chromosome (for males only)65. Two individuals (I2966 from Hora 1 and I13763 from Gishimangeda Cave) had non-negligible evidence of contamination from these metrics and also displayed excess allele sharing with non-Africans in the admixture graph analysis; we were able to fit them in the final model after allowing ‘artificial’ admixture from a European-related source (6% and 9%, respectively). We also restricted ourselves to damaged reads in making the mtDNA haplogroup call for I2966. Further details are provided in Supplementary Table 2 and Supplementary Note 5.

Familial relatives

We searched for close family relatives by computing, for each pair of individuals, the proportion of matching alleles (from all targeted SNPs) when sampling one read at random per site from each. We then compared these proportions to the rates when sampling two alleles from the same individual—mismatches are expected to be twice as common for unrelated individuals as for within-individual comparisons, with family relatives intermediate. We found one possible instance between the two individuals from White Rock Point (approximately second-degree relatives, but uncertain due to low coverage) (Extended Data Fig. 1b)

Dataset for genome-wide analyses

We merged our newly generated data with published data from ancient and present-day individuals11,12,13,14,16,25,26,66,67. We performed our genome-wide analyses using the set of autosomal SNPs from our target enrichment (about 1.1 million).

PCA

We performed a supervised PCA using the smartpca software68, using three populations (Juǀ'hoansi, Mbuti and Dinka; four individuals each, from ref. 26, were chosen to create a broad separation in the PCA between highly divergent ancestral lineages from southern, central and eastern Africa) to define a two-dimensional plane of variation, and projected all other present-day and ancient individuals (using the lsqproject and shrinkmode options). This procedure captures the genetic structure of the projected individuals in relation to the groups used to create the axes, reducing the effects of population-specific genetic drift in determining the positions of the individuals shown in the plot, as well as bias due to missing data for the ancient individuals.

f-statistics

We computed f-statistics in ADMIXTOOLS69, with standard errors estimated by block jackknife. To facilitate the use of low-coverage data, we used a new program, qpfstats (included as part of the ADMIXTOOLS package), together with the option ‘allsnps: YES,’ for both stand-alone f4-statistics and statistics for use in qpWave and qpGraph (see below). In brief, qpfstats solves a system of equations based on f-statistic identities to enable the estimation of a consistent set of statistics while maximizing the available coverage and reducing noise in the presence of missing data; full details are provided in Supplementary Note 7. We computed statistics of the form f4(Ind1, Ind2; Ref1, Ref2), where Ind1 and Ind2 are ancient individuals from Kenya, Tanzania or Malawi/Zambia, and Ref1 and Ref2 are either ancient southern African foragers (AncSA, listed in Extended Data Table 1), the Mota individual or present-day Mbuti. These groups were chosen in light of our PCA results and the previous evidence for ancestry related to some or all of them among ancient eastern and south-central African foragers5,11,14.

qpWave analysis

The qpWave software70 estimates how many distinct sources of ancestry (from 1 to the size of the test set) are necessary to explain the allele-sharing relationships between the specified test populations and the outgroups (where ‘distinct’ means different phylogenetic split points relative to the outgroups). Each test returns results for different ranks of the allele-sharing matrix, where rank k implies k + 1 ancestry sources. For absolute fit quality, we give the ‘tail’ P value, where a higher value indicates a better fit. We also give ‘taildiff’ P values as relative measures comparing consecutive rank levels, where a higher value indicates less improvement in the fit when adding another ancestry source. As our base test set, we used the 12 ancient eastern and south-central African forager individuals (3 from Kenya, 3 from Tanzania, 5 from Malawi and 1 from Zambia) from our admixture graph Model 3 who did not have evidence of either admixture from food producers or contamination. We also compared results when adding the Mota individual to the test set. As outgroups, we used Altai Neanderthal, Mota and the following eight present-day groups: Juǀ'hoansi, ǂKhomani, Mbuti, Aka, Yoruba, French, Agaw and Aari, with the last two (as well as Mota) omitted when we moved Mota to the test set.

Dates of admixture

We inferred dates of admixture using the DATES software21. We used a minimum genetic distance of 0.6 cM, a maximum of 1 M and a bin size of 0.1 cM. As reference populations, we used ancient southern African foragers together with one of Mota, Dinka, Luhya, Yoruba or European-American individuals (the latter three from 1000 Genomes: LWK, YRI and CEU). The results assume an average generation interval of 28 years, and standard errors were estimated by block jackknife.

Admixture graph fitting

We built admixture graphs using the qpGraph software in ADMIXTOOLS69. We chose to analyse each eastern and south-central forager individual separately rather than form subgroups (for example, by site or time period) to study both broad- and fine-scale structure (through relationships between individuals with both low and high degrees of ancestral similarity). Although such an approach was facilitated by our relatively manageable sample sizes, it also relied on the ability to compute f-statistics with our qpfstats methodology (further details are provided in Supplementary Note 7 and the ‘f-statistics’ section above) to make use of all available SNPs for individuals with low-coverage data. For all of the models, we used the options ‘outpop: NULL’, ‘lambdascale: 1’ and ‘diag: 0.0001.’ We also specified larger values of the ‘initmix’ parameter to explore the space of graph parameters more thoroughly: 100,000, 150,000 and 200,000 for models 1–3 (and additional models built from them), respectively.

We began with a version of the admixture graph from ref. 16, to which we added three high-coverage ancient forager individuals (from Jawuoyo, Kisese II and Fingira) to create model 1. We then extended our model to more individuals. We used a procedure in which we (1) added each other ancient individual one by one to model 1 and evaluated the fit; (2) built an intermediate-size model 2 including a total of 11 geographically diverse eastern and south-central African foragers; (3) added the remaining individuals one by one to model 2; and (4) built our final Model 3 with all 18 individuals above a coverage threshold of 0.05× (Supplementary Note 6). In steps (1) and (3), as a starting point, we assumed a simple form of admixture (as in model 1) whereby all eastern and south-central African individuals derived their ancestry from exactly the same three sources (in varying proportions). If we found that an individual did not fit well when added in this manner, we noted the specific violation(s) to determine whether the likely cause(s) were excess relatedness to certain other individuals, distinct source(s) for the three-way admixture, admixture from other populations, or contamination or other artefacts. For the two individuals (one from Hora 1 and one from Gishimangeda) with evidence of appreciable contamination, we included dummy admixture events contributing non-African-related ancestry. Full details on our fitting procedures are provided in Supplementary Note 6.

Excess relatedness analysis

To study excess relatedness between individuals after correcting for different proportions of Mota-related, central-African-related and southern-African-related ancestry, we built an admixture graph similar to our main model 3, but in which each forager individual is descended from an independent mixture of the three ancestry components, without accounting for excess shared genetic drift. We also included four additional individuals with lower coverage (three from Kenya and one from Chencherere II in Malawi), but excluded the two early individuals from Hora 1 due to their much greater time depth compared with other individuals in the model. Finally, for individuals modelled with admixture beyond the primary three sources (that is, pastoralist-related ancestry for four individuals, western-African-related ancestry for the Panga ya Saidi individual and the excess central-African-related ancestry for the Kakapel individual, plus dummy admixture for contamination), we locked the relevant branch lengths and mixture proportions at their values from model 3 to prevent compensation for the inaccuracies in the model by these parameters. We next used the residuals (fitted minus observed values) of each outgroup f3-statistic f3(Neanderthal; X, Y) to quantify the excess relatedness between individuals X and Y that is unaccounted for by the model. In other words, we fit each individual as we did during the add-one phase of the main admixture graph inference procedure (except here all simultaneously) but now, instead of using the model violations to inform the building of a well-fitting model, we used them directly as the output of the analysis.

We plotted the excess relatedness residuals for each pair of individuals as a function of great-circle distance between sites, as computed using the haversine formula (also adding a dummy value of 0.001 km to each distance). We fit curves to the data with the functional form 1/mx, additionally allowing for translation (full equation: y = 1/(mx + a) + b, where y is excess relatedness, x is distance, and m, a and b are fitted constants) through inverse-variance-weighted least squares. We also omitted the point corresponding to the pair of individuals from White Rock Point (Kenya) because of their evidence for close familial relatedness (see above). Finally, we computed a decay scale for the curves given by the formula (e – 1)× a/m (where e is Euler's number). We note that a residual (that is, y axis) value of zero has no special meaning in the plots.

For Mesolithic Europe, we performed two analogous analyses, one for the western part of the continent and one for eastern and northern. In the first analysis, we selected individuals with predominantly western hunter-gatherer (WHG)-related ancestry, while in the second analysis, we selected individuals who could be modelled as admixed with WHG as well as eastern hunter-gatherer (EHG)-related ancestry (Supplementary Table 12). In both cases, we built simple admixture graph models to estimate the residuals. For western Europe, we used the Upper Palaeolithic Ust’-Ishim individual from Russia71 as an outgroup and fit all of the test individuals as descending from a single ancestral lineage. For eastern and northern Europe, we used Ust’-Ishim as an outgroup, Mal’ta 1 from Siberia72 for a representative of ancient northern Eurasian ancestry, Villabruna from Italy73 for WHG, Karelia from Russia56,58,73 for EHG (admixed with ancestry related to Mal’ta and to Villabruna) and finally the test individuals each with independent mixtures of WHG and EHG-related ancestry in varying proportions.

Effective population size inference

We called ROH starting with counts of reads for each allele at the set of target SNPs (rather than our pseudohaploid genotype data), which we converted to normalized Phred-scaled likelihoods. We performed the calling using BCFtools/RoH74, which is able to accommodate unphased, relatively low-coverage data (at least for calling long ROH) and does not rely on a reference haplotype panel. The method is also robust to modest rates of genotype error, such as that which could occur here as a result of aDNA damage or contamination, although we recommend some caution in interpreting the results for I2966 (Hora 1) and I0589 (Kuumbi Cave; for this analysis only, we used the version of the published data with UDG-minus libraries included, for a total of around 2× average coverage). We also note that the nature of any possible effect on the final inferences is uncertain; errors could deflate the population size estimates by breaking up ROH, but they could also break very long ROH into shorter but still long blocks, which have the strongest influence on the population size estimates. In the absence of population-level data from related groups, we specified a single default allele frequency (‘--AF-dflt 0.4’) and no genetic map (although we subsequently converted physical positions to genetic distances using ref. 75, which we expect to be reasonably accurate at the length scales that we are interested in). For our analyses, we retained ROH blocks with length >4 cM. In three instances, we merged blocks with a gap of <0.5 cM and at most two apparent heterozygous sites between them.

From the ROH results, we applied the maximum likelihood approach from ref. 23 to estimate recent ancestral effective population sizes (Ne). We used all ROH blocks of longer than 4 cM, except for three individuals (KPL001 from Kakapel in Kenya, I9028 from St Helena, South Africa, and I9133 from Faraoskop, South Africa) with high proportions of very long ROH (a sign of familial relatedness between parents—approximately at the first-cousin level in these cases—rather than of longer-term low population size), for whom we used only blocks from 4–8 cM.

We note that, even within a randomly mating population, the number and extent of ROH can vary substantially between individuals, which is reflected in the large standard errors of the Ne estimates for small sample sizes. We also note that recent admixture can influence ROH (and therefore Ne estimates) by making coalescence between an individual’s two chromosomes less likely, but on the basis of the other results of our study, we do not expect a substantial effect for these individuals.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.