Main

The medieval and early modern Swahili culture of eastern Africa from the seventh century ad was defined by a set of shared features: a common language of African origin (Kiswahili), a shared predominant religion (Islam) and a geographic distribution in coastal towns and villages. People of the Swahili culture lived across a vast coastal region that included northern Mozambique, southern Somalia, Madagascar and the archipelagos of Comoros, Kilwa, Mafia, Zanzibar and Lamu1 (yellow outlines in Fig. 1a). Millions of present-day coastal people identify as Swahili, although for many this is a secondary identity, with primary identities often based more on town of origin, family history or traditional social status5. In the absence of ancient DNA, it has been difficult to determine how people who identify as Swahili today relate to people of the medieval and early modern Swahili culture.

Fig. 1: Dataset overview.

a, Coastal areas associated with the medieval Swahili culture are shown in yellow. Sites represented in the ancient DNA samples are marked with black shapes. Numbers in parentheses are formatted X|Y, where X is the number of individuals for whom there are data, and Y is the number of individuals for whom we report high-resolution analyses. The chronology is given as the union of 95% confidence intervals for direct radiocarbon dates on the skeletons, rounded to the nearest 50 years. It is shown as calibrated (cal.) yr ad, or as ad for sites with only archaeological context (Supplementary Information; Extended Data Table 1). The base map was made with the Natural Earth R package using a CC BY license. Bodies of water were added in R from the RCMRD Geoportal with a CC BY license. All points were added and modifications were performed in R and Adobe Illustrator. The map is published for the first time in this Article. b, In a principal component analysis, eigenvector 1 correlates with variation maximized in sub-Saharan Africa, and eigenvector 2 correlates with Eurasian variation. c, Ancestry component assignment using ADMIXTURE with K = 9 clusters. This value was selected on the basis of low cross-validation errors, high log-likelihood scores and a small number of reference populations to avoid overfitting; the groups that maximize each component are shown on the right. Individuals with sufficient data for high-resolution analysis are plotted in approximate chronological order from left to right. Ancient individuals are labelled and plotted at four times the width of present-day individuals.

The largely autonomous medieval towns and polities known as the Swahili states arose out of fishing and agropastoral settlements on the eastern African coast during the late first millennium ad6. First millennium ad sites on the littoral, beginning in the seventh century, were part of a network of shared material culture and practice across the eastern African region7. These sites were engaged in the Indian Ocean trading system. Northeast monsoons from November to March carried merchant vessels from India or the Arabian Peninsula to the eastern African coast, and southwest monsoons from May to October enabled their return in the same year8.

Muslims were present from the eighth century ad, probably as a minority9. A major archaeological transition is evident during the eleventh century, with the establishment of new settlements and the elaboration of older ones with coral-built mosques and tombs, a set of changes generally understood as coinciding with the widespread adoption of Islam10. At this time, clearer distinctions also emerged between coastal ceramics and material traditions and those of inland assemblages2,11, even as many aspects of material culture remained deeply linked with inland African groups.

The political and administrative independence of Swahili towns diminished in the sixteenth century as Portuguese naval and economic dominance in the Indian Ocean spread12. In the early eighteenth century, Portuguese influence waned and the Sultanates of Oman and later Zanzibar became dominant4. In the nineteenth century, the growth of overseas trade, including trade in enslaved people, led to large-scale movements of people from central regions of Africa and to the arrival of settlers from the Yemeni region of Hadramawt4,13. In the mid-nineteenth century, Britain and other European powers became dominant. This led to the settlement of Europeans, the arrival of labourers from South Asia and further interactions with non-coastal eastern Africans.

In light of this multi-layered history, the extent to which people who identify as Swahili in the present day are genetically linked with people who built the medieval trading towns is unclear, as is the relationship of the medieval groups to earlier groups. Although the intercontinental connections maintained as part of the Indian Ocean trading network meant that foreigners were consistently present along the coast, the extent to which they had families with African residents has long been debated14.

Swahili traditions suggest that foreigners had an important impact; a set of common oral histories relates the founding of coastal towns to the arrival of a group known as the Shirazi, referring to a region in Persia3. This Shirazi tradition was put into writing in the Kilwa Chronicle in the sixteenth century15. These accounts of Shirazi roots were central to the narrative constructed by mid-twentieth century colonialist archaeologists, who interpreted second millennium coastal eastern African sites as having been built by Persian and Arab settlers, and focused on connections with the broader Indian Ocean world16.

However, narratives of foreign origin can be misleading. Swahili social ‘elites’ used claims of foreign origin, and rejection of cultural connections within Africa, to establish their social status and to signal their religious and cultural affinities17,18.

Recent research has shown that archaeology during colonial times tended to ignore the evidence of deep African roots, emphasizing foreign objects at medieval Swahili sites rather than providing a balanced picture of the archaeological record2. Imports typically make up less than 5% of total assemblages at most coastal sites2,9. Other aspects of the material culture also show continuity with earlier settlements, including the persistence of crops, domesticated animals, craft styles and ceramics9,19. Linguistic data provide further evidence of African roots: Kiswahili is an African Bantu language with Asian loanwords20. However, without ancient DNA, it is not possible to directly address how genetic ancestry changed over time.

We generated ancient DNA data from the skeletal remains of individuals found at six coastal or island towns: Mtwapa, Manda, Faza, Kilwa, Songo Mnara and Lindi. These individuals date to ad 1250–1800 but provide insight into genetic events from the tenth century ad onwards. We also generated ancient DNA from the remains of individuals found at the site of Makwasinyi (postdating around ad 1650), about 100 km inland from the southern Kenyan coast, which was inhabited by people who were in cultural contact with coastal groups. We compare the newly reported data from the ancient individuals with that of present-day coastal Swahili speakers and with previously published data from diverse ancient and present-day eastern African and Eurasian groups.

Dataset overview

We generated 179 ancient DNA libraries from 156 distinct skeletal samples (Supplementary Data File 1 and Methods). We applied in-solution enrichment for a targeted set of about 1.2 million single nucleotide polymorphisms (SNPs). This yielded genome-wide data passing standard measures of ancient DNA authenticity from 80 distinct individuals. The individuals were buried at seven second-millennium ad sites in eastern Africa (black shapes in Fig. 1a; individuals are listed in Extended Data Table 1 and Supplementary Data File 2; see Methods, ‘Inclusion and ethics’ and Supplementary Information for details of archaeological and genetic permissions and sampling). We obtained direct radiocarbon dates (Supplementary Data File 3) for 33 of the skeletons. For the other individuals, we estimated date ranges on the basis of archaeological context or genetic evidence of relatedness to individuals for whom we had direct dates (Extended Data Table 1 and Supplementary Data File 2). Because these populations relied on seafood, old carbon entering the food chain (marine reservoir effects) could make the radiocarbon dates of some individuals older than their true dates. Moreover, dependence on marine food may have differed across the archaeological sites, so we may not be able to determine the relative chronology of the coastal individuals and sites with full confidence. We also generated new genome-wide data on the Affymetrix Human Origins SNP array from 93 present-day individuals who identified as Swahili and indicated that their ancestors had lived for many generations in coastal towns21 (Supplementary Data File 4). Finally, we generated new genome-wide data from 19 individuals from Madagascar and 10 from the United Arab Emirates.
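The following sketch shows why a marine diet pushes radiocarbon dates older and why sites with different diets can be hard to place in relative order. It assumes a simple two-component mixing model for illustration only. A fraction `marine_fraction` of the dated carbon comes from marine food, whose 14C content corresponds to a reservoir age of `reservoir_age` 14C years. The rest comes from terrestrial food in equilibrium with the atmosphere. Mixing is done in F14C (fraction modern) space, and ages use the conventional Libby mean life of 8,033 years. Real calibration uses mixed marine/terrestrial calibration curves and locally estimated ΔR values, which are not modelled here. The function name and all inputs are hypothetical.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; conventional radiocarbon ages use this value


def apparent_age_offset(marine_fraction: float, reservoir_age: float) -> float:
    """Extra apparent 14C age (years) of a consumer whose carbon mixes
    terrestrial carbon (offset 0) with marine carbon (offset reservoir_age)."""
    if not 0.0 <= marine_fraction <= 1.0:
        raise ValueError("marine_fraction must lie in [0, 1]")
    # Marine carbon carries less 14C: its F14C is scaled by exp(-R / 8033)
    marine_ratio = math.exp(-reservoir_age / LIBBY_MEAN_LIFE)
    # Linear mixing of 14C content relative to the contemporaneous atmosphere
    mixed_ratio = (1.0 - marine_fraction) + marine_fraction * marine_ratio
    return -LIBBY_MEAN_LIFE * math.log(mixed_ratio)


# Two hypothetical contemporaneous burials with different diets
for fraction in (0.1, 0.5):
    print(fraction, round(apparent_age_offset(fraction, reservoir_age=400.0)))
```

With these illustrative inputs, the two offsets are roughly 40 and 200 years. Two burials of the same age could therefore appear about 160 years apart purely because of diet.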

Three sites were northern coastal towns: Mtwapa (Supplementary Fig. 1; 48 individuals spanning ad 1250 to ad 1650), Faza on Pate Island (1 individual) and Manda Island (Supplementary Fig. 2; 8 individuals spanning ad 1450 to ad 1650). Three additional sites were southern coastal towns: Songo Mnara (Supplementary Fig. 3 and Supplementary Table 1; 7 individuals spanning ad 1300 to ad 1800), Lindi (1 individual dating to ad 1500–1650) and Kilwa Kisiwani (2 individuals spanning ad 1300 to ad 1600). The remains at Mtwapa, Manda and Songo Mnara were mainly from Muslim burials of elites, often located near mosques. We do not have enough context for the Faza, Kilwa and Lindi burials to know whether they followed the same pattern. We also analysed 13 individuals from Makwasinyi (ad 1650–1950), approximately 100 km inland from the coast of present-day Kenya. Although these burials post-date the coastal sites, the Makwasinyi community traded with coastal peoples while remaining isolated in most respects. We hypothesized that the Makwasinyi people might serve as a good proxy for inland African groups that may have been in contact with people from medieval towns on the northern Swahili coast in previous centuries22.

Of the 80 individuals for whom we report data, we exclude 26 from genome-wide analyses, although their data remain valuable. The reasons for exclusion are as follows:
- 18 had too few SNPs for high-resolution genome-wide analyses, although they yielded useful data such as reliably determined mitochondrial sequences;
- 5 were first- or possibly second-degree relatives of other individuals in the dataset with higher-quality data;
- 2 showed evidence of contamination;
- 1 was a population genetic outlier with limited data, raising the possibility of contamination (Supplementary Data File 2).

Ancient DNA data from four individuals from the eastern African coast have previously been published (Supplementary Data File 5), but none from a Swahili town23. An individual from around ad 1400 whose remains were recovered from Makangale Cave on Pemba Island had ancestry predominantly related to western African groups23. This ancestry is common today in speakers of Bantu languages and prevalent in eastern Africa, and we hereafter refer to it as ‘Bantu-associated’. Three further individuals all had predominantly sub-Saharan African forager-associated ancestry23:
- another individual from Makangale Cave on Pemba Island, dated to around ad 600;
- an individual from around ad 600 from Kuumbi Cave on Zanzibar Island;
- an individual from around ad 1500 from Panga ya Saidi in Kenya.

None of these four individuals shows any indication of Eurasian ancestry from migrations in the last 2,000 years. This contrasts with nearly all of the individuals from medieval coastal towns newly reported here.

In this Article, we use ‘African ancestry’ to refer to DNA deriving from people who can be genetically well-proxied by sub-Saharan Africans for whom there are published ancient DNA data dating to between 2000 bc and ad 1000. We use the terms ‘Eurasian’, ‘Persian’, ‘Arabian’ and ‘Indian’ for ancestry that can be proxied by modern populations from these regions and that is not known to be similar to ancestry in sub-Saharan Africans between 2000 bc and ad 1000. Some of the ancestors of Africans from between 2000 bc and ad 1000 may have come from Eurasia; for example, this applies to approximately 40% of the ancestry of people of the eastern African Pastoral Neolithic culture24. This does not contradict these definitions, as all humans are mixed at multiple time depths of history. As long as we specify both the time and the geography of the source populations, we can use the term ‘African ancestry’ precisely25.

Genetic affinities

To obtain a qualitative picture of the sources of ancestry in the ancient individuals, we performed principal component analysis (PCA) (Fig. 1b). We used 1,286 present-day Eurasian and African individuals to compute the axes (Supplementary Data File 6). We projected the newly reported ancient individuals onto these axes and found that they form a cline. One end of the cline overlaps with ancient and present-day African groups, and the other falls between present-day Persians and Indians. This suggests mixtures, in different proportions, of source populations at either end of the cline, and these sources may themselves have had multiple deeper ancestry components. Some coastal individuals, particularly from Songo Mnara and Lindi, do not fall on this cline, suggesting additional complexity, although the small sample sizes limit our power to understand this variability. Unsupervised clustering using ADMIXTURE shows similar patterns (Fig. 1c and Extended Data Fig. 1). It further suggests three groups of components: sub-Saharan African-associated, Southwest Asian-associated, and East Asian- or Indian-associated.
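The projection step can be sketched in Python with NumPy under simplifying assumptions. Genotypes are complete 0/1/2 matrices (individuals × SNPs). The axes come from a plain SVD of the mean-centred reference panel, and new samples are projected by a dot product. Tools used for ancient DNA, such as smartpca's least-squares projection, also normalize SNPs and handle the heavy missingness of ancient genomes; this sketch does neither. The two reference groups and the 50/50 admixed individual are simulated and hypothetical.

```python
import numpy as np


def fit_pca(reference: np.ndarray, n_components: int = 2):
    """Return column means and top principal axes of a reference genotype
    matrix (individuals x SNPs, coded 0/1/2)."""
    mean = reference.mean(axis=0)
    # Rows of vt are unit-length principal axes, ordered by variance explained
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(samples: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project individuals that were not used to compute the axes."""
    return (samples - mean) @ axes.T


# Simulated reference panel: two groups with different allele frequencies
rng = np.random.default_rng(1)
n_snps = 500
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = rng.uniform(0.05, 0.95, n_snps)
reference = np.vstack([
    rng.binomial(2, p_a, size=(50, n_snps)),
    rng.binomial(2, p_b, size=(50, n_snps)),
]).astype(float)
mean, axes = fit_pca(reference)

# A hypothetical individual with 50% ancestry from each group
mixed = rng.binomial(2, 0.5 * p_a + 0.5 * p_b, size=(1, n_snps)).astype(float)
pc1 = project(mixed, mean, axes)[0, 0]
ref_pc1 = project(reference, mean, axes)[:, 0]
pc1_a, pc1_b = ref_pc1[:50].mean(), ref_pc1[50:].mean()
print(f"group A {pc1_a:.1f}, group B {pc1_b:.1f}, mixed {pc1:.1f}")
```

The admixed individual lands between the two groups on PC1. This is the geometric basis for reading a cline of projected individuals as a series of mixtures in different proportions.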

Proportions of African, Persian and Indian DNA

Using qpAdm26, we find that most medieval and early modern individuals can only be fitted by a model with at least three ancestry components. These can be proxied by ancient African, present-day Iranian and present-day Indian populations (Fig. 2a, Extended Data Table 2, Supplementary Tables 2–10 and Supplementary Information). Such a three-source model fits three groups:
- the pool of 48 Mtwapa individuals and the Faza individual (P value for fit = 0.23);
- the pool of Manda individuals (P = 0.28);
- at least one Songo Mnara individual, I19550 (P = 0.38).

One Kilwa individual (I8816) had a relatively high proportion of ancestry related to inland sub-Saharan Africans. As a result, the Indian proportion, which is the smallest contributor, falls below the threshold of definitive detection (a two-source model fits; P = 0.27).
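qpAdm rests on the fact that f4-statistics are linear in allele frequencies. If a target is a mixture of sources with weights summing to 1, then each statistic f4(Target, O0; Oi, Oj) equals the same weighted sum of the sources' statistics. The NumPy sketch below estimates such weights by least squares on simulated allele frequencies. It illustrates that principle only and is not qpAdm itself: it omits the block-jackknife covariance weighting, the rank test and the P value for model fit. The function names and simulated populations are hypothetical.

```python
import numpy as np


def f4(a, b, c, d):
    """f4(A, B; C, D) estimated as the mean over SNPs of (a - b)(c - d)."""
    return float(np.mean((a - b) * (c - d)))


def fit_mixture_weights(target, sources, outgroups):
    """Least-squares weights (summing to 1) such that, for each outgroup pair,
    f4(target, O0; Oi, Oj) ~ sum_k w_k * f4(source_k, O0; Oi, Oj)."""
    o0, rest = outgroups[0], outgroups[1:]
    pairs = [(i, j) for i in range(len(rest)) for j in range(i + 1, len(rest))]
    f_t = np.array([f4(target, o0, rest[i], rest[j]) for i, j in pairs])
    f_s = np.array([[f4(s, o0, rest[i], rest[j]) for i, j in pairs] for s in sources])
    # Impose sum(w) = 1 by writing the last weight as 1 - sum(other weights)
    design = (f_s[:-1] - f_s[-1]).T
    response = f_t - f_s[-1]
    w_head, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.append(w_head, 1.0 - w_head.sum())


# Simulated allele frequencies; the target is an exact 50/30/20 mixture
rng = np.random.default_rng(0)
n_snps = 2000
outgroups = [rng.uniform(0.0, 1.0, n_snps) for _ in range(6)]
sources = [rng.uniform(0.0, 1.0, n_snps) for _ in range(3)]
true_w = np.array([0.5, 0.3, 0.2])  # hypothetical mixture weights
target = sum(w * s for w, s in zip(true_w, sources))
weights = fit_mixture_weights(target, sources, outgroups)
print(np.round(weights, 3))
```

The simulated target is an exact mixture, so the weights are recovered. With real data, the statistics are noisy and correlated. qpAdm's covariance matrix and fit P value account for this, and they decide whether a model with fewer sources is rejected.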

Fig. 2: Individual ancestry proportions.

a, Inferences from qpAdm (see Extended Data Table 2 and Supplementary Information for model details and statistical fit). Blue represents African ancestry. The most common types are Bantu-associated (common at southern sites) and Makwasinyi-associated (northern sites); the latter is itself approximately 80% Bantu-related and 20% pastoralist-related. Yellow represents Southwest Asian ancestry: Persian or Arabian. Grey represents Indian ancestry. Bars represent s.e.m., computed using a block jackknife across all 5-centimorgan (cM) segments of the autosomes. They are meaningful even for single individuals because each genome contains information from a large number of ancestors. b, Ternary plot of Makwasinyi, Persian and Indian ancestry components in Mtwapa and Faza (red (high coverage) and yellow (low coverage)) and Manda (blue (high coverage) and green (low coverage)). Individuals with higher coverage (>100,000 SNPs overlapping positions on the Human Origins SNP array) are used to fit a linear regression (dashed line). The line passes through nearly 100% Makwasinyi and 0% Persian and Indian ancestry. This is consistent with a Makwasinyi-related population with little or no recent Asian ancestry mixing with an already-mixed Persian–Indian population. c, Bar graph showing P values from Hotelling's T-squared tests for a qpAdm model with a mixed Persian–Indian source. The x axis specifies the proportion of Persian ancestry in the source.
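The block jackknife behind the error bars in panel a works as follows. The genome is cut into contiguous blocks (5 cM here), and the statistic is recomputed with each block left out in turn. The spread of those leave-one-out estimates gives the standard error. Leaving out whole blocks rather than single SNPs keeps linked SNPs together, so their correlation is not counted as independent evidence. Below is a minimal NumPy sketch of the delete-one-block jackknife. It uses equal block weights, whereas ADMIXTOOLS applies a weighted jackknife for unequal block sizes. The function name and the simulated per-SNP values are hypothetical.

```python
import numpy as np


def block_jackknife_se(values, block_ids, statistic=np.mean):
    """Delete-one-block jackknife standard error of statistic(values).

    values: per-SNP quantities (1D array)
    block_ids: block label for each SNP, e.g. derived from 5-cM windows
    statistic: function of a 1D array; defaults to the mean
    """
    values = np.asarray(values, dtype=float)
    block_ids = np.asarray(block_ids)
    blocks = np.unique(block_ids)
    g = len(blocks)
    # Recompute the statistic with each block removed in turn
    leave_one_out = np.array([statistic(values[block_ids != b]) for b in blocks])
    # Jackknife variance: (g - 1) / g times the sum of squared deviations
    variance = (g - 1) / g * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    return float(np.sqrt(variance))


# Hypothetical per-SNP values on one chromosome, grouped into 5-cM windows
rng = np.random.default_rng(2)
positions_cm = np.sort(rng.uniform(0.0, 200.0, 5000))
block_ids = (positions_cm // 5).astype(int)
values = rng.normal(0.3, 1.0, positions_cm.size)
print(block_jackknife_se(values, block_ids))
```

For the autosomes, each block label would combine the chromosome number with the window index. This ensures that windows with the same index on different chromosomes are not merged.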

The type of African ancestry needed to make the models fit differed between individuals from the north (Mtwapa, Faza and Manda) and the south (Kilwa and Songo Mnara) of the studied region. In Kenya, the best-fitting proxy African source is the inland Makwasinyi group (Extended Data Table 2). This group is itself well-modelled as a mixture of about 80% Bantu-associated and 20% ancient eastern African Pastoral Neolithic ancestry24 (Fig. 3a, Extended Data Table 3, Supplementary Tables 2 and 3 and Supplementary Information). In Tanzania, the best-fitting African proxy source is Bantu-associated, without evidence of a Pastoral Neolithic contribution. We use the individual buried at around ad 1600 in Lindi as a proxy Bantu-associated source for the Kilwa individual and for individual I19550 from Songo Mnara (Extended Data Table 2).

Fig. 3: Inferred admixture events along the eastern African coast.

African populations are represented in shades of blue; Southwest Asian and Indian populations are represented in shades of yellow. Populations with both colours represent those that are admixed between the corresponding proxy source populations.

Although three continental sources are required to fit the data, the individuals from Manda, Faza and Mtwapa form a cline in PCA, suggesting two proximal source populations (Fig. 2b). Using linear regression, we extrapolate the ancestry of these two sources and infer that one was consistent with a 100% African origin (Supplementary Fig. 5 and Supplementary Information). The same analysis indicates that the other source had both Persian and Indian ancestry. This is consistent with sub-Saharan Africans mixing with a group that already carried a mixture of Persian and Indian ancestry. Because the northern and southern individuals had different African sources, there must have been at least two, and possibly more, admixture events. This would be expected if people of mixed Persian and Indian ancestry had children with people from different local African populations at different locations along the coast.
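The extrapolation can be sketched as a straight-line fit, under the following assumptions. Each individual's Persian and Indian ancestry proportions are taken as given, for example from qpAdm. The line Indian = a + b × Persian is fitted across individuals. An intercept near zero means the cline passes through the 100% African corner. Following the line to zero African ancestry (Persian + Indian = 1) gives the composition of the other source. The choice of axes and the 80:20 Persian:Indian source in the simulated data are hypothetical, not values from the study.

```python
import numpy as np


def extrapolate_non_african_source(persian, indian):
    """Fit indian = a + b * persian across individuals and extend the line to
    zero African ancestry, where persian + indian = 1.

    Returns the intercept a and the (persian, indian) composition at that end.
    """
    b, a = np.polyfit(persian, indian, 1)  # polyfit returns [slope, intercept]
    persian_end = (1.0 - a) / (1.0 + b)
    indian_end = a + b * persian_end
    return a, persian_end, indian_end


# Hypothetical individuals: varying African fractions mixed with a single
# source that is 80% Persian-like and 20% Indian-like
african = np.array([0.2, 0.35, 0.5, 0.6, 0.75])
persian = (1.0 - african) * 0.8
indian = (1.0 - african) * 0.2
intercept, p_end, i_end = extrapolate_non_african_source(persian, indian)
print(f"intercept={intercept:.3f}, source={p_end:.2f} Persian / {i_end:.2f} Indian")
```

With real, noisy estimates, low-coverage individuals would distort the fit. This is presumably why the regression in Fig. 2b uses only the higher-coverage individuals.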

When we analyse the Mtwapa and Faza, Manda, Kilwa and Songo Mnara individuals separately, the estimated fractions of their Eurasian ancestry that derive from India overlap. This could be consistent with a homogeneous source population of mixed Indian–Persian ancestry for all sites (Fig. 2c). However, variable proportions among the early immigrants cannot be ruled out. We therefore cannot determine whether there was a single stream of Persian–Indian migrants or several. Our statistical power to detect Indian ancestry relies on pooling data from multiple individuals. For a number of individuals with limited data or low Eurasian ancestry, such as I8816 from Kilwa, we have little power to detect Indian ancestry and cannot definitively document it (Supplementary Information).

Males from Persia and females from Africa

We tested whether male and female ancestors contributed the same proportions of African-like, Persian-like and Indian-like ancestry to ancient individuals at the northern coastal sites and Kilwa (Table 1). This analysis exploits the fact that different parts of the genome are inherited differently through males and females:
- mtDNA passes only from mothers;
- the Y chromosome passes only from fathers;
- the X chromosome spends two-thirds of its history in females;
- chromosomes 1–22 pass equally through both sexes.

We could not perform the same analysis at Songo Mnara because no individuals with high-quality data fit the three-way model.

Table 1 Population mixture and evidence of sex bias in three groups from the Swahili coast

We first analysed mtDNA (Extended Data Table 1 and Table 1). Analysing 62 individuals with confidently determined mtDNA haplogroups (including relatives and individuals with low coverage genome-wide data), we find that 59 carry an L* haplogroup, which in the present day is almost entirely restricted to sub-Saharan Africans27. The exceptions are a pair of first- or second-degree relatives from Mtwapa carrying M30d1, which in the present day is largely restricted to South Asia27, and an individual with haplogroup R0+16189, which today is characteristic of Saudi Arabia and the Horn of Africa28. These results are consistent with female ancestry deriving overwhelmingly from African sources.

Analysing male-transmitted Y chromosome DNA, we find that two of three males from Manda who are not first-degree relatives of one another carry haplogroup J2, and the third carries G2. Both haplogroups are characteristic of Southwest Asia (plausibly Persia) and are largely absent in sub-Saharan Africans29. The Kilwa individual also carries J2. At Mtwapa, 14 of 19 males have Y chromosome haplogroups in the J family, and two carry haplogroup R1a; all of these are considered typically non-African. Only 3 of the 19 Mtwapa males, along with the Faza male, carry haplogroups in the E1 family characteristic of sub-Saharan Africa.

We next compared chromosome X with the autosomes (chromosomes 1–22) (Methods and Supplementary Information). Chromosome X occurs as two copies in females and one in males, so it reflects mostly female history, whereas the autosomes reflect female and male history equally. Estimates of African ancestry on chromosome X are higher than on the autosomes at all sites (Table 1). This provides an independent line of evidence that African ancestry derives primarily from females and Persian ancestry primarily from males. Assuming that the mixture occurred over just a few generations, we obtain the following quantitative estimates (Methods and Table 1):
- proportion of African ancestry contributed by females: 100% at Manda, 69–97% at Mtwapa-Faza and 69–100% at Kilwa;
- proportion of Persian ancestry contributed by males: 100% at Mtwapa-Faza and Kilwa, and 90–100% at Manda.

If the mixture occurred over more generations, we cannot obtain a point estimate, but can nevertheless infer primarily African female and Persian male sources.
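A common single-pulse model makes this comparison quantitative. It is used here as an assumption and may differ in detail from the Methods. Suppose a fraction s_f of the female founders and s_m of the male founders carry African ancestry, with equal numbers of founders of each sex. The expected African ancestry is then (s_f + s_m)/2 on the autosomes and (2s_f + s_m)/3 on the X chromosome, because two-thirds of X chromosomes are carried by females. Inverting these two equations recovers s_f and s_m from the observed proportions. The Python sketch uses hypothetical inputs, not the values behind Table 1, and its summary measures are illustrative.

```python
def sex_specific_founders(h_auto: float, h_x: float) -> tuple[float, float]:
    """Invert H_A = (s_f + s_m) / 2 and H_X = (2 * s_f + s_m) / 3.

    h_auto: African ancestry proportion on the autosomes
    h_x: African ancestry proportion on the X chromosome
    Returns (s_f, s_m), the fractions of female and male founders with African ancestry.
    """
    s_f = 3.0 * h_x - 2.0 * h_auto
    s_m = 4.0 * h_auto - 3.0 * h_x
    # Sampling noise can push point estimates outside [0, 1]; clip them
    return min(max(s_f, 0.0), 1.0), min(max(s_m, 0.0), 1.0)


def female_share(s_f: float, s_m: float) -> float:
    """Share of a source's founders who were female (equal founder sex ratio assumed)."""
    return s_f / (s_f + s_m)


# Hypothetical estimates: more African ancestry on X than on the autosomes
s_f, s_m = sex_specific_founders(h_auto=0.6, h_x=0.7)
print(round(s_f, 3), round(s_m, 3))              # African female / male founder fractions
print(round(female_share(s_f, s_m), 3))          # African founders who were female
print(round(female_share(1 - s_f, 1 - s_m), 3))  # non-African founders who were female
```

With these inputs, 75% of the African founders would be female and 87.5% of the non-African founders male. This is the qualitative pattern in Table 1. If mixture continued over many generations, these single-pulse expectations no longer hold, which is why only qualitative conclusions are drawn in that case.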

Together, these multiple lines of evidence show that the Southwest Asian ancestors of the Mtwapa and Faza, Manda and Kilwa individuals were almost entirely male, whereas the African ancestors were almost entirely female.

Mixing began by ad 1000

We estimated when mixture occurred from the sizes of stretches of ancestry inherited from the ancestral populations, which break up at a known rate every generation30. The 95% confidence intervals for the inferred dates are ad 795–1085 for a pool of the northern Mtwapa, Faza and Manda individuals, and ad 708–1219 for a pool of the southern Kilwa and Songo Mnara I19550 individuals (Table 1 and Extended Data Fig. 3). The intervals overlap from ad 795 to ad 1085. A marine reservoir effect would bias these estimates towards older dates. The inferred dates also assume that the mixture occurred all at once. However, mixture of Eurasians and Africans was certainly drawn out over a number of generations. Indeed, the historical evidence and our genetic analysis that follows document continued incorporation of migrants from both inland Africa and Eurasia until the present. Nevertheless, simulations show that mixture must have begun by the inferred date31. We can therefore be confident that already-mixed males with both Indian and Persian ancestry were present along the coast by around ad 1000, and that they had begun mixing with primarily female sub-Saharan Africans by that time.
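The principle can be sketched with the classic ancestry-tract-length model. This is an assumption here and not necessarily the method of ref. 30, which may rely on linkage-disequilibrium decay instead. Consider a single pulse of admixture t generations ago in which a source contributed a fraction m. Segments from that source then have approximately exponentially distributed lengths with mean 1/((1 − m)t) Morgans. So t can be recovered from the mean tract length and converted to a calendar date using the individual's own date and an assumed generation time. The 28-year generation time, the function names and the input values are hypothetical.

```python
def generations_since_admixture(mean_tract_cm: float, source_fraction: float) -> float:
    """Single pulse: tracts from a source contributing `source_fraction` have
    mean length 1 / ((1 - m) * t) Morgans; solve for t."""
    mean_tract_morgans = mean_tract_cm / 100.0
    return 1.0 / ((1.0 - source_fraction) * mean_tract_morgans)


def admixture_date_ad(sample_date_ad: float, generations: float,
                      years_per_generation: float = 28.0) -> float:
    """Convert generations before the individual lived into a calendar year."""
    return sample_date_ad - generations * years_per_generation


# Hypothetical individual dated to ad 1400, with 50% African ancestry
# and a mean African tract length of 10 cM
t = generations_since_admixture(mean_tract_cm=10.0, source_fraction=0.5)
print(round(t, 1), round(admixture_date_ad(1400.0, t)))
```

The conversion shows why a marine reservoir effect matters here. If the sample's radiocarbon date is too old, the inferred admixture date shifts older by the same amount.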

Arabians and other migratory influences

Although almost all the coastal individuals we analysed had Asian ancestry, there were exceptions. Some early modern individuals at Lindi and Songo Mnara (I14001 and I7944) showed no evidence of recent Asian ancestry (Extended Data Fig. 2a, Extended Data Table 3, Supplementary Table 11 and Supplementary Information). We find possible Malagasy-associated ancestry in one Songo Mnara individual, I19547 (Extended Data Fig. 2a, Extended Data Table 3 and Supplementary Information). These coastal individuals, who differ from others from similar times or regions, attest to continued exchange with people in the Indian Ocean trading network. However, our sample size is too small to identify general patterns.

For some of the individuals in our study with Asian ancestry (from Manda, Kilwa and I19550 from Songo Mnara), there is evidence only of Persian or Persian–Indian ancestry. Other individuals, particularly from Mtwapa, can be modelled using other Mtwapa or Manda individuals as a source only if that source includes some Arabian-associated ancestry (Supplementary Fig. 6, Supplementary Tables 12–15 and Supplementary Information). We were unable to determine the exact source of this Arabian-associated ancestry, although it lies somewhere on the genetic gradient between Arabians and Persians. One proxy source that fits this Arabian-related ancestry at Mtwapa is present-day people who live on the shores of the Strait of Hormuz, which separates the Arabian Peninsula from Iran. Both the Strait of Hormuz and the Swahili coast were under Omani control by the end of the seventeenth century.

Direct genetic evidence for Arabian-associated migration comes from two individuals from Songo Mnara. Both date to the early modern period, when contacts with Arabia are well documented, and they can only be modelled with Arabian-related ancestry in qpAdm (Fig. 2a). Analyses of present-day coastal populations also point to Arabian genetic influences. For some present-day individuals, the Asian ancestry can be modelled entirely as Persian–Indian, as in medieval Manda. For others, ancestry from the Strait of Hormuz is a better fit (coloured salmon in Supplementary Data File 7).

Relating medieval to modern Swahili

We assembled genome-wide data for two groups of present-day people who identify as Swahili: 89 with previously reported data32 and 93 for which we generated new data.

Most of the individuals in the previously published dataset (87 out of 89) have only modest ancestry from people resembling the medieval individuals that we sampled from nearby coastal areas (Extended Data Fig. 2a, Extended Data Table 3, Supplementary Table 16 and Supplementary Information). We estimate their ancestry as 11 ± 4% medieval Swahili-associated, 84 ± 3% Bantu-associated and 6 ± 3% Pastoral Iron Age-associated24 (Extended Data Fig. 2a and Extended Data Table 3). The Y chromosome haplogroups mirror these patterns: 95% are typically African. By contrast, almost all Y haplogroups of the medieval coastal individuals are associated with Near Eastern people (Tables 1 and 2). However, the newly generated data show a much higher inferred proportion of ancestry from groups similar to the medieval ones. This proportion ranges from 46 to 77% medieval Swahili-associated ancestry, depending on whether we use a pool of Manda and Mtwapa individuals as the sources (Extended Data Table 4 and Supplementary Data File 7). Y chromosome haplogroups in the new data are also consistent with a greater contribution from medieval peoples. The frequency of African-associated Y haplogroups was 45%, much higher than the 17% in the medieval individuals but lower than the 95% estimated in the previously published study (Table 2).

Table 2 Comparison of haplogroup distributions in medieval Swahili individuals with those from two studies of present-day Swahili people

The differences in ancestry proportions between the two groups of present-day individuals identifying as Swahili may reflect differences in how they were sampled. The two datasets used different inclusion criteria:
- The previously published dataset included people from the coastal towns of Kilifi, Lamu and Mombasa in Kenya who indicated that their family had been Swahili-speaking for the past three generations.
- The newly published dataset included people from 13 locations along the Kenyan coast who indicated that their ancestors had lived in coastal towns and had a Swahili identity for many generations. This criterion enriched for more traditional upper-class Swahili people, who plausibly retained more ancestry from medieval coastal individuals30.

The greater medieval coastal ancestry may also reflect isolation. In the newly published dataset, individuals from the mainland site of Jomvu Kuu had significantly less medieval coastal ancestry than individuals from the other sites (P = 3.3 × 10⁻⁶, one-sided Wilcoxon rank-sum test). All of these other sites were on islands that were plausibly more isolated from admixture with inland groups.
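The one-sided Wilcoxon rank-sum test asks whether values in one group tend to be smaller than in another, using only the ranks of the pooled values. Here the first group would be medieval coastal ancestry proportions at Jomvu Kuu and the second the island sites. For small samples without ties, the exact P value can be computed by enumerating every way of assigning ranks to the first group, as in the standard-library Python sketch below. The ancestry values are hypothetical. For real data with larger samples or ties, a library routine such as `scipy.stats.mannwhitneyu` with `alternative="less"` is the practical choice.

```python
from itertools import combinations


def rank_sum_p_less(x: list[float], y: list[float]) -> float:
    """Exact one-sided Wilcoxon rank-sum P value for H1: x tends to be smaller than y.

    Enumerates all C(n, len(x)) rank assignments, so use only for small samples.
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this exact sketch")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    observed = sum(rank[v] for v in x)  # rank sum of the first group
    n = len(pooled)
    as_extreme = total = 0
    for ranks in combinations(range(1, n + 1), len(x)):
        total += 1
        if sum(ranks) <= observed:  # small rank sums support H1
            as_extreme += 1
    return as_extreme / total


# Hypothetical proportions of medieval coastal ancestry
mainland = [0.41, 0.45, 0.50, 0.52, 0.55]
islands = [0.58, 0.60, 0.63, 0.66, 0.70, 0.72, 0.75, 0.77]
print(rank_sum_p_less(mainland, islands))
```

The test uses only ranks, so it makes no assumption about how ancestry proportions are distributed within each group.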

We used the lengths of African and Persian genetic segments to date the admixture in the newly published modern individuals30 to ad 1096–1410. These dates are more recent than those for the medieval coastal individuals, consistent with ongoing mixture with African or Asian populations since medieval times.

No recent Asian ancestry in inland people

The Makwasinyi individuals date to the past three centuries and come from deep in the Tsavo region, nearly 50 km from the nearest population centre. The Makwasinyi group fits as a proxy source for African ancestry in qpAdm modelling of the Mtwapa, Faza and Manda individuals. Unlike these individuals, however, the Makwasinyi group shows no evidence of recent Asian ancestry in qpAdm, similar to present-day non-coastal populations (Supplementary Table 17). Instead, Makwasinyi individuals are similar in ancestry to the modern individuals identified as Swahili in the previously reported dataset32. They are well-modelled as 21.3 ± 1.2% Pastoral Neolithic-associated ancestry (from herders present in eastern Africa after 3000 bc) and 78.7 ± 1.2% Bantu-associated ancestry (from farmers present after 1000 bc) (Extended Data Fig. 2a). We did not detect sex bias in the formation of this population (Methods and Supplementary Information). The average date for the Bantu-Pastoral Neolithic-associated mixture is around ad 300–1200 (Extended Data Fig. 3). Most of this range is consistent with the archaeological evidence for the impact of the Bantu expansion on this region.

Discussion

A key finding of this study is evidence of mixture at roughly ad 1000 between peoples of African and Persian ancestry (Fig. 3 and Table 1). This is consistent with the Kilwa Chronicle, which describes the arrival of Persians on the Swahili coast and their interactions with coastal residents. Whether or not this history has a basis in an actual voyage, the ancient DNA provides direct evidence that Persian-associated ancestry, derived overwhelmingly from males, arrived on the eastern African coast by about ad 1000. This timing coincides with archaeological evidence for a substantial cultural transformation on the coast, including the widespread adoption of Islam10. At Kilwa, coin evidence has dated a ruler linked to a Shirazi (Persian) dynasty, Ali bin al-Hasan, to the mid-eleventh century33. The genetic evidence suggests that this arrival was accompanied by mixture that began by ad 1000 and continued later. People of both African and Asian ancestry made major contributions. African proportions average approximately 57% at Mtwapa and Faza, 32% at Manda, 67% at Songo Mnara and 74% at Kilwa (Table 1, Extended Data Table 2 and Supplementary Table 8).

Archaeological evidence provides important context for our genetic findings. The individuals that we analysed lived in the thirteenth to eighteenth centuries and were excavated mostly from elite contexts. However, coastal sites from around ad 1000, when the mixture occurred, show little evidence of distinct societal elites. Three of the sites sampled here (Mtwapa, Songo Mnara and Faza) did not exist as towns in ad 1000, so these admixed populations must have moved to those towns later. Thus, the elite inhabitants of Mtwapa and other sites developed from admixed populations and were not foreign migrants or colonists.

Linguistic evidence provides further context. Kiswahili is a Bantu language, and since most ancestry in medieval Swahili people derives from African people, our results suggest that the children of immigrant men of Asian origin adopted the languages of their mothers, a common pattern in matrilocal cultures34. However, Kiswahili also has non-African influences, reflecting a millennium and a half of intense interaction with societies around the Indian Ocean rim. Persian loanwords account for up to 3% of Kiswahili vocabulary, but it is unclear whether they were derived directly from Persian or through adoption into other Indian Ocean languages35. Arabic loanwords are the single largest non-Bantu element in Kiswahili35 (16–20% of words) and may be primarily due to incorporations after ad 1500 (ref. 20).

A recurrent theme of our findings is the different participation of males and females in population mixture events. We find evidence of predominantly male Southwest Asian ancestors mixing with predominantly female African and, to a much lesser extent, female Indian ancestors in the lineages of medieval people on the Swahili coast. This provides evidence for asymmetric social interactions between groups as cultural contact occurred, although such genetic data cannot reveal the processes contributing to these patterns.

This study provides information from only a subset of times and places relevant to the medieval coastal civilization, and it is important to recognize these biases. The geographical coverage is skewed towards Kenya: the individuals from Tanzanian sites such as Songo Mnara, Kilwa and Lindi are sufficient to identify similarities to and differences from the Kenyan ancestry profiles, but not to define a general pattern. In addition, the individuals that we analysed were not fully representative of all social and economic groups in Swahili society. Nearly all of the coastal graves and tombs in this study occupied prominent positions in medieval and early modern townscapes (see Supplementary Information). With the possible exceptions of Kilwa, Lindi and Faza, we analysed elite individuals from high-profile coastal sites. However, the Swahili cultural world included many non-elite settlements, where ancestry might be systematically different36. Our analysis of two samplings of present-day people who identify as Swahili, which used different strategies for determining this identity, also reveals qualitative differences in ancestry patterns, showing that groups identified as Swahili retain substantial substructure and variation today.

These findings highlight multiple directions for future work on ancient DNA. One approach is to study individuals pre-dating the twelfth century, including individuals from before and after the major population mixtures that we show occurred around ad 1000. Another is to study individuals from unsampled parts of the Swahili world, including the present-day countries of Somalia, Mozambique, the Comoros Islands and Madagascar. Nevertheless, the results presented here provide unambiguous evidence of ongoing cultural mixing on the eastern African coast for more than a millennium, in which African people interacted and had families with immigrants from other parts of Africa and the Indian Ocean world. Narratives of ancestry on the eastern African coast have a complex history, and the genetic findings of long-standing, sex-biased mixtures add to this complexity.

Methods

Inclusion and ethics

The present-day communities of the Swahili coast have strong traditions of connection to the people of the medieval coastal towns (including, in some cases, a tradition of descent from the people who lived in these communities, as well as shared language and religion), and thus community consultation is an important part of this work. Many medieval and present-day coastal people are also Muslims, and thus it is important to carry out any analysis in ways that are sensitive to Muslim proscriptions against disturbing the dead. The sampling for this study emerged from decades-long community-based archaeology projects (including those by S.K. and C.M.K. in the Kenya region, and by S.W.-J., J.F., Tanzanian Antiquities and the Songo Mnara Ruins Committee in the Tanzania region) that involved participation in, and return of results to, local communities, following rules for the handling and reburial of remains agreed by the communities. Prior to submission of this study for publication, the corresponding authors held return-of-results consultation meetings in Lamu, Songo Mnara and Kilwa Kisiwani, and feedback from these engagements was incorporated into the final manuscript. The community meeting in Lamu was held both in person and on Zoom. Mtwapa no longer has a viable Swahili community. The descendants of the residents of Manda are no longer visible, but given Manda’s proximity to Lamu, we chose Lamu as the most viable community to host the forum. Makwasinyi elders were consulted in 2001 and 2002 during fieldwork and again to discuss the results of the present analysis.

Ancient DNA data generation

To generate the genetic data for the samples in Extended Data Table 1 and Supplementary Data File 2 (by-library metrics in Supplementary Data File 1), we used established protocols in dedicated clean rooms. We first sampled typically 40 mg of powder from skeletal remains, then extracted DNA using methods designed to retain short, damaged fragments37, and then built individually barcoded libraries38 after incubation with uracil DNA glycosylase (UDG) to greatly reduce the characteristic errors typical of ancient DNA. We amplified the libraries and performed in-solution enrichment for about 1.2 million SNPs39, together with baits targeting the mitochondrial genome40. We sequenced the enriched products, along with a small amount of unenriched library, on Illumina NextSeq 500 or HiSeq X Ten instruments. The resulting paired-end reads were separated into their respective libraries and stripped of identification tags and adapter-related sequences. Read pairs were merged prior to alignment, requiring a minimum overlap of 15 base pairs and allowing one mismatch if base quality was at least 20, or up to three mismatches at lower base quality. We restricted to sequences that were at least 30 base pairs long. The resulting sequences were aligned using the samse command of bwa v0.6.1 (ref. 41) separately to the human genome reference sequence (hg19) and to the mitochondrial RSRS genome42, so that the targeted nuclear and mitochondrial sequences could be assessed separately.
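The read-merging rule above can be illustrated with a short Python sketch. This is a hypothetical illustration, not the pipeline used to produce these data: the names `revcomp` and `merge_pair` are ours, adapters are assumed to be already stripped, and we read the mismatch rule as "at most one mismatch at which both base qualities are at least 20, and at most three mismatches in total".

```python
# Hypothetical sketch of read-pair merging plus the 30 bp length filter.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, qual1, read2, qual2, min_overlap=15, min_length=30,
               hq=20, max_hq_mismatches=1, max_mismatches=3):
    """Merge an adapter-trimmed read pair into one sequence, or return None.

    A mismatch counts as high quality when both base qualities are >= hq.
    Our reading of the rule: accept an overlap with at most
    max_hq_mismatches high-quality mismatches and max_mismatches in total.
    """
    rc2 = revcomp(read2)
    rq2 = list(qual2[::-1])
    n1, n2 = len(read1), len(rc2)
    # Try the longest possible overlap first, down to the minimum overlap.
    for k in range(min(n1, n2), min_overlap - 1, -1):
        a, qa = read1[n1 - k:], qual1[n1 - k:]
        b, qb = rc2[:k], rq2[:k]
        mismatches = [i for i in range(k) if a[i] != b[i]]
        hq_mismatches = [i for i in mismatches if min(qa[i], qb[i]) >= hq]
        if len(hq_mismatches) > max_hq_mismatches or len(mismatches) > max_mismatches:
            continue
        # In the overlap, keep whichever base call has the higher quality.
        consensus = "".join(a[i] if qa[i] >= qb[i] else b[i] for i in range(k))
        merged = read1[:n1 - k] + consensus + rc2[k:]
        # Discard merged sequences shorter than min_length base pairs.
        return merged if len(merged) >= min_length else None
    return None  # no acceptable overlap: the pair is not merged
```

Because ancient DNA fragments are usually shorter than the read length, both reads of a trimmed pair typically span the whole fragment, so the full-length overlap is tried first.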

We performed a series of quality control measures. We inferred the rate of mismatch of sequences mapping to the mitochondrial consensus sequence (for libraries with at least twofold coverage) and flagged libraries where the upper bound of the inferred match rate43 was below 95%. We also examined the rate of polymorphism on the X chromosome in males and flagged individuals as appreciably contaminated if the lower bound of the 95% confidence interval44 was at least 1%. We tested whether the ratio of Y chromosome sequences to the sum of X and Y chromosome sequences was in the range expected for DNA entirely from a female (<3%) or a male (>35%), and we tested for an elevated rate of cytosine-to-thymine substitution at the final nucleotide using a threshold of at least 1.8% (lower than is typical in ancient DNA studies because of the young age of the samples; the median damage rate across libraries is 9.5%). If a sample showed sufficient evidence of contamination, we restricted analysis to damaged molecules only; these samples are denoted with ‘_d’ in their Genetic ID in Supplementary Data File 2. Several samples did not pass quality control; these are the samples for which all libraries failed in Supplementary Data File 1. No individual from two sampled sites, Bungule and Shanga, passed quality control. For each individual, we represented each position in the genome by a single randomly selected sequence.
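The screening thresholds above can be collected into a simple checklist. The sketch below is hypothetical (the `LibraryQC` fields and the flag names are ours) and assumes the per-library summary statistics have already been computed by other tools. It flags mitochondrial contamination when the upper bound of the match rate falls below 95%, our reading of the criterion, since a high match rate indicates authentic sequence. It also shows the pseudo-haploid representation of a site by one randomly chosen read.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LibraryQC:
    mt_match_upper: Optional[float]  # upper bound of mtDNA match rate (None if < 2x mt coverage)
    x_contam_lower: Optional[float]  # lower 95% CI bound of X contamination (males only, else None)
    y_fraction: float                # Y reads / (X reads + Y reads)
    terminal_ct: float               # C->T substitution rate at the final nucleotide


def qc_flags(lib: LibraryQC) -> List[str]:
    """Return hypothetical flag names for each threshold the library fails."""
    flags = []
    if lib.mt_match_upper is not None and lib.mt_match_upper < 0.95:
        flags.append("mt_contamination")
    if lib.x_contam_lower is not None and lib.x_contam_lower >= 0.01:
        flags.append("x_contamination")
    # Female if < 3% Y, male if > 35% Y; anything in between is ambiguous.
    if 0.03 <= lib.y_fraction <= 0.35:
        flags.append("ambiguous_sex")
    if lib.terminal_ct < 0.018:
        flags.append("low_damage")
    return flags


def pseudohaploid_call(bases: List[str], rng: random.Random) -> Optional[str]:
    """Represent a site by the base of one randomly chosen read (None if uncovered)."""
    return rng.choice(bases) if bases else None
```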

A total of 80 individuals passed quality control screening. The analyses in this study focus on 54 of them: those that are not outliers, have data at a minimum of 15,000 SNPs and are not first- or second-degree relatives of another individual in the study for whom we have better-quality data. We determined relatedness using a method similar to READ45.
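Relatedness screening in the spirit of READ compares pseudo-haploid calls between pairs of individuals. The sketch below is a simplified, hypothetical illustration rather than the method used here: it computes the mismatch rate P0 over shared SNPs (without READ's genomic windows), normalizes by the median across all pairs on the assumption that most pairs are unrelated, and applies cut-offs of 0.625, 0.8125 and 0.90625, which we understand to be READ's defaults.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def p0(calls1: List[Optional[str]], calls2: List[Optional[str]]) -> float:
    """Fraction of mismatching pseudo-haploid calls at SNPs covered in both individuals."""
    shared = [(a, b) for a, b in zip(calls1, calls2) if a is not None and b is not None]
    if not shared:
        raise ValueError("no overlapping SNPs")
    return sum(a != b for a, b in shared) / len(shared)


def classify(normalized_p0: float) -> str:
    """Degree of relatedness from a normalized P0 (cut-offs as we understand READ's)."""
    if normalized_p0 < 0.625:
        return "identical"
    if normalized_p0 < 0.8125:
        return "first-degree"
    if normalized_p0 < 0.90625:
        return "second-degree"
    return "unrelated"


def kinship(calls: Dict[str, List[Optional[str]]]) -> Dict[Tuple[str, str], str]:
    """Classify every pair of individuals after median normalization of P0."""
    names = sorted(calls)
    raw = {(a, b): p0(calls[a], calls[b])
           for i, a in enumerate(names) for b in names[i + 1:]}
    # Normalize by the median over all pairs, assuming most pairs are unrelated.
    scale = median(raw.values())
    return {pair: classify(value / scale) for pair, value in raw.items()}
```

Median normalization removes the dependence of raw P0 on population diversity, which is why a single set of cut-offs can be applied across datasets.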

Supplementary Data File 1 provides a full report of sequencing results on both libraries that yielded data passing standard measures of ancient DNA authenticity and libraries that failed screening; we also report negative results as a guide for future sampling efforts.

Genotyping of modern Swahili individuals

We genotyped 93 present-day Swahili individuals, originally collected in ref. 21, on the Affymetrix Human Origins SNP array26. As described in the study in which mtDNA and Y chromosome haplogroup information for these individuals was initially reported21, individuals known to the research team to have origins in or near the targeted recruitment communities assisted in participant recruitment in a form of ‘snowball sampling’. On sampling days, potential participants belonging to families known to have long residency in the community were approached. Those who consented to participate were afterwards asked whether they knew of anyone else, not closely related to them and available that day, who met the inclusion criteria: a healthy individual at least 18 years of age, resident in that town, whose grandparents all identified as belonging to a group typically included in the broader Swahili identity. Each participant was provided information about the project in either English or Swahili from Institutional Review Board (IRB)-approved forms, and those who elected to continue provided written consent. Protocols for collecting saliva and genealogical information were reviewed and approved by the Lehman College (no. 141-09-070) and City University of New York Integrated IRB (no. 323935). The saliva samples and genealogical information were collected in December 2009 and January 2010 in the Kenyan towns of Faza, Jomvu Kuu, Kizingitini, Lamu, Mikindani, Ndau, Pate, Siyu, Tchundwa and Wasini. A total of 96 samples were chosen for genome-wide analysis to maximize geographic spread, with 16 individuals each from Faza, Kizingitini, Ndau, Pate, Wasini and Jomvu Kuu. We restricted this selection to individuals who self-reported that both their parents and all of their grandparents were of local Swahili origin, prioritizing males. Data from 93 individuals passed quality control, and these data are reported and analysed here.

We genotyped on the Human Origins SNP array 19 newly reported individuals from Madagascar, whose samples were collected between 2007 and 2014 with approval by the Human Subjects’ Ethics Committees of the Health Ministry of Madagascar and by committees in France (Ministry of Research, National Commission for Data Protection and Liberties and Persons Protection Committee). All individuals gave written consent before participating. Samples came from two locations in the north and south of the country. Sampled villages were founded before 1900, and individuals were 61 ± 15 years old, with the maternal grandmother and paternal grandfather born within 50 km of the sampling location.

We genotyped on the Human Origins SNP array 10 newly reported Emirati samples that were collected from Emirati nationals from the city of Al Ain, United Arab Emirates. Ethical approval was obtained from the Al Ain District Human Research Ethical Committee. The samples were collected from healthy adults, and first genotyped on another SNP array as controls for a rare disease study46.

A list of the newly reported individuals with data on the Human Origins SNP array and some relevant information can be found in Supplementary Data File 4.

Principal component analysis

We used smartpca version 18180 from EIGENSOFT version 8.0.0 (ref. 47) with the optional parameters numoutlieriter: 2, numoutevec: 3, lsqproject: YES, newshrink: YES and hiprecision: YES. We computed eigenvectors of the covariance matrix of SNPs from present-day individuals from Africa and Eurasia genotyped on the Affymetrix Human Origins (HO) SNP array, which targets approximately 600,000 SNPs that are a subset of those targeted by in-solution enrichment (see Supplementary Information for a list of populations used). Ancient individuals and present-day individuals (most genotyped on the HO SNP array, and some genotyped on the Illumina HumanOmni5 BeadChip32) were projected into the three-dimensional (3D) eigenspace defined by the first three principal components.
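The lsqproject option projects individuals with missing data by least squares, using only the SNPs they actually cover. The NumPy sketch below illustrates that idea under simplifying assumptions (complete reference genotypes coded 0/1/2, standardization of each SNP by sqrt(p(1 − p)), and no outlier removal or shrinkage correction); it is a hypothetical illustration, not smartpca's implementation.

```python
import numpy as np


def fit_pca(ref_genotypes, n_pcs=3):
    """Fit PCs on reference genotypes (individuals x SNPs, values 0/1/2, no missing data)."""
    mean = ref_genotypes.mean(axis=0)        # equals 2p for each SNP
    p = mean / 2.0
    scale = np.sqrt(p * (1.0 - p))
    scale[scale == 0] = 1.0                   # monomorphic SNPs are all zero after centring
    x = (ref_genotypes - mean) / scale
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[:n_pcs]                     # n_pcs x SNPs, orthonormal rows
    return mean, scale, loadings, x @ loadings.T


def lsq_project(genotypes, mean, scale, loadings):
    """Project one individual with missing calls (np.nan) by least squares on observed SNPs."""
    observed = ~np.isnan(genotypes)
    z = (genotypes[observed] - mean[observed]) / scale[observed]
    coords, *_ = np.linalg.lstsq(loadings[:, observed].T, z, rcond=None)
    return coords
```

For an individual with complete data, the least-squares solution reduces to the ordinary projection onto the loadings; with sparse ancient data, it avoids shrinking coordinates towards the origin as missing SNPs would otherwise do.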

ADMIXTURE clustering analysis

We prepared data for the ADMIXTURE48 plots in Fig. 1c and Extended Data Fig. 1 using PLINK2 (ref. 49). We applied the --maf 0.01 option, which retains only SNPs with a minor allele frequency of at least 0.01. We pruned SNPs in linkage disequilibrium using the --indep-pairwise option with a window size of 200 variants, a step size of 25 variants and a pairwise r² threshold of 0.4. We ran four replicates of ADMIXTURE with random seeds for each number of ancestral populations from K = 4 to K = 12.
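The filtering, pruning and replicate ADMIXTURE runs can be scripted. This hypothetical Python helper only builds command lines and does not run them; file prefixes, directory names and seeds are placeholders, and flag spellings should be checked against the PLINK2 and ADMIXTURE manuals. The parser assumes that ADMIXTURE run with --cv prints lines of the form "CV error (K=9): 0.41234", which supports choosing K by low cross-validation error.

```python
import re
from typing import Dict, List, Tuple


def plink2_prune_commands(bfile: str, out: str) -> List[List[str]]:
    """MAF filter, LD pruning (200-variant window, step 25, r^2 0.4), then extract pruned SNPs."""
    return [
        ["plink2", "--bfile", bfile, "--maf", "0.01",
         "--indep-pairwise", "200", "25", "0.4", "--out", out],
        ["plink2", "--bfile", bfile, "--maf", "0.01",
         "--extract", f"{out}.prune.in", "--make-bed", "--out", f"{out}.pruned"],
    ]


def admixture_commands(bed: str, ks=range(4, 13), replicates: int = 4,
                       first_seed: int = 1) -> List[Tuple[str, List[str]]]:
    """One (working directory, command) pair per K and replicate.

    ADMIXTURE writes its .Q/.P outputs into the working directory, so each
    replicate gets its own (hypothetical) directory to avoid overwriting.
    Pass an absolute path for bed so it resolves from every directory.
    """
    jobs = []
    for k in ks:
        for r in range(replicates):
            seed = first_seed + r
            jobs.append((f"rep{r + 1}",
                         ["admixture", "--cv", "-s", str(seed), bed, str(k)]))
    return jobs


CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Collect cross-validation errors by K from ADMIXTURE --cv log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
```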

Estimates of mixture proportions

We used qpAdm (ADMIXTOOLS version 7.0.2)26 to test whether the Makwasinyi, Manda, Mtwapa and Faza, Kilwa, and Songo Mnara target populations can be formally modelled as derived from a set of source populations (termed ‘left’ groups) relative to a set of reference populations (termed ‘right’ groups). See Supplementary Information for further details. qpAdm also provides a P value for the fit of the model, based on a block jackknife over 5-cM blocks of the autosomes.
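The block jackknife drops one block of SNPs at a time and uses the spread of the resulting estimates to measure uncertainty. The sketch below is a hypothetical, self-contained illustration of a weighted delete-one-block jackknife (in the form given by Busing and colleagues for unequal block sizes, with the number of SNPs per block as the weight) applied to a ratio statistic, together with assignment of SNPs to 5-cM blocks. It is not ADMIXTOOLS code.

```python
import math


def block_ids(chroms, positions_cm, block_cm=5.0):
    """Assign each SNP to a (chromosome, block index) block of block_cm centimorgans."""
    return [(c, int(p // block_cm)) for c, p in zip(chroms, positions_cm)]


def ratio_estimator(blocks):
    """Statistic = sum of numerators / sum of denominators over the retained blocks."""
    return sum(num for num, _ in blocks) / sum(den for _, den in blocks)


def weighted_block_jackknife(blocks, block_sizes, estimator=ratio_estimator):
    """Return (jackknife estimate, standard error) for per-block (num, den) data.

    blocks: list of per-block summaries; block_sizes: SNPs per block (weights).
    """
    g = len(blocks)
    n = float(sum(block_sizes))
    full = estimator(blocks)
    # Leave-one-block-out estimates.
    loo = [estimator(blocks[:j] + blocks[j + 1:]) for j in range(g)]
    estimate = g * full - sum((1.0 - m / n) * t for m, t in zip(block_sizes, loo))
    var = 0.0
    for m, t in zip(block_sizes, loo):
        h = n / m
        pseudo = h * full - (h - 1.0) * t  # pseudo-value for this block
        var += (pseudo - estimate) ** 2 / (h - 1.0)
    return estimate, math.sqrt(var / g)
```

With equal block sizes this reduces to the ordinary delete-one jackknife; for a simple mean of per-block values it reproduces the usual standard error of the mean.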

In our application of qpAdm, we use a cycling approach, modelling the target as a linear combination of each possible subset of the candidate source populations and moving the remaining candidate sources to the right set. Cycling populations to the right allows us to test whether a proposed set of left source populations is consistent with being more closely related to the target than the other candidates are. Thus, we can build the closest admixture model within the constraints of our dataset and test its fit to the data.
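The cycling design can be enumerated mechanically. In this hypothetical sketch, every non-empty subset of the candidate sources becomes the left set, and the unused candidates join a fixed base right set. The population labels in the example are placeholders, and each (left, right) pair would then be passed to qpAdm.

```python
from itertools import combinations
from typing import Iterator, List, Optional, Sequence, Tuple


def rotating_models(candidates: Sequence[str], base_right: Sequence[str],
                    max_sources: Optional[int] = None
                    ) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (left, right) population sets for a cycling qpAdm design.

    Each non-empty subset of candidates (optionally capped at max_sources)
    is used as the source set; candidates not chosen are appended to the
    fixed base_right reference populations.
    """
    limit = len(candidates) if max_sources is None else min(max_sources, len(candidates))
    for k in range(1, limit + 1):
        for left in combinations(candidates, k):
            right = list(base_right) + [c for c in candidates if c not in left]
            yield list(left), right
```

With n candidates this yields 2^n − 1 models, so in practice the number of candidates or the maximum number of sources is kept small.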

Y and mitochondrial haplogroups

Mitochondrial haplogroups were determined using HaploGrep 2 (ref. 50). Y chromosome haplogroups were determined according to the YFull tree version 8.09 (https://github.com/YFullTeam/YTree/blob/master/ytree/tree_8.09.0.json)51.

Sex bias from X-autosome comparisons

We estimated the extent to which the ancestry from each source population was contributed by female and male ancestors. To do this, we compared the inferred proportions of ancestry on chromosomes 1–22 (the autosomes), which reflect 50% female and 50% male inheritance (the autosomal coefficient for the proportion of a specific ancestry, \(C_A=(m+f)/2\)), with those on chromosome X, which reflects 67% female and 33% male ancestry (the X chromosome coefficient for the proportion of a specific ancestry, \(C_X=(2f+m)/3\)). We determined Z-scores for differences between these two estimates as in ref. 52. For each of Mtwapa (including the one Faza individual), Manda, Kilwa and Makwasinyi, we quantified sex bias as follows. (1) We sampled \(10^6\) sets of coefficients from a multivariate normal distribution defined by the qpAdm jackknife mean coefficients and error covariance matrix, separately for the autosomes and for chromosome X. To ensure that all eigenvalues of each covariance matrix were greater than or equal to zero, we added to its diagonal a small offset equal to the absolute value of its minimum eigenvalue. (2) We discarded all infeasible sets of coefficients, namely those containing a coefficient below 0. (3) We calculated the female and male proportions of ancestry for each source population. Given the autosomal coefficient \(C_A\) and the X chromosome coefficient \(C_X\), solving the two equations above gives \(f=3C_X-2C_A\) and \(m=4C_A-3C_X\). We calculated these f and m values for each source population and each sampled set of autosomal and X chromosome coefficients. (4) We normalized the proportions of female and male ancestry as \(\hat{f}=f/(f+m)\) and \(\hat{m}=m/(f+m)\) for each source in each sampled set. (5) We computed the mean and standard deviation of the sampled \(\hat{f}\) and \(\hat{m}\) values for each source population. (6) We took 95% confidence intervals over all sampled sets of ancestry proportions and report these as ranges in Table 1. Under the simple model in which all the mixture occurred at once, these quantities can be interpreted as the fractions of ancestors at the time of mixing that were on the male or female side. If the mixture was more gradual, the interpretation is more complicated, although the estimates remain informative about sex bias.
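The sampling procedure for sex bias can be sketched with NumPy. This is a hypothetical illustration, not the authors' code: the inputs stand in for the qpAdm jackknife mean coefficients and error covariance matrices, the covariance correction adds the absolute minimum eigenvalue to the diagonal when it is negative, and the example numbers are invented.

```python
import numpy as np


def make_psd(cov):
    """Shift the diagonal by |min eigenvalue| if the matrix has a negative eigenvalue."""
    cov = np.asarray(cov, dtype=float)
    min_eig = np.linalg.eigvalsh(cov).min()
    if min_eig < 0:
        cov = cov + abs(min_eig) * np.eye(cov.shape[0])
    return cov


def sex_bias(auto_mean, auto_cov, x_mean, x_cov, n_draws=10**6, seed=0):
    """Monte Carlo estimates of female (f_hat) and male (m_hat) shares per source."""
    rng = np.random.default_rng(seed)
    # Step 1: draw coefficient sets for the autosomes and chromosome X separately.
    ca = rng.multivariate_normal(auto_mean, make_psd(auto_cov), size=n_draws)
    cx = rng.multivariate_normal(x_mean, make_psd(x_cov), size=n_draws)
    # Step 2: drop draws in which any coefficient is negative.
    keep = (ca >= 0).all(axis=1) & (cx >= 0).all(axis=1)
    ca, cx = ca[keep], cx[keep]
    # Step 3: solve C_A = (m + f)/2 and C_X = (2f + m)/3 for f and m.
    f = 3 * cx - 2 * ca
    m = 4 * ca - 3 * cx
    # Step 4: normalize (note that f + m = 2 * C_A).
    f_hat = f / (f + m)
    m_hat = m / (f + m)
    # Steps 5-6: per-source means, standard deviations and 95% intervals.
    return {
        "f_hat_mean": f_hat.mean(axis=0),
        "f_hat_sd": f_hat.std(axis=0),
        "m_hat_mean": m_hat.mean(axis=0),
        "f_hat_95ci": np.percentile(f_hat, [2.5, 97.5], axis=0),
        "n_feasible": int(keep.sum()),
    }
```

In a two-source model the coefficients sum to one, so the error covariance matrix is singular; rounding can then leave a tiny negative eigenvalue, which is what the diagonal shift guards against. A source whose X chromosome coefficient is lower than its autosomal coefficient yields f_hat below 0.5, indicating male-biased contribution.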

Proximate sources of the coastal cline

We plot all the Mtwapa, Faza and Manda individuals on a ternary plot with their respective proportions of Makwasinyi, Iranian and Indian ancestry. We apply a linear regression to the high-coverage individuals from Mtwapa (I19381, I19394, I19384, I19414, I19420, I19413, I19417, I19401, I23662, I17409, I19391, I13611, I17412, I17413, I23558, I19415, I23561 and I21475) and Manda (I7934) as seen in the ternary plot (see Fig. 2b and Supplementary Fig. 3). The associated Cartesian equation is f(x) = 0.09856x + 0.01301. The line intersects the Makwasinyi axis at 98 ± 6% (left circle in Fig. 2b and Supplementary Fig. 3), allowing for 2 ± 6% Indian ancestry. This is statistically consistent with 100% Makwasinyi and 0% Iranian and 0% Indian ancestries. The other end of the regression line intersects the Indian axis (right circle in Fig. 2b and Supplementary Fig. 3) at 12 ± 5% Indian ancestry and 88 ± 5% Iranian ancestry. We further refined these proportions by jackknifing the coefficient estimates. We estimate that one admixing source population is 98.4 ± 2.5% Makwasinyi-related, consistent with a parsimonious model of 100% Makwasinyi ancestry at that end of the cline. The other source population is 12.2 ± 2.8% Indian and 87.8 ± 2.8% Iranian (see Supplementary Information).
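One way to locate the two ends of a linear cline and jackknife them is sketched below. It works directly in proportion space, regressing the Indian proportion on the Iranian proportion. This is an assumption: the approach is not necessarily based on the Cartesian ternary-plot coordinates behind the equation above, so it will not reproduce those coefficients. Endpoint standard errors come from a leave-one-individual-out jackknife, and the function names are hypothetical.

```python
import numpy as np


def cline_endpoints(iranian, indian):
    """Fit indian = a * iranian + b; return (Makwasinyi, Iranian, Indian) at both ends.

    End 1 (Iranian = 0): (1 - b, 0, b).
    End 2 (Makwasinyi = 0): the point on the line where iranian + indian = 1.
    """
    a, b = np.polyfit(iranian, indian, 1)
    x_end = (1.0 - b) / (1.0 + a)
    return np.array([1.0 - b, 0.0, b]), np.array([0.0, x_end, 1.0 - x_end])


def jackknife_endpoints(iranian, indian):
    """Full-data endpoints (6 values) and leave-one-out jackknife standard errors."""
    iranian = np.asarray(iranian, dtype=float)
    indian = np.asarray(indian, dtype=float)
    n = len(iranian)
    full = np.concatenate(cline_endpoints(iranian, indian))
    reps = np.array([
        np.concatenate(cline_endpoints(np.delete(iranian, i), np.delete(indian, i)))
        for i in range(n)
    ])
    se = np.sqrt((n - 1) / n * ((reps - reps.mean(axis=0)) ** 2).sum(axis=0))
    return full, se
```

For individuals lying exactly on a mixing line between a 100% Makwasinyi source and a source that is 88% Iranian and 12% Indian, the two recovered endpoints are those sources and the jackknife errors are essentially zero.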

Determining the date of mixture

DATES measures the covariance between pairs of positions in the genome separated by a specified genetic distance, for an admixture model with two source populations30,31. This analysis is applied separately to each individual, and the inferences are pooled across individuals to increase resolution. We pool the individuals from Mtwapa, Faza and Manda into a northern group, and the individuals from Kilwa and Songo Mnara (I19550) into a southern group. We use a conversion of 28 ± 2 years per generation53 to calculate the average date of the admixture.
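The step from a decay curve to a calendar date can be sketched as follows. This is a simplified, hypothetical illustration: it estimates only the decay rate (in generations, with genetic distance in Morgans) by a log-linear fit, ignoring the affine term and other corrections that DATES applies. It assumes that the sample date is the average date of the analysed individuals, and it combines the uncertainty in the generation estimate with the 28 ± 2 year generation time by the delta method.

```python
import math

import numpy as np


def fit_generations(distance_morgans, covariance):
    """Generations since mixture from a decay cov = A * exp(-n * d) (no affine term)."""
    slope, _ = np.polyfit(distance_morgans, np.log(covariance), 1)
    return -slope


def generations_to_date(generations, gen_se, sample_date_ad,
                        years_per_gen=28.0, years_per_gen_se=2.0):
    """Convert generations before the samples into a date AD with a delta-method SE."""
    years = generations * years_per_gen
    years_se = math.sqrt((gen_se * years_per_gen) ** 2
                         + (generations * years_per_gen_se) ** 2)
    return sample_date_ad - years, years_se
```

For a hypothetical estimate of 15 ± 2 generations before individuals dated to about ad 1450, this gives about ad 1030 with a standard error of about 64 years.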

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.