Introduction

The Indian Ocean can be considered as a closed sea, an afro-Asiatic Mediterranean,1, 2 around which populations have migrated and mixed. In contrast to the Atlantic Ocean, which was a formidable natural barrier to East–West migration, the Indian Ocean with its seasonal monsoon winds favoured such exchanges, and most of the early trade routes were maritime. The Comoros archipelago is situated in the western Indian Ocean, midway between the island of Madagascar and the coast of East Africa at the northern end of the Mozambique Channel. The archipelago is composed of four main islands: Grand Comore (Ngazidja), Anjouan (Ndzuani), Mohéli (Mwali) and Mayotte (Maore). The settlement of the four islands was an integral part of migration within the Indian Ocean, as they represent a potential maritime crossroads and juncture between Bantu African, Middle Eastern and Southeast Asian (SEA) spheres of influence. The modern Comorian population is the result of a long-term process of biocultural admixture, mainly related to ancient trade and colonisation in the Indian Ocean.

The Comoros and Madagascar share obvious signs of SEA influence including the cultivation of rice (phased out during the twentieth century), bananas and coconuts, and the use of outrigger canoes. Evidence from plant translocation suggests a migration from SEA 1500 Years Before Present (YBP).1, 3, 4 Clear genetic evidence for the SEA influence has been found on neighbouring Madagascar.5, 6, 7, 8 On the basis of Y chromosome and mitochondrial variation, ethnic groups with the strongest SEA biocultural features in Madagascar were estimated to have approximately 50% SEA ancestry.5, 8 In contrast to Madagascar where the language, Malagasy, is an Austronesian language with origins in SEA, the languages spoken on the Comoros are of Bantu origin. They are distinct from, but have close affinity to, Swahili, both branching from the precursor Sabaki language, 1000–2000 YBP.9

The cultural contributions of Middle Eastern civilisation are equally evident on the Islands. By 2000 YBP, a thriving commercial maritime network already existed, extending from the Middle East to India, and as far South as Tanzania on the East African coast. The name ‘Comoros’ is from the Arabic Kmr, meaning ‘light in the sky’.3 From 1300 YBP, the Comoros archipelago served as a stepping stone for Middle Eastern traders operating along the East African coast, and for SEA traders travelling to Madagascar and the East African coast.10, 11 By 1000 YBP, the Shirazi, traders with origins in the Persian city of Shiraz in present-day Iran, had established themselves on the island of Kilwa. The Shirazi were responsible for the generalisation of Islam on the Swahili coast by 500 YBP. They had built mosques on Kilwa, Zanzibar and Anjouan by 800 YBP.12 Islam remains the religion of the Islands today.

An unambiguous genetic signal from the Middle East has not, however, been detected in East Africa further south than Ethiopia,13, 14 or in the ethnic groups sampled on Madagascar.5, 8 The Lemba people of South Africa, carrying a putative Semitic Y chromosome, currently provide the only evidence for gene flow from the Middle East into southern Africa.15 The alleles of some autosomal genes found in the expatriate Comorian population living in Marseilles indicate a genetic contribution from Western Eurasia,16, 17, 18 but the populations living on the Comoros have until now not been studied.

In this context, the peopling of the Comoros is evidently integral to the movements of men and women across the entire Indian Ocean. To gain insights into this process, we therefore determined the Y chromosomal and mitochondrial genetic variation on the three Bantu-speaking islands of the Comoros Republic.

Materials and methods

Sample group

In February and March 2006, we obtained blood samples from 577 unrelated Comorian men (n=381) and women (n=196). We sampled the populations of three of the four islands of the Comoros archipelago (Grand Comore – 170 men, 67 women; Anjouan – 104 men, 69 women; and Mohéli – 107 men, 60 women). In 2006, this represented approximately 0.1% of the total Comoros population of 690 000 people. Blood was collected in EDTA vacutainer tubes and DNA extracted using the salting-out method.19 Samples were collected from multiple towns and villages on each island (Supplementary Figure 1). Recruitment was achieved through contacts established by medical personnel who originated from each community sampled. Each donor included in the study had four grandparents who were born on the same island and were native speakers of the island's language (Shingazidja, Shindzuani or Shimwali). Informed consent was obtained from all participants.

Y-chromosome haplogroups and haplotypes

We typed 68 binary polymorphisms mainly by PCR-RFLP (Figure 1 and Supplementary Table 1). For 293 Y chromosomes, alleles of 17 short tandem repeat (STR) polymorphisms on the Y chromosome were amplified with the AmpFlSTR Yfiler PCR Amplification Kit (Applied Biosystems, Foster City, CA, USA). Y-STR haplotypes were determined for 15 of 38 E-M2(xM191,U209) and 19 of 84 E-M191 chromosomes, and for all other chromosomes (Supplementary Table 2).

Figure 1

Frequencies (%) and numbers (n) of Y haplogroups in the Comoros population sample. Haplogroup names follow the 2008 nomenclature.20 Branches are labelled with the binary markers tested. Numbers without a letter represent ‘M’ prefixed Y markers (eg, 50=M50).21 Putative geographic origin is indicated for each haplogroup: Af – sub-Saharan Africa, WSA – West and Southwest Asia, SEA – Southeast Asia and ? – uncertain. Frequencies of less than 5% have been rounded up or down to the nearest unit.

Mitochondrial haplogroups

We typed 31 coding region polymorphisms in the mitochondrial genome mainly by PCR-RFLP (Figure 3, Supplementary Table 3). We also sequenced a 501-bp fragment, including HVS-I from all M, N(xR) and R samples (Supplementary Table 4 and GenBank: HM565257-HM565275). Choice of markers and branch designations was based on published data22 and trees presented at http://www.ianlogan.co.uk.

Data analysis

The genetic structure (haplogroup number, haplogroup diversity, population differentiation, Fst and Rst) of the study population sample was analysed using the ARLEQUIN package v. 3.01,23 and phylogenetic comparisons of Y-STR haplotypes and mitochondrial SNP haplogroups were examined by multidimensional scaling (MDS), based, respectively, on Rst or Fst distances,24, 25 using SPSS 10.00 software (Chicago, IL, USA). Admixture fractions were estimated using ADMIX 2.0.26 On the basis of our haplogroup and MDS analyses, and historical and linguistic data, we chose Borneo, Iran and East Africa (Y: Kenya and Tanzania and mitochondrial: Mozambique) as the most likely parent populations from published data.5, 13, 27, 28, 29, 30
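As an illustration of one of the summary statistics named above, haplogroup diversity is conventionally computed as Nei's unbiased gene diversity, H = n/(n−1) × (1 − Σp_i²), where p_i is the frequency of haplogroup i among n chromosomes. The following minimal Python sketch shows that calculation only; it is not the ARLEQUIN implementation used in the study, and the function name and toy sample are hypothetical.

```python
from collections import Counter


def haplogroup_diversity(labels):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample of haplogroup labels (not data from this study).
toy_sample = ["E"] * 5 + ["J"] * 3 + ["O"] * 2
print(haplogroup_diversity(toy_sample))
```

The statistic is 0 when every chromosome carries the same haplogroup and approaches 1 as the sample is spread evenly over many haplogroups.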

Results

Y-chromosome diversity on the Comoros

We analysed 381 Y chromosomes from the Comoros and identified 28 distinct haplogroups belonging to 11 of the 20 major clades of the Y-chromosome tree (Figure 1).20 These fall into four groups, on the basis of the geographical distribution of haplogroups around the Indian Ocean: sub-Saharan African 59.6%; Western and Southern Asia 29.7%; Southeast Asia 6% and uncertain origin 4.7%. Four clades, E, J, O and R, have frequencies greater than 5% and represent 87.4% of the sample.

The paragroups C*(xC1-5), F*(xM282, M427), J* and K*(xLMNOPQRST) cannot be assigned an origin with certainty. Nevertheless, the high frequencies of C*-M216 (Borneo – 2.5–25%) and K* (2–30%),5, 31 in SEA, make an SEA origin probable. J* has been found in Bali (1.5%) but also on the island of Soqotra (71%) situated near the mouth of the Gulf of Aden, between Somalia and Yemen.32, 33 F*(xM282,M427) has been found mainly in the Indian subcontinent.34 A West or Southwest Asian origin is therefore more likely for the F* and J* chromosomes.

Y-STR analysis revealed a generally high variance (Table 1), which, coupled with the large number of Y haplogroups, suggests that genetic drift has not drastically reduced genetic diversity on the Comoros Islands.
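The exact variance formula is not spelled out in the text. One commonly used summary, assumed here purely for illustration, is the mean across loci of the sample variance in repeat number within a haplogroup. The Python sketch below uses a hypothetical function name and made-up haplotypes.

```python
from statistics import variance


def mean_locus_variance(haplotypes):
    """Mean over loci of the sample variance in repeat count.

    `haplotypes` is a list of equal-length tuples, one repeat count per locus.
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    loci = list(zip(*haplotypes))  # transpose: one tuple of repeat counts per locus
    return sum(variance(locus) for locus in loci) / len(loci)


# Hypothetical two-locus haplotypes (not data from this study).
toy_haplotypes = [(14, 13), (15, 13), (16, 13)]
print(mean_locus_variance(toy_haplotypes))
```

A haplogroup that arrived recently from a single founder would be expected to show a value near zero, whereas older or multiply founded lineages accumulate higher values.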

Table 1 Variance of the principal Y haplogroups (n≥5) on the Comoros based on 15 Y-microsatellite loci

Sub-Saharan African Y chromosomes

The most common Comorian haplogroups, E1b1-M2 (41%) and E2-M90 (14%), are those that are frequent in sub-Saharan Africa.13, 35, 36, 37, 38 They are present, respectively, at 56 and 6.4%, in Madagascar.8 Two haplogroups were identified under E1b1-M2, derived for markers M191 (22%) and U209 (9%). The haplogroup E1b1a-M191 has been found in east and west sub-Saharan Africa, 19% in Tanzania and 57% in Benin.13 The marker U209 was identified in Afro-Americans,39 and has not, until now, been tested for in African populations.

The low incidence of E-M293 (0.8%) and A-M91 (0%) on the Comoros contrasts strongly with the frequency of these haplogroups in East African populations. E-M293 is found mainly in East Africa, Kenya and Tanzania (18%).40 Furthermore, on the African mainland, M293 chromosomes carry either 10, or 13 and more repeats at the DYS389I STR locus,40 whereas on the Comoros, they have 12 repeats. Haplogroup A has a frequency of 14% in Kenyan Bantu and 7% in Tanzania.13

Other haplogroups of likely sub-Saharan African origin on the Comoros are E-SRY4064(xM2,M35,M75) (1.3%) and B2a (1.6%). B2a has a low frequency in southern Iran and Qatar,29, 41 but this is thought to be a consequence of the Arab slave trade. We therefore treat B2a as an African chromosome in this study.

Y chromosomes from around the Arabian Sea

The northern Y chromosomes on the Comoros, E-V22, E-M123, F*(xF2, GHIJK), G2a, I, J1, J2, L1, Q1a3, R1*, R1a*, R1a1 and R2 (29.7%), make up a diverse group. G2a, J1 and J2 (16.5%) are thought to have originated in the Middle East.14, 42 J1-M267 has mainly spread south and west into the Arabian Peninsula, and into North and Northeast Africa, whereas J2-M172 lineages have expanded north into Europe and east into Asia.13, 14, 41, 43, 44, 45 The M78 subclade, E-V22, and E-M123 are believed to have originated in Northeast Africa, with E-V22 spreading to the west of North Africa and to the Arabian Peninsula by the Levantine corridor (United Arab Emirates (UAE) 6.7%),41, 46 whereas M123 spread mainly to the East (Yemen 8%; Oman 12%; Turkey 5.5%; Iran 1%).13, 29, 41, 42 In contrast, the haplogroups L1, Q1a3, R1, R1a, R1a1 and R2 (10.5%) are thought to be of Central or Southern Asian origin and describe clines of decreasing frequency from India and Pakistan towards the Middle East.34

A comparison of the relative incidences of E-M78(V22), E-M123, G, J, L, Q and R on the Comoros with populations around the Arabian Sea shows greatest similarities with Southern Iran and, to a lesser extent, Turkey (Supplementary Figure 2).29, 42 The higher affinity to South Iran is also evident in the MDS analysis with the Comoros Y-STR data for the E-V22, E-M123, G, J, L, Q and R haplogroups (Figure 2a). In the MDS, Comoros shows greatest affinity with UAE and South Iran. Southern Iran is the site of the first towns to develop in the Southern Middle East 2000–3000 years ago (Supplementary Figure 2).

Figure 2

Multidimensional scaling (MDS) analysis plot of genetic distance (Rst) calculated from the incidence of alleles at eight Y-STR loci (DYS19, 389AB, 389CD, 390, 391, 392, 393, 439). The analysis was performed with subsets of the Comoros sample, which were created on the basis of putative haplogroup origin. (a) Middle East – haplogroups E-M123, E-V22, F, G, J, L, Q and R. (b) Southeast Asian – haplogroups O, C* and K*. The populations represented are the Comoros (COM), this study, Madagascar (MAD),8 Oman (OMA),13, 47 Turkey (TUR),42 North Pakistan (N-PAK), South Pakistan (S-PAK), North India (N-IND), South India (S-IND),34 Yemen (YEM), United Arab Emirates (UAE), Saudi Arabia (SAU),47 North Iran (N-IR),48 South Iran (S-IR),47, 48 Malaysia (MAL),49 Taiwan (Paiwan) (TAI),50 West Borneo (East Malaysia) (W-BOR)51 and Bangladesh (BAN).52

A possible source of the Northern Y chromosomes is therefore the Shirazi traders from Southern Iran who established trading posts on the Comoros by 800 YBP.12 It has previously been estimated that, at 9 Y-STR loci, 0–1 mutation will most likely separate the descendants of a single Y-chromosome haplotype after 40 generations (1000–1200 years).53, 54 Compatible with a Shirazi origin, we found that, at 9 Y-STR loci (DYS19, 389AB, 389CD, 390–393, 438 and 439), 42% of the Comoros Northern chromosomes differ by 0–1 mutation from chromosomes in Southern Iran.47, 48
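The 0–1 mutation comparison can be sketched as follows. The example assumes a single-step mutation model, so the distance between two haplotypes is the sum of absolute differences in repeat number across the 9 loci, and a chromosome counts as matching if it lies within one step of any reference haplotype. The function names and all haplotypes are hypothetical; the sketch is not the authors' pipeline.

```python
def step_distance(hap_a, hap_b):
    """Number of single-repeat steps separating two Y-STR haplotypes."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(abs(x - y) for x, y in zip(hap_a, hap_b))


def fraction_within(queries, references, max_steps=1):
    """Share of query haplotypes within max_steps of at least one reference."""
    near = sum(
        1
        for q in queries
        if any(step_distance(q, r) <= max_steps for r in references)
    )
    return near / len(queries)


# Hypothetical 9-locus haplotypes (repeat counts are made up).
reference_haps = [(14, 13, 16, 24, 10, 11, 12, 10, 12)]
query_haps = [
    (14, 13, 16, 24, 10, 11, 12, 10, 12),  # identical: 0 steps
    (14, 13, 16, 25, 10, 11, 12, 10, 12),  # one locus differs by one repeat
    (15, 13, 17, 24, 10, 11, 12, 10, 12),  # two loci differ: 2 steps
    (14, 13, 16, 24, 10, 14, 12, 10, 12),  # one locus differs by three repeats
]
print(fraction_within(query_haps, reference_haps))
```

Under this model a two-repeat change at one locus counts as two steps, so it falls outside the 0–1 threshold just as two separate single-step changes would.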

SEA Y chromosomes

We found the O1 lineage (6%) in the Comoros sample, providing genetic evidence for an SEA influence. Haplogroup O has been found at highest frequencies in East Asia and Island Southeast Asia.55, 56 All but one of the Comorian O1 chromosomes are O1a-M50 (5.8%). The O1a-M50 Y chromosome has its highest incidence in SEA: Borneo (10–20%), Sulawesi (4%), Taiwanese aborigines (0–59%, mean 14%) and the Philippines (3–12%).5, 57, 58 It has not been detected in the Middle East or the Indian subcontinent.5, 29, 41, 34

We performed an MDS with our STR data for the Y haplogroups O, C* and K* together with available STR data from candidate SEA populations (Figure 2b). The Comoros show a low affinity to the populations selected, even when C* and K* are not included (not shown), suggesting that these populations are not the source of SEA chromosomes on the Comoros.

Mitochondrial diversity on the Comoros

We tested 577 Comorian samples for mitochondrial SNPs and defined nine distinct haplogroups (Figure 3). As for the Y chromosome, the majority of mitochondrial haplogroups on the Comoros are of African origin. The haplogroups L0, L1, L2 and L3′4(xMN) compose 84.7% of the mitochondria in the Comoros sample, and their relative proportions are most similar to profiles found in East and Southeast Africa.22, 59 The higher affinity with sub-Saharan East African populations is also evident in the MDS analysis (Figure 4a and b).

Figure 3

Frequencies (%) and numbers (n) of mitochondrial haplogroups in the Comoros population sample. Numbers on branches refer to the position of polymorphisms in the CRS (Cambridge reference sequence). HVS-I sequence was not determined for L0, L1, L2 or L3′4(xMN). The HVS-I SNPs are shown for M and N haplogroups, only where they provide further definition than the coding SNPs. Putative geographic origin is indicated for each haplogroup: Af – sub-Saharan Africa, SEA – Southeast Asia and ? – uncertain.

Figure 4

Multidimensional scaling (MDS) analysis plot of genetic distance (Fst) calculated from mitochondrial haplogroup frequencies. M* and R* were excluded from these analyses. (a) Africa, SEA and Iran – all Comoros haplogroups, except M* and R*. (b) and (c) MDS performed with subsets of the Comoros sample, defined on the basis of putative haplogroup origin. (b) Africa – Comoros haplogroups L. (c) SEA – Comoros and Madagascar haplogroups B4a, B4a1a1-PM, F3b, M7c1c and R9. The populations are Comoros (COM), this study, Madagascar (MAD),8 Central Africa (AFC),59 Iran (IRA), Mozambique (MOZ), Kenya (KEN),60 Ethiopia (ETH),61 Tunisia (TUN), Algeria (ALG), Morocco (MOR), Mauritania (MAU),62 Taiwan (TAI), Philippines (PHI), Malaysia (MAL), Borneo (BOR), Sumatra (SUM), Bali (BAL) and Java (JAV).27

The remaining 15.3% of the Comoros sample is composed almost exclusively of haplogroups that can either be unambiguously identified as SEA (B4a1a1-PM, F3b and M7c1c – 10.6%),27 or fall into the paragroup M(xD, E, M1, M2, M7) (4%) (Figure 3). The latter haplogroups are probably also originally from Southeast Asia, but of the 12 different M* HVS-I sequences on the Comoros, only two match published sequences: two M(xM7) mitochondria found on Madagascar.8 We found no haplogroups that could be assigned to the Middle East.

SEA mitochondria

Of the SEA mitochondrial haplogroups present on the Comoros, F3b and M7c1c, similar to the Y-Hg O1-M50, each define an area of distribution that extends from Taiwan through the Philippines to Borneo27 (Supplementary Figure 3). The MDS analysis with SEA populations shows greatest affinity to the Philippines and Borneo, although affinity is relatively weak (Figure 4c). Linguistic studies indicate Southeast Borneo to be the probable origin of the migration from SEA to Madagascar.63 B4a1a1-PM (0.7%) is the major haplogroup throughout Polynesia (78%),31 and on Madagascar (25%), but, within island SEA, it has not been found further West than South Borneo (1%).5, 7, 8

Male-biased gene flow from the Middle East

There are no mitochondrial lineages on the Comoros that are frequent in the Middle East (Figure 3). We tested for, but did not find, the R haplogroups, H, J, T, U and V, or N(xR) that represent 80% of the mitochondria in Iran.60 There is therefore striking evidence for male-biased gene flow from the Middle East to the Comoros, even if the unassigned mt-Hg M* and R* are designated as western Asian: 103/381 Y vs 27/577 mitochondria – Fisher's exact test, one-sided, P<10^−22. This is entirely consistent with male-dominated trade and religious proselytisation being the forces that drove the Middle Eastern gene flow to the Comoros. For African and SEA contributions, if Y haplogroups C* and K* are counted as SEA, the under-representation of male lineages is similar (Y to mt ratio: Africa 0.69, SEA 0.66).
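The one-sided Fisher's exact test quoted above can be illustrated with a short Python sketch that sums the upper tail of the hypergeometric distribution for the 2×2 table given in the text (103 of 381 Y chromosomes vs 27 of 577 mitochondria). This is a minimal standard-library implementation for illustration, not the software used by the authors; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell count under the
    hypergeometric null with all margins fixed.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return Fraction(tail, comb(total, row1))


# Counts taken from the text: Western Asian vs other lineages.
y_west, y_total = 103, 381
mt_west, mt_total = 27, 577
p = fisher_one_sided(y_west, y_total - y_west, mt_west, mt_total - mt_west)
print(float(p))
```

Exact integer arithmetic is used so that a tail probability this small is not lost to floating-point underflow in the intermediate terms.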

An opposite female gene flow from Africa to the Middle East is clearly evident in Yemen (34% mt-Hg L; 4% Y-Hg E-M2), Iraq and the Levant.64 However, no mt-Hg L has been found in Iran (n=712),60 despite the presence of Y-Hg E-M2 (1.7%),29 supporting the idea that the elevated mt-Hg L frequency in the western Middle East is not exclusively a consequence of the Arab slave trade, but also of geography.61

Discussion

We reveal the Comoros population to be a genetic mosaic, the result of tripartite gene flow from the North, the East and the West. Admixture analysis of the maternal and paternal contributions indicates the gene pool to be predominantly African (72%), with significant contributions from Western Asia (17%) and Southeast Asia (11%). Our study therefore provides the first unequivocal evidence that the Middle Eastern trade routes that developed along the East African coast, during the last 2000 years, have left a genetic trace. Male and female SEA gene flow has already been described on Madagascar, in populations that speak Austronesian languages,5, 8 but here we show that this extends beyond Madagascar, into African populations speaking languages from the Bantu family. This raises the question of whether the demic migration from SEA reached the East African mainland.

The frequencies of Y-Hg E-V22, E-M123, G2a, J1, J2, R1a1 and R2 in the Comorian sample are compatible with gene flow from Iran.29, 34 This concords with historical data, which attest to the presence of traders from Shiraz in Iran on the Comoros, and also with the Comorians' own oral traditions, which recount that Shirazi princes came in ships and established colonies on the islands. On the island of Anjouan the term ‘Shirazi’ is used to refer to someone of Middle Eastern appearance. There is historical evidence that 1000 YBP Persian traders had an important role in trade along the East coast, and we therefore predict that an Iranian genetic signal will be detected among Swahili speakers at former Middle Eastern trading centres on the sub-Saharan East coast, such as the islands of Zanzibar and Kilwa off the coast of Tanzania.

Interestingly, there are a number of similarities between the genetic profile of the Comoros islanders and the Lemba of South Africa, a Bantu-speaking people whose Semitic origins are evident at both the cultural and genetic level.15, 65 The Lemba have high frequencies of the Middle Eastern Y-chromosome HgJ-12f2a (25%), a potentially SEA Y, Hg-K(xPQR) (32%) and a Bantu Y, E-PN1 (30%) (similar to E-M2), raising the possibility that the Lemba and Comorian populations are consequences of similar demographic processes. High-resolution genotyping of the Lemba Y chromosomes and mitochondria will help to resolve this question.

The Comoros and Madagascar show similarities in the paternal and maternal contribution from SEA and Africa. The absence of a strong Middle Eastern signal on Madagascar could be due to sampling bias, as Arab or Persian traders are known to have established posts on the Northwest coast of Madagascar, whereas only populations from the centre and South of Madagascar have been studied to date.5, 8 The low frequencies of E-M293 and A-M91, on both the Comoros and Madagascar, contrast with the high frequencies found in inland populations from Tanzania and Kenya,13, 40 and could be characteristic of a genetic profile specific to sub-Saharan coastal East Africa.

The SEA haplogroups, shared with Madagascar, on the Comoros are O1a-M50 for the Y chromosome and M7c1c, F3b and B4a1a1-PM for the mitochondria. Consistent with their transit West across the Indian Ocean to the East African coast, O1a-M50, M7c1c and F3b are linked to maritime colonisation within island SEA. In contrast, B4a1a1-PM has not been found in island SEA further West than Southeast Borneo (1%) and has expanded mainly East into Polynesia, but also West to Madagascar where it predominates.27, 57 There are nevertheless several indicators that the Comoros’ history of gene flow from SEA is distinct from Madagascar's: the absence of Y HgO2-M95 and the very low frequency of B4a1a1-PM, on the Comoros, the higher frequency of F3b (Comoros 8%; Madagascar 3.7%), the dissimilarity of M* HVS-I sequences and the low affinity between the Comoros’ O1a-M50 chromosomes and those of Madagascar. The fact that O1a-M50, M7c1c, F3b and B4a1a1-PM have not been found at sites around the Indian Ocean,28, 29, 41, 34, 60 outside SEA and now East Africa, is consistent with a colonising migration from SEA to East Africa directly across the Indian Ocean.

The Comoros population represents an exceptionally diverse genetic mosaic created by the complex process of human settlement around the Indian Ocean. The genealogy of our large sample is well documented and will provide a solid base from which to explore human diversity in Madagascar and coastal sub-Saharan East Africa.