We genotyped 45 biallelic markers and 11 STR systems on the Y chromosome in 201 male Somalis. In addition, 65 sub-Saharan Western Africans, 59 Turks and 64 Iraqis were typed for the biallelic Y chromosome markers. In Somalis, 14 Y chromosome haplogroups were identified including E3b1 (77.6%) and K2 (10.4%). The haplogroup E3b1 with the rare DYS19-11 allele (also called the E3b1 cluster γ) was found in 75.1% of male Somalis, and 70.6% of Somali Y chromosomes were E3b1, DYS19-11, DYS392-12, DYS437-14, DYS438-11 and DYS393-13. The haplotype diversity of eight Y-STRs (‘minimal haplotype’) was 0.9575 compared to an average of 0.9974 and 0.9996 in European and Asian populations. In sub-Saharan Western Africans, only four haplogroups were identified. The West African clade E3a was found in 89.2% of the samples and the haplogroup E3b1 was not observed. In Turks, 12 haplogroups were found including J2*(xJ2f2) (27.1%), R1b3*(xR1b3d, R1b3f) (20.3%), E3b3 and R1a1*(xR1a1b) (both 11.9%). In Iraqis, 12 haplogroups were identified including J2*(xJ2f2) (29.7%) and J*(xJ2) (26.6%). The data suggest that the male Somali population is a branch of the East African population – closely related to the Oromos in Ethiopia and North Kenya – with predominant E3b1 cluster γ lineages that were introduced into the Somali population 4000–5000 years ago, and that the Somali male population has approximately 15% Y chromosomes from Eurasia and approximately 5% from sub-Saharan Africa.
East Africans are more closely related to Eurasians than to other African populations.1, 2, 3 Investigations of Y chromosome markers have shown that the East African populations were not significantly affected by the eastbound Bantu expansion that took place approximately 3500 years ago, while significant contact with Arab and Middle Eastern populations can be deduced from the present distribution of the Y chromosomes in these areas.4, 5 The Y chromosome haplogroup E3a is found at high frequencies in the sub-Saharan, Bantu-speaking populations but at low frequencies in East Africa, while Eurasian haplogroups like J and K are found at various frequencies in East Africa.3, 4, 6, 7, 8, 9 However, the majority of Y chromosomes found in populations in Egypt, Sudan, Ethiopia and Oromos in Somalia and North Kenya (Boranas) belong to haplogroup E3b1 defined by the Y chromosome marker M78.9, 10 A special branch of E3b1, cluster γ, which was defined by the presence of the otherwise rare Y STR allele 11 in DYS19, was observed in high frequencies in small samples of male Boranas (Oromos) in North Kenya, Ethiopian Oromos and Somali males, while the E3b1 cluster γ was found in low frequencies in non-Oromos from Ethiopia, Bantus from Kenya and North Egyptians,10 and was almost absent in populations outside the Horn of Africa. Other clusters of haplogroup E3b1 (α, β and δ) that are found in European, Arab, North and East African populations were not found in Oromos from North Kenya (Boranas) or Ethiopia, and were found in only one of 23 Somali males.10
We typed a set of 45 biallelic markers and 11 STR systems on the Y chromosome in a large population of male Somali immigrants to Denmark in order to define their Y chromosome lineages in detail. In addition, 65 sub-Saharan Western Africans, 59 Turks, and 64 Iraqis were typed for the biallelic Y chromosome markers. The results were compared to those obtained in other relevant populations.
Material and methods
Samples and DNA purification
A total of 389 DNA samples from unrelated males (the numbers of individuals are given in parentheses) from Turkey (59), Iraq (64), Somalia (201) (all immigrants to Denmark) and 65 sub-Saharan Western Africans from Mali (38), Ghana (16), Mauritania (3), Guinea Conakry (2), Liberia (2), Cote d'Ivoire (1), Guinea-Bissau (1), Senegal (1) and Cameroon (1) (all immigrants to the Canary Islands) were typed in duplicate for 45 biallelic Y chromosome markers. Blood on FTA cards or Qiagen-purified DNA was used. The protocol was approved by the Danish ethical committee (Ref. KF-01-037/03).
Biallelic marker typing
The PCR amplification, the single base extension (SBE) reaction and the determination of the biallelic markers were performed as described previously.11 The markers P2, M22, M70, M75, M128, M168, M201, M207, M269 and M304 were also typed using singleplex PCR conditions.4, 11, 12 For the marker V6,10 we used the PCR primer sequences: V6F: 5′-CCTATAGAGTCCCTGTCCCTGA-3′, V6R: 5′-CTTGCTGCTGAGTGAGCTTCT-3′ (0.4 μM of each primer). SBE primers not described by Sanchez et al11 (0.2 μM of each primer) are given in Table 1.
Y STR typing was performed using the PowerPlex® Y System kit (Promega) including DYS391, DYS389I, DYS389II, DYS439, DYS438, DYS437, DYS19, DYS392, DYS393, DYS390 and DYS385. The PCR conditions were as recommended by the manufacturer except that the number of cycles was 10 plus 16. The amplifications were performed in a Perkin–Elmer GeneAmp® PCR System 9700 thermal cycler. A total of 1 μl of PCR product was electrophoretically separated on an ABI 3100 Genetic Analyzer (Applied Biosystems) using performance-optimized polymer 4 (POP4) and Dye Set Z. Analyses of PCR fragments were performed using GeneScan 3.7 and Genotyper 3.7 NT (Applied Biosystems). The alleles were assigned by comparison to allelic ladders using Genotyper macros supplied with the PowerPlex® Y System. All 201 Somali males and the Turkish and Iraqi males previously assigned to belong to the haplogroup E3b1 were typed in duplicate. The nomenclature for DYS389II reflects the total number of repeats minus the number of repeats of DYS389I.14
As suggested by de Knijff,15 Y chromosomes identified by STRs are designated ‘haplotypes’. Y chromosomes that are defined only by biallelic markers are called ‘haplogroups or clades’, and combinations of biallelic markers and Y-STRs are called ‘lineages’.
Typing of 15 autosomal STRs was performed using the AmpFlSTR Identifiler PCR Amplification Kit (Applied Biosystems) according to the instructions of the manufacturer. A total of 198 of the 201 Somali males were typed in duplicate. The last three Somali males were not typed due to technical problems.
Comparative data analyses
In order to compare the proportions of Y-STR haplotypes in Somalis with those in other populations, we searched the worldwide Y-STR Haplotype Reference Database (http://www.yhrd.org/index.html). In addition, we compared the Somali Y STR data with results in African and Anatolian populations available in the literature. Phylogenetic relationships between Somali STR haplotypes within haplogroups were reconstructed in a median-joining network16 using the programme Network 220.127.116.11 (http://www.fluxus-technology.com). A haplogroup-specific weight proportional to the reciprocal of the microsatellite variance was used. Reduced-median and median-joining (ɛ=0) procedures were applied sequentially.
The software package Arlequin19 (version 2.000; http://lgb.unige.ch/arlequin) was used to perform the analysis of molecular variance (AMOVA) at various levels of hierarchical groupings based on haplogroup frequencies20 and to calculate the mean pairwise differences, which are the mean number of mutational steps observed between all pairs of haplotypes in the sample.
The significance of the variance components and the corresponding Φ statistics (F statistics analogs) was assessed by comparisons of the observed values with the distribution of 10 000 permutations obtained by randomization under the null hypothesis of no population structure. The AMOVA was made considering the subpopulation relative to the total population (ΦST) and the geographical group of a subpopulation relative to the total population (ΦCT).6, 8 Principal component analysis of the haplogroup frequencies was performed using the Multivariate Statistical Package (MVSP) v. 3.1 (http://www.kovcomp.co.uk/mvsp/mvspwbro.html) and presented graphically in two dimensions.
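The permutation logic behind such significance tests can be illustrated with a minimal sketch. It is not Arlequin's AMOVA: as a simplifying assumption, it uses a G_ST-style statistic computed from haplogroup frequencies in place of ΦST, and the function names and toy data are hypothetical. Individuals are shuffled between populations to build the null distribution expected under no population structure.

```python
import random
from collections import Counter


def gene_diversity(haplogroups):
    """Return 1 - sum(p_i^2) for a list of haplogroup labels."""
    n = len(haplogroups)
    counts = Counter(haplogroups)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gst(populations):
    """G_ST-style statistic (illustrative stand-in for Phi_ST, an assumption)."""
    pooled = [h for pop in populations for h in pop]
    h_total = gene_diversity(pooled)
    if h_total == 0.0:
        return 0.0
    n_total = len(pooled)
    # Within-population diversity, weighted by sample size.
    h_within = sum(len(pop) / n_total * gene_diversity(pop) for pop in populations)
    return (h_total - h_within) / h_total


def permutation_p_value(populations, n_permutations=10_000, seed=0):
    """Shuffle individuals across populations and compare with the observed value."""
    rng = random.Random(seed)
    observed = gst(populations)
    pooled = [h for pop in populations for h in pop]
    sizes = [len(pop) for pop in populations]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if gst(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy samples, not data from this study.
    pop_a = ["E3b1"] * 18 + ["K2"] * 2
    pop_b = ["E3a"] * 17 + ["E3b1"] * 3
    print(permutation_p_value([pop_a, pop_b], n_permutations=1000))
```

With 10 000 permutations, the smallest P value this procedure can report is about 10⁻⁴, which is the resolution limit of a test of this kind.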
Y chromosome STR data were used to estimate the expansion time using both a model that does not explicitly specify a demography (through a star-like genealogy) and a Bayesian-based coalescence analysis procedure21 assuming a stepwise mutation model. The first approach required identification of the ancestral haplotype within each haplogroup using the ΔA statistics described by Stumpf and Goldstein.22 To estimate the time to the most recent common ancestor (TMRCA), we used the average squared difference (ASD)23 and the averaged effective mutation rate described by Zhivotovsky et al.24 We used 95% confidence intervals (CIs) estimated by Monte Carlo simulations using a coalescent model with exponential growth scaled in units of the TMRCA including the uncertainty of the mutation rate. This is implemented in the programme Ytime described by Behar et al.25 Generation times of 25 and 30 years were assumed. Thus, the 95% CI takes into account the uncertainty in mutation rate, the population growth and (where appropriate) subdivision, but not the generation time. We also calculated the TMRCA using the variance of repeat scores observed (averaged over loci) within a haplogroup.26, 27
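The ASD-based point estimate can be sketched as follows. The sketch assumes that the ancestral haplotype is taken as the modal allele at each locus and that the TMRCA in generations is the ASD divided by the effective mutation rate of 6.9 × 10⁻⁴ per locus per generation; it does not reproduce the ΔA statistics or the confidence intervals computed with Ytime. The function names and the toy haplotypes are hypothetical.

```python
from collections import Counter

# Effective mutation rate per locus per generation (Zhivotovsky et al.).
MUTATION_RATE = 6.9e-4


def modal_haplotype(haplotypes):
    """Take the most frequent allele at each locus as the assumed ancestral state."""
    n_loci = len(haplotypes[0])
    return tuple(
        Counter(h[locus] for h in haplotypes).most_common(1)[0][0]
        for locus in range(n_loci)
    )


def average_squared_difference(haplotypes, ancestral):
    """Mean squared repeat-count difference to the ancestral haplotype,
    averaged over chromosomes and loci."""
    total = sum((a - b) ** 2 for h in haplotypes for a, b in zip(h, ancestral))
    return total / (len(haplotypes) * len(ancestral))


def tmrca_years(haplotypes, generation_years=25, mutation_rate=MUTATION_RATE):
    """Point estimate: ASD / mutation rate gives generations; scale to years."""
    ancestral = modal_haplotype(haplotypes)
    asd = average_squared_difference(haplotypes, ancestral)
    return asd / mutation_rate * generation_years


if __name__ == "__main__":
    # Hypothetical three-locus haplotypes, not data from this study.
    sample = [(11, 12, 14)] * 8 + [(12, 12, 14), (11, 13, 14)]
    print(modal_haplotype(sample))
    print(round(tmrca_years(sample, generation_years=25)))
    print(round(tmrca_years(sample, generation_years=30)))
```

Because the generation time enters only as a final scaling factor, the estimates for 25 and 30 years differ by a constant ratio of 1.2.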
Bayesian analysis of trees with internal node generation (BATWING)28 was used to estimate the expansion times of a set of Somali Y chromosomes. The demographic model assumed exponential growth from an initially constant-sized population beginning at time Beta. Priors were chosen to be as uninformative as possible in order to minimize the impact on the results. Thus, we specified a gamma (1, 0.001) distribution as the prior for the growth rate and a gamma (1.1, 0.0001) distribution as the prior for the initial population size.12, 29 The prior distribution for the STR mutation rate was specified as a gamma distribution with a mean of 6.9 × 10⁻⁴ per locus per 25 or 30 years,24 and a broad, uniform (0, 15) prior distribution was assigned to Beta. The estimated time of population expansion, Beta, was expressed as a fraction of the initial population size multiplied by the generation time to generate standard units of time. A total of 20 000 initial rearrangements were discarded, and the posterior distributions were estimated from the subsequent 50 000 rearrangements using the R computer software Version 1.8.1 (http://www.r-project.org/).
The results of microsatellite loci DYS385 were excluded from the statistical comparison analysis except for the network analysis because it was impossible to assign alleles to a specific locus. Estimation of the genetic diversity was measured using Nei's unbiased h statistics.30 The variance of the allele distribution was calculated in EXCEL (Microsoft) for each locus independently and then averaged across the 10 loci.
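Both summary statistics can be sketched in a few lines. The sketch assumes the usual form of Nei's unbiased estimator, h = n/(n − 1) · (1 − Σ pᵢ²), and assumes the sample variance (n − 1 denominator) for the per-locus allele variance; the text does not state which variance function was used in EXCEL. The function names and the toy haplotypes are hypothetical.

```python
from collections import Counter
from statistics import variance


def nei_haplotype_diversity(haplotypes):
    """Nei's unbiased diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def mean_locus_variance(haplotypes):
    """Variance of repeat counts at each locus, averaged across loci.

    Uses the sample variance (n - 1 denominator), which is an assumption.
    """
    loci = list(zip(*haplotypes))  # one tuple of allele sizes per locus
    return sum(variance(alleles) for alleles in loci) / len(loci)


if __name__ == "__main__":
    # Hypothetical two-locus haplotypes, not data from this study.
    sample = [(11, 12), (11, 12), (12, 12), (11, 13)]
    print(nei_haplotype_diversity(sample))
    print(mean_locus_variance(sample))
```

Haplotypes are passed as tuples so that they can be counted directly; a two-copy system such as DYS385 would have to be dropped from the tuples first, in line with the exclusion described above.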
Although the Horn of Africa is considered a geographic part of sub-Saharan Africa, we have analysed the Somali population separately in order to be able to compare the results with previously published data from other African populations.
Autosomal STR typing
The genotypes of the 15 autosomal STR systems in 198 Somalis were in Hardy–Weinberg equilibrium (χ²=31.37, df=30, P=0.40), and the Fis values ranged from −0.047 to 0.038.
Y chromosome haplogroup variation
We identified a total of 23 Y chromosome haplogroups in 389 males from Somalia, sub-Saharan West Africa, Turkey and Iraq. Figure 1 shows the genealogical relationship of the haplogroups and their frequencies.
In Somali males, 14 haplogroups were identified. The frequency of the clade E3b was 81.1%, including 77.6% of the haplogroup E3b1 defined by the M78 mutation. The Eurasian haplogroup K2 was found in 10.4%, and 3.0% of the Somali Y chromosomes belonged to the major clade J. Only 3.0% of the Somalis had the sub-Saharan African haplogroups A3, B and E3a*(xE3a4). Less than 2.0% of the Somalis belonged to the Northwest African E3b2 lineage. In the present study, no individual belonging to E3b* chromosomes carried the V6 mutation, which identifies a subset of chromosomes assigned to E3b* (E-M35*).10
Among the sub-Saharan Western Africans, only four haplogroups were identified. The West African clade E3a was found in 89.2%. Only one individual carried the major clade E3b (1.5%), and the haplogroup E3b1 was not observed.
In Turks, 12 haplogroups were found. The four haplogroups J2*(xJ2f2) (27.1%), R1b3*(xR1b3d, R1b3f) (20.3%), E3b3 and R1a1*(xR1a1b) (both 11.9%) were the most frequent ones.
In Iraqis, 12 haplogroups were identified. The haplogroup J2*(xJ2f2) was the most frequent one (29.7%) followed by J*(xJ2) (26.6%).
Geographic distribution of the Y chromosome haplogroup E3b1
To examine the relationship between the Y chromosome haplogroups in the Somali, other African and non-African populations, we compared our data with the data from literature (Table 2). The high frequency (77.6%) of haplogroup E3b1 was characteristic of male Somalis. The frequency of E3b1 was significantly lower in Ethiopian Oromos (35.9%), Ethiopian Amharas (22.9%), Egyptians (20.0%), Sudanese (17.5%), Kenyans (15.1%),10 Iraqis (6.3%), Northern Africans (6.1%), Southern Europeans (0.5–5.1%) and sub-Saharan populations (below 1%).
We analysed the variation in the frequencies of E3b1 in 17 geographically defined populations using the AMOVA test.20 A total of 61.2% of the variance (significantly greater than zero, P<10⁻⁴) could be attributed to differences within populations. The population data were grouped into (1) sub-Saharan Africans (four populations), (2) North and East Africans (seven populations) and (3) non-Africans (six populations), and the Φ statistics were calculated. High degrees of both inter- and intragroup variabilities (ΦCT=0.31, P<10⁻⁴; ΦSC=0.19, P<10⁻⁴) were observed. In North and East African populations (ΦST=0.25, P<10⁻⁴), the variability was mainly due to the high frequency of the haplogroup E3b1 in the Somalis (77.6%) compared to those in Sudanese (17.5%) and Northern Africans (6.1%). When the Somali population data were removed from the North and Eastern African group, the ΦST value decreased from 0.25 to 0.16.
Figure 2 shows a principal component analysis of the haplogroups. We observed a similar pattern in a neighbour joining, unrooted tree (data not shown). The two principal components accounted for 79.4% of the genetic variance observed mainly due to differences in the frequencies of the clade E and clade BR*(xE) (first component) and the E3a and E3b lineages (second component). The first component separated the non-Africans and the sub-Saharan Western Africans characterized by high frequencies of the clade E3a (Figure 2, axis 1). The North and East Africans were separated from the rest of the sub-Saharan populations (Figure 2, axis 2) mainly by different frequencies of haplogroup E3b1. The position of the Ethiopian Oromos was close to that of the Somalis due to the relatively high frequencies of the haplogroups E3b*(xE3b1) and E3b1. The Ethiopian Amharas and the Egyptian population were positioned between the African and the non-African populations, primarily due to the high frequencies of the clade BR*(xE).
Y chromosome haplotype diversity
The allele and haplotype frequencies of 11 Y-STRs were estimated in the 201 male Somalis. Table 3 shows the Y-STR allele frequencies in all males, in haplogroup E3b1 and in non-E3b1 males. In eight of the 11 Y-STR systems, a predominant allele with a frequency above 0.75 was found. The frequencies of the predominant alleles ranged from 0.90 to 0.99 in E3b1 Y chromosomes.
A comparison between the Y-STR allele frequencies in our sample and in another sample of 104 Somali males typed for the eight common Y-STR loci37 showed that the distributions of Y-STRs were very similar in the two populations (Fst=0.0007; P=0.285).
In Somalis, 96 haplotypes were identified with the 11 Y-STRs. The haplotype diversity was 0.9726±0.005. No microsatellite haplotype was found in more than one haplogroup. The haplotype diversity of the eight Y-STRs (minimal haplotype) was 0.9575±0.007 compared to an average of 0.9974 and 0.9996 in European and Asian populations in the ‘Y-STR Haplotype Reference Database’ (http://www.yhrd.org/index.html), 0.9884 in the Mozambican population38 and 0.9838 in the Tunisian population.39
Haplogroup E3b1 with the rare allele 11 of DYS19 (E3b1 cluster γ)10 was found in 75.1% of all Somali males, and 96.8% of the E3b1 Y chromosomes carried DYS19-11. The majority of the E3b1 DYS19-11 Y chromosomes were characterized by DYS392-12, DYS437-14, DYS438-11 (96.2%) and further by DYS393-13 (91.0%).
The average difference in numbers of repeat units between STR haplotypes typed for 10 Y-STRs was 0.7±0.1, and the variance in the allele size distribution was 0.59 repeat units indicating that the Somali Y chromosome haplotypes were very closely related to each other. Figure 3a shows a median joining network analysis of the relatedness of the 64 Y STR haplotypes of the E3b1 haplogroup. The network displayed star-like features, and the four most frequent haplotypes, which accounted for 47% of the entire E3b1 cluster γ lineage, occupied central positions in the network.
The network in Figure 3b was constructed by combining our data with previously published data for individuals belonging to haplogroup E3b1 (12, 40 and personal communication). It displayed star-like features with a clear geographic structure. The main branch of the E3b1 cluster γ lineages is located on a branch defined by DYS392-12 in the Ethiopian part of the network.
The Somali E3b1 haplotype data were compared with results from Anatolia,12 the only E3b1 data available for the same set of Y-STR loci. The two populations shared only one haplotype (DYS19-13/DYS389I-13/DYS389II-17/DYS390-23/DYS391-10/DYS392-11/DYS393-13/DYS439-12). The E3b1 cluster γ lineages were not found in Turkish or Iraqi males.
In the 21 Somali males belonging to haplogroup K2, the Y-STR haplotypes were organized with a common ancestor into three branches with several mutation steps between the haplotypes (data not shown).
Estimates of expansion time and population size
Table 4 presents estimates of ages and expansion times of the Somali E3b1 cluster γ and the K2 lineages based on Y-STR data using different estimation procedures. By defining the ancestral haplotype as that with the modal allele for each STR system and calculating the average squared distance41 as well as the variance between this haplotype and other variants, we estimated the time back to the most recent common ancestor (TMRCA). In Somalis, the TMRCA was estimated to be 4000–5000 years for the haplogroup E3b1 cluster γ and 2100–2200 years for the haplogroup K2 assuming a generation time of 25 years. Calculations based on a Bayesian coalescence approach (BATWING expansion time) indicated that the growth of the E3b1 cluster γ in the Somali population started 1200 years ago (Table 4) with an initial population size of 1037 individuals. A similar analysis of haplogroup K2 resulted in a calculated expansion time of approximately 3300 years in a small male population of 109 individuals. The results did not change significantly when different prior probability distributions were applied (data not shown).
The present study demonstrates that male Somalis have the highest frequency of the haplogroup E3b1 (77.6%) observed in any population studied until now. The great majority of the Somali E3b1 Y chromosomes studied (96.8%) carried the otherwise rare allele 11 of the DYS19 STR locus and, thus, belonged to the cluster γ defined by Cruciani et al.10 The E3b1 cluster γ was previously reported in five of seven (71.4%) male Boranas (Oromos) from North Kenya, in 52.2% of Somali males, and in 32.0% of Ethiopian males.10 The majority of the E3b1 Y chromosomes (91.0%) were further characterized by the DYS19-11, DYS392-12, DYS437-14, DYS438-11, and DYS393-13 alleles. The Eurasian clade FR had a frequency of 15.4% and the typical sub-Saharan haplogroups A, B, E3a*(xE3a4), E2 and E3b* were found in only 5% of Somali males.
The network of the E3b1 lineages in the present Somali population sample (Figure 3a) displayed star-like features and we observed a low Y STR haplotype diversity and a very limited spread in the sizes of the STR alleles (Table 3), suggesting a coherent, common, recent ancestry. The network of the E3b1 lineages of previously published data of East African populations and our data (Figure 3b) demonstrates that the E3b1 cluster γ lineages of the present Somali population sample are part of the East African E3b1 lineages. E3b1 cluster γ lineages were observed in low frequencies in Bantus from Kenya, North Egypt, Morocco and Niger10 (Figure 4). In the present study, haplogroup E3b1 was found in 6.3% of Iraqis and none of them belonged to cluster γ. Only 11 subjects with a DYS19-11/DYS392-12 pattern were reported outside the Horn of Africa in 26 654 subjects analysed in a worldwide set of 236 populations by November 2004 (http://www.yhrd.org/index.html). Taken together, the data suggest that the E3b1 cluster γ DYS392-12 lineage expanded in the Somali population.
Cruciani et al10 suggested that the E3b1 cluster γ lineages originated in East Africa and estimated that the TMRCA was approximately 9600 years. We estimated that the E3b1 cluster γ DYS392-12 lineages of the present Somali population sample originated 4000–5000 years ago, and that the expansion of the E3b1 cluster γ DYS392-12 lineages in these Somalis involved a relatively small number of Y chromosomes (around 1000 males).
The time of the eastbound Bantu expansion was estimated to be 3400±1100 years ago.24 Bantu populations have high frequencies of E3a haplogroups.4 We have observed only a few individuals with the E3a haplogroup in our Somali population, thus supporting the view that the Bantu migration did not reach Somalia.42 It has been suggested that a barrier against gene flow exists in the region.43 The barrier seems to be the Cushitic languages and cultures to which Somalis belong. The Cushitic languages belong to the Afro-Asiatic languages that are spoken in Northern and Eastern Africa. The Cushitic languages and cultures are mainly found in the Somalis and the Oromos, one of the two main groups inhabiting Ethiopia.44, 45, 46 The Somali and Oromo languages have a high degree of similarity and the two populations share many cultural characteristics. The Somali and Oromo people live in clans with special patterns of marriage and have complex, interwoven pedigrees.44, 45
The very high frequency of the E3b1 cluster γ in our Somali population sample could be due to ascertainment bias or special clan or family relationships in the present sample of Somali immigrants to Denmark. No reliable information on geographic origin or clan relationship in the present Somali population sample was available. However, the genotypes of the autosomal STR systems were in Hardy–Weinberg equilibrium, indicating random mating in at least the last generation, and the distribution of Y-STR haplotypes in our Somali population was similar to that in a sample of Somali immigrants to Norway,37 indicating that these two population samples came from a larger, homogeneous population of Somalis.
The haplogroup K2 was found in 10.4% of Somali males. Haplogroup K2 was suggested to have arisen in Eurasia.4, 9 K2 has a patchy distribution in Cameroon (18.0%), Egypt (8.2%), Ethiopia (4.8%), Tanzania (3.8%) and Morocco (3.6%), probably due to back migration.3, 7, 8, 9 Luis et al9 estimated an expansion time of 13.7–17.5 ky for the K2 lineages in Egypt. The BATWING expansion time estimated for K2 in our Somali population (3.3 ky) is consistent with an African southward dissemination of the K2 haplogroup. The observation of two Somali males with the M17 mutation (haplogroup R1a1*(xR1a1b)) may indicate a recent gene flow by migration from Eurasia.47, 48 A possible explanation is offered by the fact that from the 7th century onward, immigrant Muslim Arabs and Persians established trading posts along the Somali coast,51 although British, French and Italian people were also present in the region in the 19th and 20th centuries.
The distribution of the haplogroups J2*(xJ2f2) (0.5%) and J*(xJ2) (2.5%) in Somalis supports the recent gene flow hypothesis. Haplogroup J*(xJ2) was probably spread by the Arab people.40 The ratio between the haplogroups J2/J*(xJ2) may be an indicator of the genetic components from populations like (1) Balkans, Turks, Georgians and Muslim Kurds and (2) Bedouin and Palestinian Arabs, respectively.40, 52 The ratio was 0.26 in the Oman population.9 The J2/J*(xJ2) ratio of 0.2 in the present Somali sample suggests a predominant gene flow of Arab Y chromosomes.
In conclusion, the data suggest that the male Somali population is a branch of the East African population, closely related to the Oromos in Ethiopia and North Kenya (Boranas), with predominant E3b1 cluster γ DYS392-12 lineages that were probably introduced into the Somali population 4000–5000 years ago, and that the Somali male population has approximately 15% Y chromosomes from Eurasia and approximately 5% from sub-Saharan Africa. Work is in progress to study closely related populations with new informative markers in order to obtain a better understanding of the settlement process of the E3b1 lineages in East Africa.
We thank Ms Annemette Holbo Birk for technical assistance and Anders Buchard, Anders Hansen and Bo Simonsen for helpful discussions. The work was supported by grants to JJ Sanchez from Ellen and Aage Andersen's Foundation and by Grant No. PI2000/053 to A Hernandez from Consejería de Educación y Cultura (DGUI), Gobierno de Canarias.