Genome-wide genotype and sequence-based reconstruction of the 140,000 year history of modern human ancestry

Shriner, Daniel; Tekola-Ayele, Fasil; Adeyemo, Adebowale; Rotimi, Charles N.

doi:10.1038/srep06055

Published: 13 August 2014

Scientific Reports volume 4, Article number: 6055 (2014)

Abstract

We investigated ancestry of 3,528 modern humans from 163 samples. We identified 19 ancestral components, with 94.4% of individuals showing mixed ancestry. After using whole genome sequences to correct for ascertainment biases in genome-wide genotype data, we dated the oldest divergence event to 140,000 years ago. We detected an Out-of-Africa migration 100,000–87,000 years ago, leading to peoples of the Americas, east and north Asia and Oceania, followed by another migration 61,000–44,000 years ago, leading to peoples of the Caucasus, Europe, the Middle East and south Asia. We dated eight divergence events to 33,000–20,000 years ago, coincident with the Last Glacial Maximum. We refined understanding of the ancestry of several ethno-linguistic groups, including African Americans, Ethiopians, the Kalash, Latin Americans, Mozabites, Pygmies and Uygurs, as well as the CEU sample. Ubiquity of mixed ancestry emphasizes the importance of accounting for ancestry in history, forensics and health.

Introduction

Several diversity projects have been performed to investigate the ability of genetic data to reveal the migratory history and geographical structuring of modern human populations. The recent origin of modern humans is widely thought to reflect migration(s) from sub-Saharan Africa, with gene flow estimated to have ended anywhere from 140,000 to 12,000 years ago^1,2,3,4. Li et al.⁵ focused on continental-level ancestry, identifying seven ancestral components: sub-Saharan Africa, the Middle East, Europe, south and central Asia, east Asia, Oceania and (Native) America. Following a more detailed characterization of the genetic history of African peoples⁶, these results were refined into 14 ancestral components: Fulani, Cushitic, Nilo-Saharan, Chadic-Saharan, Niger-Kordofanian, Southern African/Khoesan/Mbuti, western Pygmy, Hadza and Sandawe ancestral components in Africa and Oceanian, European, Indian, Native American and East Asian ancestral components in the rest of the world.

Here, we meta-analyzed ancestry from 12 global and regional diversity projects^5,7,8,9,10,11,12,13,14,15,16,17. We collected genome-wide genotype data for 3,528 unrelated individuals from 163 samples from around the world (Fig. 1). Our analysis revealed 19 ancestral components, providing greater resolution of ancestry worldwide. Our inferred African ancestral components were largely consistent with the earlier results for sub-Saharan Africa⁶, with the notable addition of Omotic-speaking peoples in Ethiopia. Using whole genome sequence data, we corrected for ascertainment biases in chip-based genotype data in estimation of genetic differentiation and heterozygosity. We then estimated the divergence times of the ancestral components and compared these divergence times to historical records. We observed that multiple divergence events coincided with the Last Glacial Maximum. The oldest divergence event dated to ~140,000 years ago.

Results and Discussion

Unsupervised ancestry analysis of 19,372 autosomal single nucleotide polymorphisms genotyped for 3,528 individuals from 163 samples revealed 19 ancestral components (Fig. 2, Table 1 and Supplementary Fig. 1). The 19 identified ancestral components were Click Speaker in south Africa; Pygmy in central Africa; Niger-Congo across west, east and south Africa; Lowland East Cushitic, Nilo-Saharan and Omotic in east Africa; Berber in north Africa; Indian and Kalash in south Asia; Chinese, Japanese and southeast Asian in east Asia; Siberian in north Asia; Native American in the Americas; Melanesian in Oceania; southern and northern European; and Arabian and Levantine-Caucasian in the Middle East and in the Caucasus (Fig. 2 and Fig. 3). Consistent with prior findings⁶, 94.4% of individuals had mixed ancestry, independent of self-identified ethno-linguistic group labels. Based on the estimated standard errors, our analysis was powered to detect an ancestral component present at a proportion of at least 2.5%.

Table 1 Ancestral components and proxy samples

Traditional analysis of F_ST between samples is complicated by recent admixture. In contrast, ancestral components are constructed to be ancestrally homogeneous and consequently unaffected by recent admixture. Therefore, we analyzed F_ST between ancestral components. Using hierarchical clustering analysis, the six sub-Saharan ancestral components clustered together; the south Asian ancestral components clustered with the European, Middle Eastern, Caucasian and Berber ancestral components; and the east Asian ancestral components clustered with the north Asian, Native American and Oceanic ancestral components (Fig. 4). To assess ascertainment bias in these F_ST estimates resulting from the use of chip-based genotype data, we used the 1000 Genomes sequence data. Since the 1000 Genomes samples showed heterogeneous ancestry, we limited this comparison to the JPT and YRI samples, both of which had only one ancestry (Japanese and Niger-Congo, respectively) and the FIN sample, which was the least ancestrally heterogeneous sample from Europe; that is, the FIN, JPT and YRI samples and the Northern European, Japanese and Niger-Congo ancestral components represented the closest matches between sequenced samples and ancestral components. F_ST values for the FIN/YRI, FIN/JPT and JPT/YRI pairs were 0.0754, 0.0524 and 0.0879, respectively. In comparison, F_ST values for the Northern European/Niger-Congo, Northern European/Japanese and Japanese/Niger-Congo pairs were 0.163, 0.121 and 0.177, respectively. Thus, we estimated that pairwise F_ST values between ancestral components were inflated by an average of 2.16-fold. To account for this inflation, we divided all pairwise F_ST values between ancestral components by 2.16.
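The inflation factor can be checked from the six F_ST values quoted above. The sketch below assumes the factor is the mean of the three per-pair ratios (chip-based ancestral component over sequence-based sample); the variable and function names are illustrative, not the authors'.

```python
# Sequence-based F_ST between 1000 Genomes samples (values quoted in the text).
sequence_fst = {"FIN/YRI": 0.0754, "FIN/JPT": 0.0524, "JPT/YRI": 0.0879}

# Chip-based F_ST between the matching ancestral components, keyed by the
# sample pair each one is matched to (Northern European ~ FIN,
# Japanese ~ JPT, Niger-Congo ~ YRI).
chip_fst = {"FIN/YRI": 0.163, "FIN/JPT": 0.121, "JPT/YRI": 0.177}

# Assumption: the inflation factor is the mean of the per-pair ratios.
ratios = {pair: chip_fst[pair] / sequence_fst[pair] for pair in sequence_fst}
inflation = sum(ratios.values()) / len(ratios)


def correct_fst(fst, factor=inflation):
    """Divide a chip-based F_ST by the estimated inflation factor."""
    return fst / factor


if __name__ == "__main__":
    for pair, ratio in ratios.items():
        print(f"{pair}: ratio = {ratio:.3f}")
    print(f"mean inflation = {inflation:.2f}")
```

Under this assumption the mean ratio rounds to 2.16, matching the reported factor; taking the ratio of summed F_ST values instead would give a slightly smaller number.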

To estimate divergence times from F_ST, we need estimates of the effective population size, N_e. Given allele frequencies per marker per ancestral component, we first estimated heterozygosity for each ancestral component. Heterozygosity estimates ranged from 0.255 to 0.327 (Table 2), similar to the range of 0.20 to 0.31 for the 52 samples in the Human Genome Diversity Project⁵. To assess ascertainment bias in our heterozygosity estimates, we again used the 1000 Genomes sequence data. Across all 14 of the 1000 Genomes samples, ascertainment for common variation compared to all variation resulted in slight overestimation of heterozygosity, with heterozygosity for polymorphic markers ranging from 0.142 to 0.264 (Table 3). Ascertainment for variation resulted in massive overestimation of heterozygosity, with heterozygosity for all sites ranging from 0.000671 to 0.000966 (Table 3). Rather than attempting to correct the heterozygosity for the ancestral components in light of these ascertainment biases, we estimated the inbreeding effective population size based on heterozygosity for the 1000 Genomes samples (Table 3). We used the average N_e of 21,780 from the ASW, LWK and YRI samples for the Click Speaker, Lowland East Cushitic, Niger-Congo, Nilo-Saharan, Omotic and Pygmy ancestral components, the average N_e of 15,281 from the CHB, CHS and JPT samples for the Chinese, Japanese, Melanesian, Native American, Siberian and southeast Asian ancestral components and the average N_e of 16,446 from the CEU, FIN, GBR, IBS and TSI samples for the Arabian, Berber, Indian, Kalash, Levantine-Caucasian, northern European and southern European ancestral components. If these N_e values are too small for any ancestral component, then divergence times will be underestimated. Conversely, if these N_e values are too large, then divergence times will be overestimated.
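The dependence of divergence time on N_e can be illustrated with a textbook drift-only approximation, F_ST ≈ 1 − exp(−t / (2 N_e)). This section does not state the authors' exact estimator, so the formula, the function name and the example F_ST value below are assumptions for illustration only; the three N_e values are the ones quoted above.

```python
import math


def divergence_generations(fst, ne):
    """Generations since divergence under a drift-only approximation.

    Assumes F_ST ~ 1 - exp(-t / (2 * Ne)), so t = -2 * Ne * ln(1 - F_ST).
    This is a textbook relation, not necessarily the authors' estimator.
    """
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must lie in [0, 1)")
    return -2.0 * ne * math.log(1.0 - fst)


# Average N_e values quoted in the text, keyed by the 1000 Genomes samples
# they were averaged over.
NE_BY_GROUP = {
    "ASW/LWK/YRI": 21780,
    "CHB/CHS/JPT": 15281,
    "CEU/FIN/GBR/IBS/TSI": 16446,
}

fst_example = 0.05  # hypothetical corrected F_ST, not a value from the paper
times = {
    group: divergence_generations(fst_example, ne)
    for group, ne in NE_BY_GROUP.items()
}

if __name__ == "__main__":
    for group, t in times.items():
        print(f"{group}: {t:.0f} generations")
```

Because t is proportional to N_e in this approximation, an N_e that is too small gives a divergence time that is too short by the same factor, and vice versa.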

Table 2 Heterozygosity by ancestral component

Table 3 Heterozygosity and effective population size (N_e) estimates based on whole genome sequence data

After correcting the pairwise F_ST values between ancestral components for ascertainment bias as described above, we estimated divergence times using the three sequence-based N_e values. Mean divergence times for the ancestral components ranged from 256 generations to 4,664 generations (Fig. 5), corresponding to ~7,700 to ~140,000 years ago, assuming a generation time of 30 years^18,19. Note that the three orderings reflect different quantities: the order of appearance of ancestral components in the ADMIXTURE analysis (Fig. 2) reflects a composite of individuals' ancestry proportions and ancestry-specific allele frequencies; the order of divergence of ancestral components by F_ST (Fig. 4) reflects a composite of ancestry-specific allele frequencies and time; and the order of divergence of ancestral components by time (Fig. 5) reflects only time.
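The conversion from generations to years is simple arithmetic and can be verified directly. The sketch below uses the 30-year generation time and the two endpoint values stated above; the names are illustrative.

```python
GENERATION_YEARS = 30  # generation time assumed in the text

# Endpoints of the range of mean divergence times reported in the text.
reported_generations = {
    "most recent divergence": 256,
    "oldest divergence": 4664,
}


def to_years(generations, generation_years=GENERATION_YEARS):
    """Convert a time in generations to years."""
    return generations * generation_years


if __name__ == "__main__":
    for label, g in reported_generations.items():
        print(f"{label}: {g} generations = {to_years(g)} years")
```

The products are 7,680 and 139,920 years, which round to the quoted ~7,700 and ~140,000 years.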

At the global scale, the oldest divergence event dated to 4,664 generations or ~140,000 years ago (Fig. 5). This time is consistent with estimates of the coalescence time for the major haplogroups of the Y chromosome of 138,000 years ago²⁰ and 142,000 years ago²¹ as well as an estimate of ~140,000 years ago for African vs. Eurasian divergence based on multilocus resequencing⁴. The next divergence occurred 3,326 generations or ~100,000 years ago giving rise to the cluster of east and north Asian, Native American and Oceanic ancestral components (Fig. 5). A separate divergence event occurred 2,041 generations or ~61,000 years ago giving rise to Caucasian, European, Middle Eastern and south Asian ancestral components (Fig. 5). We detected two Out-of-Africa migrations principally due to the inclusion of samples allowing for the inference of a Lowland East Cushitic ancestral component (Supplementary Fig. 2). If we assume an African origin for the Lowland East Cushitic ancestral component, then these results are consistent with an Out-of-Africa migration giving rise to east and north Asian/Native American/Oceanic ancestral components, followed by another Out-of-Africa migration giving rise to Caucasian/European/Middle Eastern/south Asian ancestral components, followed by back migration to north Africa giving rise to the Berber ancestral component. Alternatively, if we assume a non-African origin for the Lowland East Cushitic ancestral component, then these results are consistent with an Out-of-Africa migration giving rise to east and north Asian/Native American/Oceanic ancestral components, followed by back migration into Africa, followed by an Out-of-Africa migration giving rise to Caucasian/European/Middle Eastern/south Asian ancestral components, followed by another back migration to north Africa giving rise to the Berber ancestral component. The former interpretation is more parsimonious.

The rate of admixture of archaic lineages into modern humans has been estimated to be higher in East Asians than in Europeans²². Furthermore, the maximum-likelihood estimates of the times of admixture of archaic lineages are 55,100 years ago for Europeans and 75,800 years ago for East Asians²³. We detected an Out-of-Africa migration 100,000–87,000 years ago, leading to peoples of the Americas, east and north Asia and Oceania. We also detected another migration 61,000–44,000 years ago, leading to peoples of the Caucasus, Europe, the Middle East and south Asia. Taken together, these results suggest that introgression of archaic lineages occurred at two different times and places: an older event in East Asia involving migrants from the first Out-of-Africa migration and a more recent event in the Middle East before dispersal of migrants from the second Out-of-Africa migration into the Caucasus, Europe and south Asia.

Africa

Sub-Saharan ancestral components diverged 2,426 generations or ~73,000 years ago (Fig. 5). The Pygmy ancestral component diverged 1,686 generations or ~51,000 years ago from the Click Speaker ancestral component (Fig. 5). The Mbuti Pygmy sample had 99.1% ± 3.1% (mean ± standard error) Pygmy ancestry (Supplementary Table 1), indicating ancestral homogeneity and implying a lack of admixture. The Biaka Pygmy sample showed evidence of admixture, with 77.9% ± 3.3% Pygmy ancestry and 21.6% ± 2.9% Niger-Congo ancestry (Supplementary Table 1). These results are consistent with a higher level of gene flow between western Pygmies (e.g., Biaka Pygmies) and agricultural populations than between eastern Pygmies (e.g., Mbuti Pygmies) and agricultural populations²⁴. In contrast, using the Yoruba and San samples and assuming two-way admixture, Loh et al.²⁵ inferred that the Mbuti Pygmy sample showed evidence of admixture ~28 generations ago with ~15.9% Yoruba-related ancestry and that the Biaka Pygmy sample showed evidence of admixture ~38 generations ago with ~28.8% Yoruba-related ancestry. Use of divergent reference samples for parental populations of admixed samples leads to estimates of admixture proportions that are biased towards equal proportions for all reference samples and to estimates of generations since admixture that are biased upward. We also detected small amounts of Pygmy ancestry in multiple samples throughout central and south Africa (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1). The Click Speaker ancestral component was the major ancestral component in several Khoesan samples from south Africa (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1). The Ju/'hoan sample had 96.5% ± 2.2% Click Speaker ancestry and the San sample had 94.8% ± 2.0% Click Speaker ancestry and 5.2% ± 2.0% Pygmy ancestry, whereas the other Khoesan samples had ≤75.0% Click Speaker ancestry and various amounts of other ancestries, most notably Niger-Congo ancestry (Supplementary Table 1).

The Omotic ancestral component diverged from the sub-Saharan cluster 1,602 generations or ~48,000 years ago (Fig. 5). The Omotic ancestral component showed a distribution mostly limited to Ethiopia (Fig. 3 and Supplementary Fig. 3). The majority of the ancestry of the Ari Blacksmith and Ari Cultivator samples was Omotic (Supplementary Table 1). The Omotic ancestral component was also the largest component in the Wolayta sample (Supplementary Table 1).

The Niger-Congo ancestral component included non-Bantu speakers from Senegambia and Nigeria as well as Bantu speakers from east and south Africa (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1). Several samples from South Africa, such as amaXhosa, showed mixed ancestry between the Click Speaker and Niger-Congo components, consistent with linguistic evidence that isiXhosa is a language in the Niger-Congo family with ~15% Khoekhoe vocabulary [http://www.ethnologue.com/language/xho]. The Niger-Congo and Nilo-Saharan ancestral components diverged 917 generations or ~28,000 years ago (Fig. 5), possibly reflecting expansion of the Sahara around the time of the Last Glacial Maximum²⁶. The Nilo-Saharan ancestral component was the major component in the Anuak, Sudanese, Gumuz and Bulala samples across Chad, South Sudan and Ethiopia (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1). The clustering of the Niger-Congo and Nilo-Saharan ancestral components is consistent with the inclusion in the Niger-Congo family of the Kordofanian languages, which are spoken in the Nuba Mountains in what is presently the Republic of the Sudan.

The Lowland East Cushitic ancestral component was the major ancestral component in Somali from Ethiopia and Somalia (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1), but it may be capturing some Central Cushitic ancestry if the Afar sample is actually Agaw (the sample was collected from the Wag Hemra Zone and the language was listed as Xamtan⁸). Lowland East Cushitic ancestry diverged from the Caucasian/European/Middle Eastern/south Asian cluster 2,041 generations or ~61,000 years ago (Fig. 5). The MKK (Maasai in Kinyawa, Kenya) sample showed mostly Nilo-Saharan ancestry, some Lowland East Cushitic ancestry and smaller amounts of Niger-Congo and Click Speaker ancestry, whereas the LWK (Luhya in Webuye, Kenya) and BantuKenya samples showed predominantly Niger-Congo ancestry, some Nilo-Saharan ancestry and a small amount of Pygmy ancestry, but no Lowland East Cushitic ancestry (Supplementary Fig. 3 and Supplementary Table 1).

All of the north African samples showed significant amounts of Berber ancestry (Fig. 3, Supplementary Fig. 3 and Supplementary Table 1), presumably reflecting Imazighen peoples. The Berber and Arabian ancestral components diverged 888 generations or ~27,000 years ago (Fig. 5). This divergence time is ~21,000 years before the E-M81 or E1b1b1b Y chromosome haplogroup (referred to as the Berber marker) originated in north Africa^27,28. The Berber ancestral component clustered with the Caucasian/European/Middle Eastern/south Asian ancestral components, not with the sub-Saharan ancestral components (Fig. 5). We detected Niger-Congo ancestry (7.6%) but no European ancestry in the Mozabite sample (Supplementary Table 1), inconsistent with admixture between individuals with ancestry similar to the YRI (Yoruba in Ibadan, Nigeria) and CEU (Utah Residents with Northern and Western European Ancestry) ~100 generations ago²⁹.

Our data set included five samples of South African Coloureds, one from the Eastern Cape, two from the Northern Cape and two from the Western Cape. All five samples showed European ancestry, but the samples from the Western Cape showed more Indian, Melanesian and southeast Asian ancestry, whereas the samples from the Eastern and Northern Capes showed more Click Speaker and Niger-Congo ancestry (Fig. 3 and Supplementary Table 1). Our data set also included the admixed African American sample ASW (Americans of African Ancestry in SW USA). Niger-Congo ancestry represented the major African ancestry in the ASW, but we also detected a significant amount of Pygmy ancestry (Supplementary Table 1). No Pygmy ancestry was detected in either sample of Yoruba individuals (Supplementary Table 1), indicating that the Yoruba and YRI samples are not adequate proxies of African ancestry in the ASW sample and therefore possibly inadequate for other samples of African Americans. In our data set, there is no single sample that might serve as a better proxy; therefore, we suggest adding Western Pygmies (e.g., the Biaka Pygmy sample) as an additional parental population for ancestry analysis of African Americans.

The Amhara, Oromo and Wolayta samples from Ethiopia had Lowland East Cushitic, Nilo-Saharan, Omotic and Arabian ancestry and the Tygray sample also had a small amount of Levantine-Caucasian ancestry (Supplementary Table 1). These samples of Ethiopians had no Niger-Congo or European ancestry (Supplementary Table 1). These results indicate that the YRI and CEU samples are not optimal choices as proxies for the parental populations of Ethiopians. Furthermore, these Ethiopian samples have four or five ancestries and therefore should not be modeled by two-way admixture. As with the Mozabite sample, use of the YRI and CEU samples as proxies for the parental populations for the Ethiopians will lead to reconstruction of excessively short haplotypes, estimation of excessively long times since admixture began and poor estimates of admixture proportions.

Previously, nine ancestral components were identified among Africans: Chadic-Saharan, Cushitic, Fulani, Hadza, Niger-Kordofanian, Nilo-Saharan, Sandawe, Southern African/Khoesan/Mbuti and western Pygmy⁶. In comparison, we identified seven ancestral components: Berber, Click Speaker, Lowland East Cushitic, Niger-Congo, Nilo-Saharan, Omotic and Pygmy. Our data set lacked samples of Chadic speakers, Fulani, Hadza and Sandawe but included samples of Berbers and Omotic speakers. Our data set included more Khoesan samples, revealing divergence between Click Speaker and Pygmy ancestral components, implying a more recent divergence of eastern vs. western Pygmy²⁴. The other ancestral components appear directly comparable.

The Americas, Asia and Oceania

Ancestral components in Asia grouped into two clusters: one in south Asia containing the Indian and Kalash ancestral components and the other in east and north Asia containing the Native American, Melanesian, Siberian, Southeast Asian, Chinese and Japanese ancestral components (Fig. 5). The south Asian ancestral components diverged from the Caucasian/European/Middle Eastern ancestral components 1,452 generations or ~44,000 years ago (Fig. 5). Kalash and Indian ancestral components subsequently diverged 1,090 generations or ~33,000 years ago (Fig. 5). The Kalash ancestral component predominantly identified the Kalash sample and appeared only in small amounts (<10%) in any other sample (Fig. 3, Supplementary Fig. 4 and Supplementary Table 1). This result is consistent with the Kalash people representing a population isolate. We detected no evidence of Arabian or southern European ancestry in the Kalash sample (Supplementary Table 1), indicating that the Kalash people are not of Arab or Greek origin. The Indian ancestral component was detected in several samples throughout central and south Asia, the Middle East, the Caucasus and South Africa (Fig. 3, Supplementary Figs. 3, 4 and 5 and Supplementary Table 1).

The Melanesian ancestral component diverged 2,907 generations or ~87,000 years ago (Fig. 5). The Melanesian ancestral component was the major component in the two samples from Oceania and was present in small amounts in samples from Singapore, India and South Africa (Fig. 3, Supplementary Figs. 3 and 4 and Supplementary Table 1), suggesting some degree of representation of island southeast Asia as well as Oceania. The Native American ancestral component diverged 1,777 generations or ~53,000 years ago (Fig. 5). This divergence time predates most estimates of the time(s) of the crossing of Beringia, consistent with isolation in Beringia prior to migration to the Americas. The Native American ancestral component was the major component in several samples from the Americas and was undetected in all east Asian and European samples (Fig. 3, Supplementary Figs. 4 and 5 and Supplementary Table 1)³⁰. The Siberian ancestral component was the next to diverge, 1,095 generations or ~33,000 years ago (Fig. 5). The Siberian ancestral component was predominant in the Yakut sample, with a significant presence in several samples from Manchuria, Mongolia and north China (Fig. 3, Supplementary Fig. 4 and Supplementary Table 1).

The southeast Asian, or perhaps more precisely mainland southeast Asian, ancestral component diverged 658 generations or ~20,000 years ago (Fig. 5). Wangkumhang et al.³¹ also identified one major ancestral component common to four Thai populations. Chinese and Japanese ancestral components diverged 256 generations or ~7,700 years ago (Fig. 5). The Chinese ancestral component was the major ancestral component in several samples from both south and north China (Fig. 3, Supplementary Fig. 4 and Supplementary Table 1). The Japanese ancestral component was the major component only in the two samples from Japan (Fig. 3, Supplementary Fig. 4 and Supplementary Table 1).

The Uygur sample showed highly heterogeneous ancestry: 20.8% Chinese, 18.0% Siberian and 9.7% Japanese; 9.4% Indian and 4.6% Kalash; and 15.4% Levantine-Caucasian and 12.3% northern European (Supplementary Fig. 4 and Supplementary Table 1). These proportions indicate south Asian and Middle East/Caucasus ancestry in addition to east Asian and European ancestry, consistent with trade on the Silk Road. The non-Jewish Uzbekistan and Hazara samples showed similar ancestry to the Uygur sample (Supplementary Figs. 4 and 5 and Supplementary Table 1).

The CLM (Colombians from Medellín, Colombia), MXL (Mexican Ancestry from Los Angeles, USA) and PUR (Puerto Ricans from Puerto Rico) samples from the Americas all showed mixtures of predominantly Native American and European ancestry with <10% Niger-Congo ancestry (Supplementary Fig. 4 and Supplementary Table 1). Additionally, the PUR sample showed a significant amount of Berber ancestry, which likely did not derive from a Spanish parental population as none of the three Spanish samples (Spain_Basque, IBS (Iberian population in Spain) and Spain) showed significant amounts of Berber ancestry (Supplementary Fig. 4 and Supplementary Table 1)³². Furthermore, the CLM and PUR samples showed more Arabian ancestry plus Berber ancestry than the MXL sample (7.9% and 10.8% vs. 5.3%, respectively). Given that Arabian and Berber ancestral components cluster with European ancestral components, divergence of the “Latino-specific European component” from the presumed Iberian parental populations may reflect imprecise usage of “European ancestry”³².

Europe

The Levantine-Caucasian and European ancestral components diverged 842 generations or ~25,000 years ago (Fig. 5). Northern and southern European ancestral components subsequently diverged 795 generations or ~24,000 years ago (Fig. 5). The northern European ancestral component was the major ancestral component in samples from Finland, Lithuania, Russia and Belorussia (Fig. 3, Supplementary Fig. 5 and Supplementary Table 1). The northern European ancestral component clustered with the Caucasian/European/Middle Eastern/south Asian ancestral components (Fig. 5), inconsistent with an origin of northern European ancestry in north Asia. However, Siberian ancestry was detected in the Russian and FIN (Finnish in Finland) samples (6.1% and 4.2%, respectively, Supplementary Table 1), consistent with a small amount of westward migration from Siberia to north Europe. The Spanish and Italian samples showed southern and northern European ancestry with varying amounts of Levantine-Caucasian, Arabian and Berber ancestry (Supplementary Fig. 5 and Supplementary Table 1). In contrast, the Basque samples showed only southern and northern European ancestry (Supplementary Fig. 5 and Supplementary Table 1), consistent with genetic isolation. Also, we detected more Arabian than Berber ancestry in Spain and Italy³³. The oft-used CEU sample showed northern European, southern European and Levantine-Caucasian ancestry, similar to the GBR (British in England and Scotland) and French samples (Supplementary Fig. 5 and Supplementary Table 1).

The Middle East and the Caucasus

Arabian and Levantine-Caucasian ancestral components diverged 1,044 generations or ~31,000 years ago (Fig. 5). The Arabian ancestral component was the major ancestral component in the Qatari and Bedouin samples (Supplementary Fig. 5 and Supplementary Table 1). The Arabian ancestral component had a decreasing presence westward across north Africa (Fig. 3).

The Levantine-Caucasian ancestral component was the major ancestral component in only the Georgia sample, but held a plurality in several samples across the Middle East and the Caucasus and was detected in south Asian samples (Fig. 3, Supplementary Figs. 4 and 5 and Supplementary Table 1). The sample of Ethiopian Jews lacked Levantine-Caucasian ancestry but had ancestry similar to the Amhara (Supplementary Table 1), consistent with conversion of indigenous Ethiopians to Judaism. Similarly, the Kochi Jews and Mumbai Jews had large amounts of Indian ancestry (Supplementary Table 1), consistent with conversion. In contrast, Moroccan Jews differed from the other samples from Morocco by having Levantine-Caucasian ancestry but less Berber ancestry (Supplementary Table 1), consistent with migration of Jewish people. Paired analyses of Jews and non-Jews were available for seven countries: Georgia, Iran, Morocco, Romania, Turkey, Uzbekistan and Yemen. Compared to non-Jews, Jews had more Southern European ancestry (21.9% vs. 13.8%), Arabian ancestry (18.9% vs. 10.8%), Levantine-Caucasian ancestry (33.5% vs. 27.8%) and Lowland East Cushitic ancestry (4.4% vs. 2.5%)³⁴.

To contextualize these findings, six points should be kept in mind. One, markers were not ascertained for ancestry informativeness. However, markers were ascertained for common polymorphisms. Using whole genome sequence data, we estimated and corrected for the effects of ascertaining for (1) common vs. lower frequency polymorphisms and (2) segregating sites. Two, genetic history revealed by autosomal markers need not be identical to genetic histories of uniparentally inherited markers (the Y chromosome or mitochondria). Three, estimated times since divergence of ancestral components assumed the absence of gene flow. These times more likely reflect the recent past than the distant past. Four, genetics and self-identified ethno-linguistic labels do not perfectly correlate. Five, unsupervised ancestry analysis does not require the investigator to choose external reference samples to serve as proxies of parental populations for putative admixed samples and is amenable as-is for analysis of multi-way ancestry. Importantly, unsupervised ancestry analysis takes advantage of ancestry across the entire data set, increasing confidence by increasing the effective sample size by ancestral component. This can be seen by noting that the average number of individuals per sample was 21.6 whereas the average number of individuals per ancestral component was 185.7. However, unsupervised ancestry analysis does not allow for exact identification of parental populations in terms of real-world samples. Six, the time period of history revealed by our data set is the Late Pleistocene. That is, our conclusions are unaffected by recent population growth during the Holocene. Furthermore, the inbreeding effective population size captures the effects of bottlenecks.
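The two averages in point five follow from the totals reported elsewhere in the article. The sketch below assumes that the effective sample size of an ancestral component is the sum of that component's ancestry proportions over all individuals; this is an interpretation rather than a definition given in the text. Because each individual's proportions sum to 1, the average over K components is then N / K. The toy Q matrix is randomly generated and purely illustrative.

```python
import random


def effective_sample_sizes(q_rows):
    """Sum each ancestry column of a Q matrix (rows = individuals)."""
    k = len(q_rows[0])
    return [sum(row[j] for row in q_rows) for j in range(k)]


def random_proportions(k, rng):
    """Draw k non-negative proportions that sum to 1 (toy data only)."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: 50 individuals, 4 components -> mean size must be 50 / 4.
rng = random.Random(0)
toy_q = [random_proportions(4, rng) for _ in range(50)]
sizes = effective_sample_sizes(toy_q)
mean_size = sum(sizes) / len(sizes)

# Totals reported in the article: 3,528 individuals, 163 samples, 19 components.
per_sample = 3528 / 163
per_component = 3528 / 19

if __name__ == "__main__":
    print(f"toy mean effective size: {mean_size:.1f}")
    print(f"individuals per sample: {per_sample:.1f}")
    print(f"individuals per ancestral component: {per_component:.1f}")
```

The last two quotients round to 21.6 and 185.7, the values quoted in the text.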

In summary, we showed that ancestry of modern humans covered 140,000 years of history, with two major Out-of-Africa migrations. Eight divergence times occurred between ~33,000 and ~20,000 years ago, coinciding with the Last Glacial Maximum. We recommend that ancestry analyses should be globally comprehensive, even if interest is regional, because redefining an existing ancestral component or defining a new ancestral component will impact the definitions of other ancestral components. Characterization of human ancestry is ongoing as sampling of some ancestries is poorer than others. To name a few examples, the Melanesian ancestral component has the lowest effective sample size, Chadic- and Cushitic-speaking peoples are not well represented in our data set and Polynesian samples are absent. In contrast, some ancestries are well sampled, including Chinese. We anticipate that most unsampled lineages reflect recent divergence events. However, it is possible that an unsampled lineage could reflect a divergence event older than 140,000 years. Also, the limited density of markers precludes accurate dating of potential admixture events because many ancestral switches will be missed. Our findings strongly inform control for population stratification in genetic association studies and inference of local ancestry in admixed individuals. Shared ancestry provides another layer of insight into human evolution, particularly with respect to migrations.

Methods

We collected genome-wide genotype data on autosomal single nucleotide polymorphisms (SNPs) from publicly available human genomic diversity projects. The global data set included 916 individuals from the Human Genome Diversity Project⁵, 1,092 individuals from the 1000 Genomes Project⁷, 222 individuals from east Africa⁸, 268 individuals from the Singapore Genome Variation Project⁹, 75 individuals from Lebanon¹⁰, 145 individuals from north Africa and the Basque Country¹¹, 323 individuals from south Africa^12,13, 18 Arabs from Qatar¹⁴, 106 individuals from west and central Africa¹⁵, 133 Maasai from the International HapMap Project¹⁶ and 462 individuals from a study of the Jewish Diaspora¹⁷. Data management and quality control were performed using PLINK version 1.07³⁵. Graphics were generated using R³⁶. Maps were drawn using the R libraries maps and plotrix.

Individuals or markers with genotyping call rates < 95% were excluded. We also removed individuals identified as identical samples, first-degree relatives, or second-degree relatives. After quality control, the global data set comprised 3,528 individuals from 163 samples. The mutual intersection of all data sets yielded 19,372 diallelic, autosomal SNPs with experimentally determined genotypes (i.e., no imputation of missing genotypes was performed). The genotyping call rate in the remaining individuals was 99.8%. The average distance between markers was 142.8 kb (135.4 kb excluding centromeres). Due to very small sample sizes for some samples, no additional pruning of markers based on linkage disequilibrium was performed.
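The call-rate filter can be illustrated with a minimal Python sketch. It is not the PLINK procedure used in the study: the in-memory matrix layout, the function names and the order of filtering (individuals first, then markers among the retained individuals) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One genotype call: 0, 1 or 2 copies of an allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls in a list."""
    return sum(g is not None for g in calls) / len(calls)


def filter_by_call_rate(
    matrix: List[List[Genotype]], threshold: float = 0.95
) -> Tuple[List[int], List[int]]:
    """Return indices of individuals and markers with call rate >= threshold.

    matrix[i][j] is the call for individual i at marker j.
    Assumption: individuals are filtered first, then markers are
    evaluated among the retained individuals only.
    """
    keep_ind = [i for i, row in enumerate(matrix) if call_rate(row) >= threshold]
    if not keep_ind:
        return [], []
    n_markers = len(matrix[0])
    keep_snp = [
        j
        for j in range(n_markers)
        if call_rate([matrix[i][j] for i in keep_ind]) >= threshold
    ]
    return keep_ind, keep_snp
```

With the default threshold of 0.95, an individual or marker is dropped when more than 5% of its calls are missing, matching the exclusion rule stated above.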

Principal components analysis was first performed on the cleaned data set of 3,528 individuals and 19,372 SNPs to confirm the expected continental-level structure (Supplementary Fig. 6)³⁷. We then performed unsupervised ancestry analysis using ADMIXTURE³⁸ with the number of ancestral components K ranging from 1 to 30. The optimal value of K was determined by five-fold cross-validation, averaged over three runs with different starting seeds. For each ancestral component, the sample with the largest proportion of that ancestral component was identified as an exemplar. Conditioned on the optimal value of K, ADMIXTURE analysis was repeated with the addition of 200 bootstrap replicates to obtain standard errors for the proportions of ancestral components for each individual. Average ancestry proportions and 95% confidence intervals for each sample were calculated accounting for both within and between individual variance. Average proportions for which the 95% confidence intervals included 0 were zeroed out (Supplementary Table 1).
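Two of the decision rules in this paragraph can be sketched in Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, the cross-validation errors must be supplied by the reader (for example, copied from ADMIXTURE's output), and the 95% confidence interval is assumed to be the normal approximation, mean ± 1.96 standard errors. How the within- and between-individual variances were combined into that standard error is not specified in the text and is not modelled here.

```python
from statistics import mean
from typing import Dict, List


def select_k(cv_errors: Dict[int, List[float]]) -> int:
    """Pick the K whose cross-validation error, averaged over runs, is lowest.

    cv_errors maps each candidate K to the errors from runs with
    different starting seeds.
    """
    return min(cv_errors, key=lambda k: mean(cv_errors[k]))


def zero_if_ci_includes_zero(
    proportion: float, std_error: float, z: float = 1.96
) -> float:
    """Return 0.0 when the assumed normal-approximation CI includes 0.

    Assumption: the interval is proportion +/- z * std_error. Ancestry
    proportions are non-negative, so only the lower bound matters.
    """
    lower = proportion - z * std_error
    return 0.0 if lower <= 0.0 else proportion
```

In use, `select_k` would be called once on the averaged errors for K = 1 to 30, and `zero_if_ci_includes_zero` once per sample and ancestral component.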

ADMIXTURE produces two files: the .P file contains an estimated allele frequency for each marker for each ancestral component and the .Q file contains an estimated proportion for each individual for each ancestral component. Heterozygosity for each marker within each ancestral component was estimated from the .P file. The mean heterozygosity for each ancestral component was estimated by averaging heterozygosity across all markers. ADMIXTURE reports pairwise divergence between each ancestral component as assessed by F_ST but without accompanying confidence intervals (Supplementary Table 2). These confidence intervals require estimates of the variances of the allele frequencies for each ancestral component.
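As a rough sketch of the heterozygosity calculation, the Python below reads .P-style lines (one row per marker, one whitespace-separated column per ancestral component) and averages per-marker heterozygosity within each component. The text does not give the per-marker formula; the sketch assumes the usual expected heterozygosity for a diallelic marker, 2p(1 − p), and the function name is hypothetical.

```python
from typing import Iterable, List


def mean_heterozygosity(p_lines: Iterable[str]) -> List[float]:
    """Mean expected heterozygosity per ancestral component.

    Each line holds one marker's allele frequencies, one value per
    component. Assumption: per-marker heterozygosity is 2 * p * (1 - p).
    """
    sums: List[float] = []
    n_markers = 0
    for line in p_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        freqs = [float(x) for x in fields]
        if not sums:
            sums = [0.0] * len(freqs)
        for k, p in enumerate(freqs):
            sums[k] += 2.0 * p * (1.0 - p)
        n_markers += 1
    if n_markers == 0:
        raise ValueError("no marker rows found")
    return [s / n_markers for s in sums]
```

The function accepts any iterable of lines, so an open file handle on a .P file can be passed directly.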

To account for ascertainment biases in F_ST and heterozygosity estimated from chip-based genotype data, we estimated F_ST and heterozygosity using the 1000 Genomes sequence data⁷ (a total of 36,820,992 variable sites across a total of 2,881,033,286 sites). We estimated pairwise F_ST using the definition F_ST = (H_T − H_S)/H_T, in which H_T is the mean of the expected heterozygosity across samples and H_S is the mean of the observed heterozygosity across samples³⁹. This estimator of F_ST is robust to the proportion of polymorphic sites because H_T and H_S scale identically. We estimated the effective population size N_e within samples in two different ways. One, we used the estimators θ = S/Σ_{i=1}^{n−1}(1/i) and N_e = θ/(4μ)³⁹, in which S is the number of segregating sites, for a sample size n and μ = 1.1 × 10⁻⁸ mutations/generation/site⁴⁰. Two, we used the estimators θ = H/(1 − H) and N_e = θ/(4μ)³⁹, in which H is the mean of the observed heterozygosity within the sample. Note that S does not make use of allele frequencies whereas H does. Pairwise divergence times between ancestral components were estimated using the relationship t = ln(1 − F_ST)/ln(1 − 1/(2N_e))³⁹, with t in generations and N_e being the harmonic mean for the two ancestral components being compared, assuming that F_ST = 0 at t = 0.
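The chain from heterozygosity to divergence time can be sketched in Python under explicit assumptions. The formulas are standard textbook forms taken to be the ones intended here, not confirmed against the authors' code: F_ST = (H_T − H_S)/H_T, Watterson's θ = S/Σ(1/i), N_e = θ/(4μ), and drift-only divergence F_ST = 1 − (1 − 1/(2N_e))^t. The `n_sites` argument, which converts θ to a per-site value to match the per-site mutation rate, and the reading of n as the number of sampled sequences are additional assumptions.

```python
import math


def fst(h_t: float, h_s: float) -> float:
    """F_ST from total (H_T) and within-sample (H_S) heterozygosity."""
    return (h_t - h_s) / h_t


def watterson_theta(s: int, n: int, n_sites: int = 1) -> float:
    """Watterson's estimator per site.

    s: number of segregating sites; n: number of sampled sequences
    (assumption); n_sites: total sites surveyed (assumption, used to
    express theta per site).
    """
    a_n = sum(1.0 / i for i in range(1, n))
    return s / (a_n * n_sites)


def ne_from_theta(theta: float, mu: float = 1.1e-8) -> float:
    """Effective population size from per-site theta = 4 * N_e * mu."""
    return theta / (4.0 * mu)


def harmonic_mean(ne_a: float, ne_b: float) -> float:
    """Harmonic mean of two effective population sizes."""
    return 2.0 * ne_a * ne_b / (ne_a + ne_b)


def divergence_time(f_st: float, n_e: float) -> float:
    """Generations since divergence, assuming F_ST = 0 at t = 0 and
    F_ST = 1 - (1 - 1/(2 * N_e)) ** t under drift alone."""
    return math.log(1.0 - f_st) / math.log(1.0 - 1.0 / (2.0 * n_e))
```

For a pair of ancestral components, one would pass `harmonic_mean` of their two N_e estimates to `divergence_time`; converting generations to years requires a generation interval, which this sketch does not assume.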

Ethics

This project was determined to be excluded from IRB Review by the National Institutes of Health Office of Human Subjects Research Protections, Protocol #12183.

References

1. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
2. Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43, 1031–1034 (2011).
3. Harris, K. & Nielsen, R. Inferring demographic history from a spectrum of shared haplotype lengths. PLoS Genet. 9, e1003521 (2013).
4. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
5. Li, J. Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).
6. Tishkoff, S. A. et al. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009).
7. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
8. Pagani, L. et al. Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am. J. Hum. Genet. 91, 83–96 (2012).
9. Teo, Y. Y. et al. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res. 19, 2154–2162 (2009).
10. Haber, M. et al. Genome-wide diversity in the Levant reveals recent structuring by culture. PLoS Genet. 9, e1003316 (2013).
11. Henn, B. M. et al. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 8, e1002397 (2012).
12. Petersen, D. C. et al. Complex patterns of genomic admixture within southern Africa. PLoS Genet. 9, e1003309 (2013).
13. Schlebusch, C. M. et al. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–379 (2012).
14. Hunter-Zinck, H. et al. Population genetic structure of the people of Qatar. Am. J. Hum. Genet. 87, 17–25 (2010).
15. Bryc, K. et al. Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc. Natl. Acad. Sci. USA 107, 786–791 (2010).
16. The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
17. Behar, D. M. et al. The genome-wide structure of the Jewish people. Nature 466, 238–242 (2010).
18. Fenner, J. N. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128, 415–423 (2005).
19. Tremblay, M. & Vézina, H. New estimates of intergenerational time intervals for the calculation of age and origins of mutations. Am. J. Hum. Genet. 66, 651–658 (2000).
20. Poznik, G. D. et al. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341, 562–565 (2013).
21. Cruciani, F. et al. A revised root for the human Y chromosomal phylogenetic tree: the origin of patrilineal diversity in Africa. Am. J. Hum. Genet. 88, 814–818 (2011).
22. Wall, J. D. et al. Higher levels of Neanderthal ancestry in East Asians than in Europeans. Genetics 194, 199–209 (2013).
23. Lohse, K. & Frantz, L. A. Neandertal admixture in Eurasia confirmed by maximum-likelihood analysis of three genomes. Genetics 196, 1241–1251 (2014).
24. Patin, E. et al. Inferring the demographic history of African farmers and Pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet. 5, e1000448 (2009).
25. Loh, P.-R. et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254 (2013).
26. Clark, P. U. et al. The Last Glacial Maximum. Science 325, 710–714 (2009).
27. Arredi, B. et al. A predominantly Neolithic origin for Y-chromosomal DNA variation in North Africa. Am. J. Hum. Genet. 75, 338–345 (2004).
28. Cruciani, F. et al. Phylogeographic analysis of haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within and out of Africa. Am. J. Hum. Genet. 74, 1014–1022 (2004).
29. Price, A. L. et al. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 5, e1000519 (2009).
30. Raghavan, M. et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).
31. Wangkumhang, P. et al. Insight into the peopling of mainland southeast Asia from Thai population genetic structure. PLoS ONE 8, e79522 (2013).
32. Moreno-Estrada, A. et al. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 9, e1003925 (2013).
33. Botigué, L. R. et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc. Natl. Acad. Sci. USA 110, 11791–11796 (2013).
34. Elhaik, E. The missing link of Jewish European ancestry: contrasting the Rhineland and the Khazarian hypotheses. Genome Biol. Evol. 5, 61–74 (2013).
35. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
36. R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing: Vienna, Austria, 2013).
37. Shriner, D. Investigating population stratification and admixture using eigenanalysis of dense genotypes. Heredity 107, 413–420 (2011).
38. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
39. Hartl, D. L. A Primer of Population Genetics. (Third edn, Sinauer Associates, Inc.: Sunderland, Massachusetts, 2000).
40. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

Acknowledgements

The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official view of the National Institutes of Health. This research was supported by the Intramural Research Program of the Center for Research on Genomics and Global Health (CRGGH). The CRGGH is supported by the National Human Genome Research Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the Center for Information Technology and the Office of the Director at the National Institutes of Health (Z01HG200362).

Author information

Authors and Affiliations

Center for Research on Genomics and Global Health, National Human Genome Research Institute, Building 12A, Room 4047, 12 South Drive, Bethesda, Maryland, 20892, USA
Daniel Shriner, Fasil Tekola-Ayele, Adebowale Adeyemo & Charles N. Rotimi

Contributions

D.S. conceived and designed the study. D.S., F.T.-A. and A.A. collected the data. D.S. performed the analyses and wrote the manuscript. D.S., F.T.-A., A.A. and C.N.R. interpreted the data, discussed the results and commented on the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

About this article

Cite this article

Shriner, D., Tekola-Ayele, F., Adeyemo, A. et al. Genome-wide genotype and sequence-based reconstruction of the 140,000 year history of modern human ancestry. Sci Rep 4, 6055 (2014). https://doi.org/10.1038/srep06055

Received: 05 March 2014
Accepted: 28 July 2014
Published: 13 August 2014
DOI: https://doi.org/10.1038/srep06055
