Quantitating and Dating Recent Gene Flow between European and East Asian Populations

Historical records indicate that extensive cultural, commercial and technological interaction occurred between European and Asian populations. What have been the biological consequences of these contacts in terms of gene flow? We systematically estimated gene flow between Eurasian groups using genome-wide polymorphisms from 34 populations representing Europeans, East Asians, and Central/South Asians. We identified recent gene flow between Europeans and Asians in most of the populations we studied, including East Asians and Northwestern Europeans, which are normally considered to be non-admixed populations. In addition, we quantified the extent of this gene flow using two statistical approaches and dated the admixture events based on admixture linkage disequilibrium. Our results indicate that most admixture occurred between 2,400 and 310 years ago and show that admixture proportions are highly correlated with geographic location, with the highest admixture proportions observed in Central Asia and the lowest in East Asia and Northwestern Europe. Interestingly, we observed a North-to-South decline of European gene flow in East Asians, suggesting a northern path by which European gene flow diffused into East Asian populations. Our findings contribute to an improved understanding of the history of human migration and the evolutionary mechanisms that have shaped the genetic structure of populations in Eurasia.

Papuan was used as an out-group. French and Dai were used as the surrogates of ancestral EUR and EAS, respectively.

Supplementary Figure S3 | Linear regression model for ancestry estimation.
The linear regression models fit well for (a) EAS, r² = 0.80, and (b) EUR, r² = 0.90, but not for (c) CSA.

A1 denotes the number of generations since the first (ancient) admixture event, and A2 denotes the number of generations since the second (recent) admixture event. W denotes the weight of the second admixture, i.e. the proportion of ancestry that the second admixture contributed to the current admixed population.
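The r² values quoted for the regression models in Supplementary Figure S3 are coefficients of determination. As a minimal sketch of how such a value is obtained from an ordinary least-squares fit, the Python snippet below uses a hypothetical helper, linear_fit_r2, and made-up toy numbers; the predictor actually used in the study is not specified here, so the variable names are assumptions for illustration only.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination, 1 - SS_residual / SS_total.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical toy data, NOT taken from the study: a predictor value per
# population and an estimated ancestry proportion per population.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
ancestry = [0.02, 0.05, 0.07, 0.11, 0.12]

slope, intercept, r2 = linear_fit_r2(predictor, ancestry)
print(f"slope={slope:.4f} intercept={intercept:.4f} r2={r2:.3f}")
```

An r² close to 1 means the fitted line explains most of the variance in the response; a low value, as reported for CSA, means a single straight line describes the data poorly.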

Supplementary Figure S5 | Simulation for estimation of admixture using both real ancestors and surrogates.
In the model below, populations B and C are the real ancestors that directly contributed genetic material to admixed population X. Populations B' and C' are the surrogates of ancestors B and C, respectively. Parameters of the demographic history can be found in the ms command line (the basic effective population size is 10,000). The estimates of admixture proportion are shown in Supplementary Fig. S6.
The basic ms command line used to produce the simulated data is ./ms 120 50000 -t 1 -I 6 20 20 20 20 20 20 -n 1 8 -n 2 2.5 -n 3 5 -n 4 1.5 -n 5 1.5 -n 6 2 -es 0.

In the model below, populations B and C are the real ancestors that directly contributed 50% of the genetic material to admixed population X. Population C is pre-mixed, having inherited 5% of its genetic material from D. Parameters of the demographic history can be found in the ms command line (the basic effective population size is 10,000). The estimates are shown in Supplementary Fig. S8.
Basic command line used to produce the simulated data by ms is .

In the simulations (based on the model in Supplementary Fig. S7), we considered both the full set of markers (red box) and markers with minor allele frequency > 5% (blue box). One ancestor is pre-mixed, having inherited 5% of its genetic material from Europeans, and contributed 50% to the new admixed population, so the expected admixture proportion for the new admixed population is 5% × 50% = 2.5%. The admixture proportion was re-estimated with the F4 ratio from the simulated data. The results showed that neither ascertainment bias nor a pre-mixed ancestor affected the estimates of admixture proportion.
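To make the F4 ratio re-estimation above concrete, here is a minimal Python sketch of one common form of the estimator, computed directly from per-SNP allele frequencies. The function names (f4, f4_ratio), the population roles and the toy frequencies are all assumptions for illustration; the exact population configuration used in the study is not given in this caption, and real analyses add standard errors (for example by block jackknife), which are omitted here.

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    n = len(p_a)
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


def f4_ratio(p_relative, p_outgroup, p_admixed, p_source1, p_source2):
    """Estimate the fraction of ancestry in the admixed population that
    comes from source 1:

        alpha = f4(R, O; X, S2) / f4(R, O; S1, S2)

    R is a population related to source 1 and O is an outgroup.
    """
    numerator = f4(p_relative, p_outgroup, p_admixed, p_source2)
    denominator = f4(p_relative, p_outgroup, p_source1, p_source2)
    return numerator / denominator


# Hypothetical allele frequencies at three SNPs (toy values, not real data).
relative = [0.1, 0.5, 0.9]
outgroup = [0.2, 0.3, 0.4]
source1 = [0.2, 0.6, 0.8]
source2 = [0.7, 0.1, 0.3]

# Expected proportion in the scenario described above: a 5% pre-mixed
# ancestor contributing 50% to the new population.
expected_alpha = 0.05 * 0.5

# Build an idealised admixed population with exactly that proportion.
admixed = [expected_alpha * s1 + (1 - expected_alpha) * s2
           for s1, s2 in zip(source1, source2)]

alpha_hat = f4_ratio(relative, outgroup, admixed, source1, source2)
print(f"expected={expected_alpha:.4f} estimated={alpha_hat:.4f}")
```

Because the toy admixed frequencies are an exact linear mixture of the two sources, the ratio recovers the mixing proportion without noise; with real or simulated genotype data the estimate carries sampling error and depends on how well the surrogates match the true ancestors.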

Supplementary Figure S9 | Simulation and dating admixture when the real ancestor is pre-mixed.
(a) Admixture scenario used for the simulations (similar to the history of Xibo). Ancestor C was pre-mixed 110 generations ago. Population X was formed by admixture between A and C 10 generations ago. (b) Admixture time was estimated by ROLLOFF. Simulation and dating were repeated 50 times.
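Dating by admixture linkage disequilibrium, as ROLLOFF does, rests on the expectation that admixture LD decays approximately exponentially with genetic distance, at a rate equal to the number of generations since admixture. The Python sketch below is a simplified illustration of that idea and not the ROLLOFF implementation: it uses a log-linear least-squares fit on a noiseless synthetic curve, and the function name fit_admixture_generations and all numbers are assumptions.

```python
import math


def fit_admixture_generations(distances_morgans, ld_values):
    """Fit ld = amplitude * exp(-n * d) by least squares on log(ld).

    distances_morgans: genetic distances d between marker pairs, in Morgans.
    ld_values: positive admixture-LD statistics at those distances.
    Returns (n, amplitude), where n is the estimated number of generations.
    """
    if any(v <= 0 for v in ld_values):
        raise ValueError("log-linear fit requires strictly positive LD values")
    logs = [math.log(v) for v in ld_values]
    k = len(distances_morgans)
    mean_d = sum(distances_morgans) / k
    mean_log = sum(logs) / k
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    sxy = sum((d - mean_d) * (l - mean_log)
              for d, l in zip(distances_morgans, logs))
    slope = sxy / sxx
    amplitude = math.exp(mean_log - slope * mean_d)
    return -slope, amplitude


# Synthetic, noiseless decay curve for an admixture 10 generations ago,
# sampled from 0.5 cM to 20 cM (hypothetical values, not study data).
true_generations = 10.0
distances = [0.005 * i for i in range(1, 41)]
curve = [0.1 * math.exp(-true_generations * d) for d in distances]

generations, amplitude = fit_admixture_generations(distances, curve)
print(f"estimated generations since admixture: {generations:.2f}")
```

With a pre-mixed ancestor, as in panel (a), the observed curve mixes a fast-decaying component from the older event with a slow-decaying one from the recent event, which is why the simulations check whether a single fitted date still recovers the recent admixture time.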

Supplementary Figure S10 | Definition of haplotype segments we used to estimate admixture time.
The number of segments with one allele from ancestry A and the other allele from ancestry B was used to estimate the expected admixture time based on Equation 5.
Blue bars are segments from ancestry A; red bars are segments from ancestry B. S is the segment we count.