Mapping Post-Glacial Expansions: The Peopling of Southwest Asia

Archaeological, palaeontological and geological evidence shows that post-glacial warming released human populations from their various climate-bound refugia. Yet specific connections between these refugia and the timing and routes of the post-glacial migrations that ultimately established modern patterns of genetic variation remain elusive. Here, we use Y-chromosome markers combined with autosomal data to reconstruct population expansions from regional refugia in Southwest Asia. Populations from three regions in particular possess distinctive autosomal genetic signatures indicative of likely refugia: one in the north, centered on the eastern coast of the Black Sea; a second with a more Levantine focus; and a third in the southern Arabian Peninsula. Modern populations from these three regions carry the widest diversity and may represent the most likely descendants of the populations responsible for the Neolithic cultures of Southwest Asia. We reveal the distinct and datable expansion routes of populations from these three refugia throughout Southwest Asia and into Europe and North Africa, and discuss the possible correlations of these migrations with various cultural and climatic events evident in the archaeological record of the past 15,000 years.


Included studies
Y-Hg and haplotype data were also incorporated from prior studies (Tables S1 and S2), from which the most informative derived sets were constructed. The YHRD tree and the ISOGG 2011 tree were compared, and the YHRD tree was used as the base tree for derivation (see also Table S2).

Y Haplogroup expansions
BATWING population splitting was applied to individual haplogroups drawn from both whole-population and haplogroup-specific studies. BATWING assumes a single-step mutation model and a coalescent process, and it models population splitting by dividing the effective population size of the parent population among the child populations. In practice, it has been noted that subsampling by haplogroup, as opposed to random sampling, appears to yield biased estimates of effective population size 16, revealing structure. We argue that the mutation and coalescence formulas nonetheless remain correct. Further, we suggest that the "fixed, then expanding" population growth profile option, along with relatively wide confidence intervals, may provide a spline approximating the actual effective population sizes governing coalescence. Together, these would allow BATWING to model haplogroup expansions.
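The two modelling assumptions named above can be illustrated with a minimal sketch. It is not BATWING's implementation: the mutation probability, the number of generations, the parent effective size and the split weights are all hypothetical values chosen for illustration, and the proportional split rule is an assumption about how "dividing among child populations" might be parameterised. Under a symmetric single-step model, the variance of the repeat-count change along one lineage grows linearly as the mutation probability times the number of generations, which the exact calculation below reproduces.

```python
def stepwise_offset_distribution(mu, generations):
    """Exact distribution of the repeat-count change at one STR locus.

    Each generation the allele gains one repeat with probability mu/2,
    loses one repeat with probability mu/2, and is unchanged otherwise.
    """
    dist = {0: 1.0}
    for _ in range(generations):
        nxt = {}
        for offset, p in dist.items():
            nxt[offset] = nxt.get(offset, 0.0) + p * (1.0 - mu)
            nxt[offset + 1] = nxt.get(offset + 1, 0.0) + p * mu / 2.0
            nxt[offset - 1] = nxt.get(offset - 1, 0.0) + p * mu / 2.0
        dist = nxt
    return dist


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(k * p for k, p in dist.items())
    return sum((k - mean) ** 2 * p for k, p in dist.items())


def split_effective_size(parent_ne, weights):
    """Divide a parent effective size among children in proportion to weights.

    The proportional rule is an assumption made for this sketch.
    """
    total = float(sum(weights))
    return [parent_ne * w / total for w in weights]


MU = 0.002          # hypothetical per-generation mutation probability
GENERATIONS = 200   # hypothetical lineage depth

dist = stepwise_offset_distribution(MU, GENERATIONS)
var = variance(dist)                                  # equals MU * GENERATIONS
child_sizes = split_effective_size(1000.0, [5, 3, 2])  # hypothetical split

print("variance of repeat-count change:", var)
print("child effective sizes:", child_sizes)
```

The linear growth of variance with time is what lets STR diversity within a haplogroup act as a clock; the child sizes always sum to the parent size by construction.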
We justify using BATWING for this purpose as follows. BATWING constructs multiple trees and samples population parameters (e.g. mutation rates and effective population sizes), computing the likelihood of each configuration and employing Metropolis-Hastings Monte Carlo sampling to seek the likely regions of its parameter range. After equilibration, BATWING presents various population-split tree hypotheses in proportion to their probabilities. Migration events produce minority trees if all haplogroups are combined: the modal trees reflect the dominant founding lineages represented in the populations 17. In order to isolate and amplify the minority trees, we computed the population split times for the specific haplogroups that likely marked the migration events. Coalescence within haplogroups is governed by the overall effective sizes of the populations in which the evolution occurred; estimates of effective population size will therefore reflect the coalescence rates marking the haplogroups' expansions.
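The sampling idea described above can be sketched on a toy problem. This is not BATWING's likelihood, which is defined over genealogies and many parameters at once. The sketch assumes a single unknown split time, a hypothetical count of observed mutations that is Poisson-distributed with mean proportional to that time, and a flat prior. All numerical values and function names are illustrative. For this toy model the exact posterior is a gamma distribution with mean (k + 1) / rate, which gives a check on the sampler.

```python
import math
import random


def log_posterior(t, k, rate, t_max):
    """Unnormalised log posterior of a split time t (in generations).

    Toy model: k mutations observed, Poisson with mean rate * t,
    and a flat prior on 0 < t < t_max.
    """
    if t <= 0.0 or t >= t_max:
        return -math.inf
    return k * math.log(rate * t) - rate * t


def metropolis_hastings(k, rate, t_max, n_samples, burn_in, step, seed):
    """Random-walk Metropolis-Hastings sampler for the toy split time."""
    rng = random.Random(seed)
    t = t_max / 2.0
    current = log_posterior(t, k, rate, t_max)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = t + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, rate, t_max)
        # Symmetric proposal, so accept with probability min(1, ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            t, current = proposal, candidate
        # Discard the burn-in draws, then record every state visited.
        if i >= burn_in:
            samples.append(t)
    return samples


# Hypothetical inputs: 12 mutations, 0.02 expected mutations per generation.
samples = metropolis_hastings(k=12, rate=0.02, t_max=5000.0,
                              n_samples=20000, burn_in=2000,
                              step=150.0, seed=1)
posterior_mean = sum(samples) / len(samples)
print("posterior mean split time (generations):", posterior_mean)
```

After burn-in the chain visits split times in proportion to their posterior probability, which is the same sense in which BATWING is described as presenting tree hypotheses in proportion to their probabilities.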

Contour Map, PCA, MDS, BATWING Parameters, and other Details
Frequency (Figure S2) and variance (Figure S3) contour maps of the J1 and J2 haplogroups were constructed using the Kriging procedure 18 with Surfer 8 (Golden Software).
Principal component analysis (PCA) 19 was applied to relative haplogroup frequencies using prcomp 20,21 in R 21. Kruskal's stress majorization multidimensional scaling (MDS) 22 was computed using isoMDS 20 in R 21. The BATWING 16 parameter set used Zhivotovsky's mutation rate 23 as implemented in Xue et al. 24. Generally, the burn-in needed to achieve equilibration and convergence between pools ranged from 1.5 million to 3 million recorded samples. For the Y-chromosome analysis, AMOVA 25 was performed in ARLEQUIN 26. RST = […]
[…] of the total distance, so that a relatively small number of components may capture most of the genetic distance information, simplifying the task of identifying geographic associations between populations in the data. The distribution of eigenvalues expected by chance has been described previously 34. In PCA, refugia would be expected to be marked by the largest genetic divergences between population clusters in a region, reflecting the longest periods of isolation. Generally, these correspond to the ancestral populations found by ADMIXTURE: regions marked by larger divergences from others in PCA tend to be dominated by one or another ancestral population in ADMIXTURE. Pairwise FST statistics, computed using […], will also tend to be larger for more divergent autosomal samples.
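The statement that a few principal components may capture most of the variation can be illustrated with a small sketch. The analysis described above used prcomp in R; the version below is a stand-in written with the Python standard library only, and it finds just the leading component by power iteration rather than a full decomposition. The haplogroup counts are invented for illustration and do not come from the study, and the column labels (J1, J2 and two unnamed haplogroups) are hypothetical.

```python
import math


def relative_frequencies(counts):
    """Convert per-population haplogroup counts to relative frequencies."""
    return [[c / sum(row) for c in row] for row in counts]


def covariance_matrix(rows):
    """Sample covariance (divisor n - 1) of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    p = len(matrix)
    vec = [1.0 / (i + 1) for i in range(p)]  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector.
    return sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
               for i in range(p))


# Invented counts: rows are populations, columns are J1, J2 and two others.
counts = [
    [60, 20, 10, 10],
    [50, 30, 10, 10],
    [30, 50, 12, 8],
    [20, 60, 8, 12],
    [40, 40, 10, 10],
]

freqs = relative_frequencies(counts)
cov = covariance_matrix(freqs)
total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace
explained = leading_eigenvalue(cov) / total_variance
print("share of variance on the first component:", explained)
```

In this invented example the populations differ mainly along a J1-versus-J2 gradient, so the first component carries nearly all of the variance; real frequency tables would spread it over more components.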