Late Pleistocene human genome suggests a local origin for the first farmers of central Anatolia

Anatolia was home to some of the earliest farming communities. It has been long debated whether a migration of farming groups introduced agriculture to central Anatolia. Here, we report the first genome-wide data from a 15,000-year-old Anatolian hunter-gatherer and from seven Anatolian and Levantine early farmers. We find high genetic continuity (~80–90%) between the hunter-gatherers and early farmers of Anatolia and detect two distinct incoming ancestries: an early Iranian/Caucasus related one and a later one linked to the ancient Levant. Finally, we observe a genetic link between southern Europe and the Near East predating 15,000 years ago. Our results suggest a limited role of human migration in the emergence of agriculture in central Anatolia.

transition 6 , indicating that the hunter-gatherers of these regions locally transitioned to a food producing subsistence strategy.
Central Anatolia has some of the earliest evidence of agricultural societies outside the Fertile Crescent 3 and thus is a key region in understanding the early spread of farming. While archaeological evidence points to cultural continuity in central Anatolia 3 [...]
Here, we report new genome-wide data from eight prehistoric humans (Fig. 1A, Table 1, table S1), including the first Epipalaeolithic Anatolian hunter-gatherer sequenced to date (labeled AHG; directly dated to 13,642-13,073 BCE, excavated from the site of Pınarbaşı, Turkey), 5 early Neolithic Aceramic Anatolian farmers (labeled AAF; c. 8300-7800 cal BCE, one directly dated to 8269-8210 cal BCE 3, from the site of Boncuklu, Turkey), adding to previously published genomes from this site 7, and two Early Neolithic (PPNB) farmers from the southern Levant (one labeled KFH2, directly dated to c. 7,700-7,600 BCE, from the site of Kfar HaHoresh, Israel, and the second labeled BAJ001, c. 7027-6685 BCE, from the site of Ba'ja, Jordan). This data comprises a genetic record stretching from the Epipalaeolithic into the Early Holocene, spanning the advent of agriculture in the region.
By analyzing this data, we find that the Anatolian hunter-gatherers are genetically distinct from other reported late Pleistocene populations and thus represent a previously undescribed population. We reveal that Neolithic Anatolian populations derive a large fraction of their ancestry from the Epipaleolithic Anatolian population, suggesting farming was adopted locally by the hunter-gatherers of central Anatolia. We also detect distinct genetic interactions between the populations of central Anatolia and earlier farming centers to the east during the late Pleistocene/early Holocene, as well as with European hunter-gatherers to the west during the Late Pleistocene.
We extracted DNA from the ancient human remains and prepared it for next-generation sequencing 8,9, which resulted in human DNA yields lower than 2% (data table S1), comparable with the low DNA preservation previously reported in the region 6,7. To generate genome-wide data despite the low DNA yields, we performed in-solution DNA enrichment targeting 1.24 million genome-wide single nucleotide polymorphisms (SNPs) ('1240k capture') 10, which resulted in 129,406 to 917,473 covered SNPs per individual. We estimated low mitochondrial contamination levels for all eight individuals (1-6%; Materials and Methods and table S2) and could further test the males for nuclear contamination, resulting in low estimates (0.05-2.23%; table S2). For population genetic analyses, we merged genotype data of the new individuals with previously published datasets from 587 ancient individuals and 254 present-day populations (data table S2).

Results
To estimate how the ancient individuals relate to the known west Eurasian genetic variation, we projected them onto the top two dimensions of present-day principal component analysis (PCA) 6 (Fig. 1B). Strikingly, the AHG individual is positioned near both AAF and later Anatolian Ceramic farmers 10 (labeled ACF; 7,000-6,000 cal BCE) with a slight leftward shift. These three prehistoric Anatolian populations (AHG, AAF and ACF), which represent a temporal transect spanning the transition into farming, are positioned between Mesolithic western [...] (table S3) indicates that AHG is distinct from both the WHG and Epipaleolithic/Neolithic Levantine populations and yet shares extra affinity with each when compared to the other. Accordingly, we find an adequate two-way admixture model using qpAdm 12 (χ² p = 0.158), in which AHG derives around half of his ancestry from a Neolithic Levantine-related gene pool (48.0 ± 4.5 %; estimate ± 1 SE) and the rest from the WHG-related one (tables S4 and S5). These results support a late Pleistocene presence of both ancestries in a mixed form in central Anatolia. Notably, the genetic connection with the Levant predates the advent of farming in this region by at least five millennia and potentially correlates with evidence of human interactions between central Anatolia and the Levant during the Epipalaeolithic 13.
In turn, AAF is slightly shifted upwards compared to AHG in the PCA, in the direction where ancient and modern Caucasus and Iranian groups are located. Likewise, when compared to AHG by D(AAF, AHG; test, Mbuti), the AAF early farmers show extra affinity with early Holocene populations from Iran or Caucasus and with present-day South Asians, who have also been genetically linked with Iranian/Caucasus ancestry 14,15 (Fig. 2A, fig. S2 and data table S3).
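As an illustration of how a statistic of the form D(P1, P2; P3, P4) behaves, the following is a minimal sketch and not the ADMIXTOOLS implementation. It assumes per-SNP allele frequencies for the four populations (0 or 1 for pseudo-haploid calls) and uses the standard frequency-based definition of D. The function name and the toy data are hypothetical, and the block-jackknife standard errors that give the "SE" units quoted in the text are not computed here.

```python
def d_statistic(p1, p2, p3, p4):
    """Frequency-based D(P1, P2; P3, P4) over SNPs shared by all four groups.

    Each argument is a sequence of allele frequencies in [0, 1], one per SNP.
    A positive value means P1 shares more alleles with P3 than P2 does.
    """
    numerator = 0.0
    denominator = 0.0
    for w, x, y, z in zip(p1, p2, p3, p4):
        numerator += (w - x) * (y - z)
        denominator += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Toy pseudo-haploid data (hypothetical, five SNPs).
aaf = [1, 1, 0, 1, 0]
ahg = [0, 1, 0, 0, 1]
test = [1, 0, 0, 1, 1]
mbuti = [0, 0, 0, 0, 0]

d_value = d_statistic(aaf, ahg, test, mbuti)
print(f"D(AAF, AHG; test, Mbuti) = {d_value:.3f}")
```

In this toy case two sites support extra sharing between the first and third populations and one site supports the opposite pattern, so D is positive. Only sites where P1 differs from P2 and P3 differs from P4 contribute to the statistic.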
A mixture of AHG and Neolithic Iranians provides a good fit to AAF in our qpAdm modeling (χ² p = 0.296), in which they derive most of their ancestry (89.7 ± 3.9 %) from a population related to AHG (tables S4 and S6). This suggests a long-term genetic stability in central Anatolia over five millennia despite changes in climate and subsistence strategy. Given that our admixture model for AHG does not require the Neolithic Iranian ancestry, it presumably diffused into [...]
In contrast, we find that the later ACF individuals share more alleles with the early Holocene Levantines than AAF do, as shown by positive D(ACF, AAF; Natufian/Levant_N, Mbuti) ≥ 3.84 SE (Fig. 2B, fig. S3 and data table S3). Ancient Iran/Caucasus populations and contemporary South Asians do not share more alleles with ACF (|D| < 3.3 SE). Likewise, qpAdm modeling suggests that the AAF gene pool still constitutes more than 3/4 of the ancestry of ACF 2,000 years later (78.7 ± 3.5 %; tables S4 and S7) with additional ancestry well modeled by the Neolithic Levantines (χ² p = 0.115) but not by the Neolithic Iranians (χ² p = 0.076; the model estimated infeasible negative mixture proportions) (tables S4 and S7 [...] 18. We find that Iron Gates HG can be modeled as a three-way mixture of Near-Eastern hunter-gatherers (25.8 ± 5.0 % AHG or 11.1 ± 2.2 % Natufian), WHG (62.9 ± 7.4 % or 78.0 ± 4.6 % respectively) and EHG (11.3 ± 3.3 % or 10.9 ± 3 % respectively) (tables S4 and S9). The affinity detected by the above D-statistic can be explained by gene flow from Near-Eastern hunter-gatherers into the ancestors of Iron Gates, by gene flow from a population ancestral to Iron Gates into the Near-Eastern hunter-gatherers, or by a combination of both. To distinguish the direction of the gene flow, we examined the Basal Eurasian ancestry component (α), which is prevalent in the Near East 6 but undetectable in European hunter-gatherers 17.
Following a published approach 6 , we estimated α to be 24.8 ± 5.5 % in AHG and 38.5 ± 5.0 % in Natufians (Fig. 3B, table S10), consistent with previous estimates for the latter 6 .
Under the model of unidirectional gene flow from Anatolia to Europe, 6.4 % is expected for α of Iron Gates by calculating (% AHG in Iron Gates HG) × (α in AHG). However, Iron Gates can be modeled without any Basal Eurasian ancestry or with a non-significant proportion of 1.6 ± 2.8 % (Fig. 3B, table S10), suggesting that unidirectional gene flow from the Near East to Europe alone is insufficient to explain the extra affinity between the Iron Gates HG and the Near-Eastern hunter-gatherers. Thus, it is plausible to assume that prior to 15,000 years ago there was either a bidirectional gene flow between populations ancestral to Southeastern Europeans of the early Holocene and Anatolians of the late glacial or a dispersal of Southeastern Europeans into the Near East. Presumably, this Southeastern European ancestral population later spread into central Europe during the post-last-glacial maximum (LGM) period, resulting in the observed late Pleistocene genetic affinity between the Near East and Europe.
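The expected value of 6.4 % quoted above can be reproduced with a short calculation. This sketch only restates the point estimates given in the text; the variable names are hypothetical, and the uncertainty of the expected value itself is not propagated.

```python
# Point estimates taken from the text (fractions rather than percentages).
ahg_share_in_iron_gates = 0.258  # AHG-related ancestry in Iron Gates HG
alpha_ahg = 0.248                # Basal Eurasian ancestry (alpha) in AHG

# Expected alpha in Iron Gates HG under unidirectional gene flow
# from Anatolia into Europe.
expected_alpha = ahg_share_in_iron_gates * alpha_ahg

# Estimate for Iron Gates HG reported in the text: 1.6 +/- 2.8 %.
observed_alpha = 0.016
observed_se = 0.028
observed_in_se_units = observed_alpha / observed_se

print(f"expected alpha: {expected_alpha * 100:.1f} %")
print(f"observed alpha: {observed_alpha * 100:.1f} % "
      f"({observed_in_se_units:.2f} SE from zero)")
```

The observed estimate lies well within one standard error of zero, which is why the text describes it as non-significant.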
The uniparental marker analysis placed AHG within the mitochondrial sub-haplogroup K2b and within the Y-chromosome haplogroup C1a2, both rare in present-day Eurasians (Table 1 and data table S6). Mitochondrial haplogroup K has so far not been found in Paleolithic hunter-gatherers 19. However, Y-haplogroup C1a2 has been reported in some of the earliest European hunter-gatherers 17,20,21. The early farmers belong to common early Neolithic mitochondrial (N1a, U3 and K1a) and Y chromosome types (C and G2a), with the exception of the Levantine BAJ001, which represents the earliest reported individual carrying the mitochondrial N1b group (Table 1 and data table S6).
We examined alleles related to phenotypic traits in the ancient genomes (data table S7). Notably, three of the AAF carry the derived allele for rs12193832 in the HERC2 (hect domain and RLD2) gene that is primarily responsible for lighter eye color in Europeans 22. The derived allele is observed as early as 14,000-13,000 years ago in individuals from Italy and the Caucasus 17, 23 but had not yet been reported in early farmers or hunter-gatherers from the Near East.

Discussion
By analyzing genome-wide data from pre- and early-Neolithic Anatolians and Levantines, we describe the demographic developments leading to the formation of the Anatolian early farmer population that later replaced most of the European hunter-gatherers and represents the largest ancestral component in present-day Europeans 4, 5.
We report a long-term persistence of the local Anatolian hunter-gatherer gene pool over seven millennia and throughout the transition from foraging to farming. This demographic pattern is similar to those previously observed in earlier farming centers of the Fertile Crescent 6 and differs from the pattern of the demic diffusion-based spread of farming into Europe 4, 5. Our results provide genetic support for archaeological evidence 3 suggesting that Anatolia was not merely a stepping stone in a movement of early farmers from the Fertile Crescent into Europe but rather a place where local hunter-gatherers adopted ideas, plants and technology that led to agricultural subsistence.
Interestingly, while the local population structure remains highly stable, a pattern of genetic interactions with neighboring regions is observed from as early as the Late Pleistocene and into the early Holocene. External genetic contributions, associated with two distinct early [...]
[...] and further processed using EAGER (v 1.92.54) 28. First, adapter sequences were clipped and reads shorter than 30 bp were discarded using AdapterRemoval (v 2.2.0) 29. Adapter-clipped reads were subsequently mapped with the BWA aln/samse programs (v 0.7.12) 30 against the UCSC genome browser's human genome reference hg19 with a lenient stringency parameter ("-n 0.01"). We retained reads with Phred-scaled mapping quality scores ≥ 20 and ≥ 30 for the whole genome and the mitochondrial genome, respectively. Duplicate reads were subsequently removed using DeDup v0.12.2 28. Pseudo-diploid genotypes were generated for each individual using pileupCaller, which randomly draws a high-quality base (Phred-scaled base quality score ≥ 30) mapping to each targeted SNP position (https://github.com/stschiff/sequenceTools). To prevent false SNP calls due to retained DNA damage, two terminal positions in each read were clipped prior to genotyping. The genotyping produced between 129,406 and 917,473 covered targeted SNPs and a mean coverage ranging between 0.16 and 2.9 fold per individual (Table 1).
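The random-draw genotyping step can be sketched as follows. This is a simplified illustration of the idea and not pileupCaller's actual code: the function name and the pileup representation (a list of base and Phred-quality pairs observed at one targeted SNP) are assumptions made for the example.

```python
import random


def draw_pseudo_haploid_call(pileup, min_base_quality=30, rng=None):
    """Randomly draw one high-quality base at a targeted SNP position.

    pileup: list of (base, phred_base_quality) tuples from the reads
            covering the position (hypothetical representation).
    Returns the drawn base, or None if no base passes the quality filter.
    """
    rng = rng or random
    candidates = [base for base, quality in pileup
                  if quality >= min_base_quality]
    if not candidates:
        return None  # position counts as not covered
    return rng.choice(candidates)


# Example with a fixed seed so the draw is repeatable.
rng = random.Random(1)
pileup_at_snp = [("A", 37), ("G", 35), ("G", 12), ("A", 40)]
call = draw_pseudo_haploid_call(pileup_at_snp, rng=rng)
print(call)
```

Drawing a single base per site avoids calling heterozygous genotypes from very low coverage, so that every individual is represented in the same way regardless of sequencing depth.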
To minimize bias from differences in analysis pipelines, we re-processed the raw read data deposited for previously published Neolithic Anatolian genomes 7 (labeled Tepecik_pub and Boncuklu_pub) in the same way as described for the newly reported individuals.
aDNA authentication and quality control
We estimated authenticity of the ancient data using multiple measures. First, blank controls were included and analyzed for extractions as well as library preparations (Data table S8).
Second, we assessed levels of DNA damage in the mapped reads using mapDamage (v 2.0) 36.
Third, we estimated human DNA contamination on the mitochondrial DNA using schmutzi 37.
Last, we estimated nuclear contamination in males with ANGSD (v 0.910) 38, which utilizes haploid X chromosome markers in males by comparing mismatch rates of polymorphic sites and adjacent ones (that are likely to be monomorphic). The genetic sex of the reported individuals was determined by comparing the genomic coverage of X and Y chromosomes normalized by the autosomal average coverage. To avoid bias caused by grouping closely related individuals into a population, we calculated the pairwise mismatch rates of the Boncuklu individuals following a previously reported method 39 (Data table S9). Five of the twelve individuals reported here were excluded from the population genetic analysis: two due to a high genomic contamination level (> 5 %) and three due to low amount of analyzable data (< 10,000 SNPs covered) (Data table S1).
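A pairwise mismatch rate between two individuals with pseudo-haploid calls can be illustrated with the minimal sketch below. It is not the cited method 39 itself: the function name and the data layout (one call per targeted SNP, with None for uncovered sites) are assumptions, and no threshold for declaring two individuals related is implied.

```python
def pairwise_mismatch_rate(calls_a, calls_b):
    """Fraction of jointly covered SNPs at which two individuals differ.

    calls_a, calls_b: equal-length sequences with one pseudo-haploid call
    per targeted SNP, or None where the individual has no coverage.
    """
    overlapping = 0
    mismatches = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # skip sites not covered in both individuals
        overlapping += 1
        if a != b:
            mismatches += 1
    if overlapping == 0:
        raise ValueError("no overlapping SNPs")
    return mismatches / overlapping


# Toy example (hypothetical calls at six SNPs).
individual_1 = ["A", "G", None, "T", "C", "C"]
individual_2 = ["A", "A", "T", None, "C", "T"]
rate = pairwise_mismatch_rate(individual_1, individual_2)
print(f"{rate:.2f}")
```

Closely related pairs are expected to show a lower mismatch rate than unrelated pairs from the same population, so the rates are usually interpreted relative to the values seen across all pairs in the group.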
Principal component analysis (PCA)
We used the smartpca software from the EIGENSOFT package (v 6.0.1) 40 with the lsqproject option to construct the principal components of 67 present-day west Eurasian groups and project the ancient individuals on the first two components (fig. S4).

Modeling ancestry proportions
We used the qpWave (v400) and qpAdm (v 632) programs of ADMIXTOOLS 6, 12 to test and model admixture proportions in a studied population from potential source populations (reference populations). As the explicit phylogeny is unknown, a diverse set of outgroup populations (Supplementary Information sections 1.2-1.4) was used to distinguish the ancestry of the reference populations.
For estimating admixture proportions in the tested populations, we used a basic set of seven outgroups including present-day populations (Han, Onge, Mbuti, Mala, Mixe) that represent global genetic variation and published ancient populations such as Natufian 6, which represents a Levantine gene pool outside of modern genetic variation, and the European Upper Palaeolithic individual Kostenki14 20. As a prerequisite for the admixture modeling of the target population, we tested whether the corresponding set of reference populations can be distinguished by the chosen outgroups using qpWave 6 (Supplementary text S3). In some cases, when a reference population did not significantly contribute to the target in the attempted admixture models, it was removed from the reference set and added to the basic outgroup set in order to increase the power to distinguish the references. In cases where "Natufian" was used as a reference population, we instead used the present-day Near-Eastern population "BedouinB" as an outgroup.
For estimations of Basal Eurasian ancestry, we followed a previously described qpAdm approach 6 that does not require a proper proxy for the Basal Eurasian ancestry, which is currently

Fig. 3. Genetic links between Near-Eastern and European hunter-gatherers. (A) Genetic affinity between Near-Eastern and European hunter-gatherers increases after 14,000 years ago as measured by the statistic D(European HG, Kostenki14; Natufian/AHG, Mbuti). Vertical lines mark ±1 SE. Kostenki14 serves here as a baseline for the earlier European hunter-gatherers. Statistics including all analyzed European hunter-gatherers are listed in data table S5. Individuals marked with an asterisk did not reach the analysis threshold of over 30,000 SNPs overlapping with Natufian/AHG. (B) Basal Eurasian ancestry proportions (α) as a marker for Near-Eastern gene flow. Mixture proportions inferred by qpAdm for AHG and the Iron Gates HG are schematically represented 6. The lower schematic shows the expected α in Iron Gates HG under [...]

Table 1. An overview of ancient genomes reported in this study. For each individual the analysis group is given (AHG = Anatolian hunter-gatherer; AAF = Anatolian Aceramic early farmer; Levant_Neol = Levantine early farmer). When 14C dating results are available, the date is given in cal BCE in 2-sigma range; otherwise a date based on the archaeological context is provided (detailed dating information is provided in Supplementary text S1 and table S1). The proportion of human DNA and the mean coverage on 1240K target sites in the '1240K' enriched libraries are given. Uniparental haplogroups (mt = mitochondrial; Ychr = Y chromosome) are listed. Detailed information on the uniparental analysis can be found in Supplementary text S1 and data table S6.