The origins and spread of domestic horses from the Western Eurasian steppes

Domestication of horses fundamentally transformed long-range mobility and warfare1. However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling2–4 at Botai, Central Asia around 3500 bc3. Other longstanding candidate regions for horse domestication, such as Iberia5 and Anatolia6, have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 bc, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association7 between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 bc8,9 driving the spread of Indo-European languages10. This contrasts with the scenario in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium bc Sintashta culture11,12.


The origins and spread of domestic horses from the Western Eurasian steppes
Domestication of horses fundamentally transformed long-range mobility and warfare 1 . However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling [2][3][4] at Botai, Central Asia around 3500 bc 3 . Other longstanding candidate regions for horse domestication, such as Iberia 5 and Anatolia 6 , have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 bc, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association 7 between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 bc 8,9 driving the spread of Indo-European languages 10 . This contrasts with the scenario in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium bc Sintashta culture 11,12 . We gathered horse remains encompassing all suspected domestication centres, including Iberia, Anatolia and the steppes of Western Eurasia and Central Asia (Fig 1a). The sampling targeted previously under-represented time periods, with 201 radiocarbon dates spanning 44426 to 202 bc, and five beyond 50250 to 47950 bc (Supplementary Table 1).
The DNA quality enabled shotgun sequencing of 264 ancient genomes at 0.10× to 25.76× average coverage (239 genomes above 1× coverage), including 16 genomes for which further sequencing added to previously reported data. Enzymatic 13 and computational removal of post mortem DNA damage produced high-quality data with derived mutations decreasing with sample age, as expected if mutations accumulate through time (Extended Data Fig. 1). We added ten published modern genomes, and nine ancient genomes characterized with consistent technology or covering relevant time periods and locations, to obtain the most extensive high-quality genome time series for horses.

Pre-domestication population structure
Neighbour-joining phylogenomic inference revealed four geographically defined monophyletic groups (Fig 1b). These closely mirrored clusters identified using an extension of the Struct-f4 method 5 (Fig 1d-f, Extended Data Fig. 2, Supplementary Methods), except for the Neolithic Anatolia group (NEO-ANA), where the tree-to-data goodness of fit suggested phylogenetic misplacement (Fig 1c, Supplementary Methods).
The most basal cluster included Equus lenensis (ELEN), a lineage identified in northeastern Siberia from the Late Pleistocene to the late fourth millennium bc 5,14,15 . A second group covered Europe, including Late Pleistocene Romania, Belgium, France and Britain, and the region from Spain to Scandinavia and Hungary, Czechia and Poland during the sixth-to-third millennium bc. The third cluster comprised the earliest known domestic horses from Botai and Przewalski's horses, as previously reported 3 , and extended to the Altai and Southern Urals during the fifth-to-third millennium bc. Finally, modern domestic horses clustered within a group that became geographically widespread and prominent following about 2200 bc and during the second millennium bc (DOM2). This cluster appears genetically close to horses that lived in the Western Eurasia steppes (WE) but not further west than the Romanian lower Danube, south of the Carpathians, before and during the third millennium bc. Significant correlation between genetic and geographic distances, and inference of limited long-distance connectivity with estimated effective migration surface 16 (EEMS), confirmed the strong geographic differentiation of horse populations before about 3000 bc (Fig 2a, Extended Data Fig. 3a).
Horse ancestry profiles in Neolithic Anatolia and Eneolithic Central Asia, including at Botai, maximized a genetic component (coloured green in Fig. 1e, f) that was also substantial in Central and Eastern Europe during the Late Pleistocene (RONPC06_Rom_m34801) and the fourth or third millennium bc (Figs. 1e, 3a, Extended Data Fig. 4). It was, however, absent or moderately present in the Romanian lower Danube (ENEO-ROM), the Dnieper steppes (Ukr11_Ukr_m4185) and the western lower Volga-Don (C-PONT) populations during the sixth to third millennia bc. This indicates possible expansions of Anatolian horses into both Central and Eastern Europe and Central Asia regions, but not into the Western Eurasia steppes. The absence of typical NEO-ANA ancestry rules out expansion from Anatolia into Central Asia across the Caucasus mountains but supports connectivity south of the Caspian Sea prior to about 3500 bc.

The origins of DOM2 horses
The C-PONT group not only possessed moderate NEO-ANA ancestry, but also was the first region where the typical DOM2 ancestry component (coloured orange in Fig. 1e, f) became dominant during the sixth millennium bc. Multi-dimensional scaling further identified three horses from the western lower Volga-Don region as genetically closest to DOM2, associated with Steppe Maykop (Aygurskii), Yamnaya (Repin) and Poltavka (Sosnovka) contexts, dated to about 3500 to 2600 bc (Figs. 2a, b, 3a). Additionally, genetic continuity with DOM2 was rejected for all horses predating about 2200 bc, especially those from the NEO-ANA group (Supplementary Table 2), except for two late Yamnaya specimens from approximately 2900 to 2600 bc (Turganik (TURG)), located further east than the western lower Volga-Don region (Figs. 2a, b, 3a). These may therefore have provided some of the direct ancestors of DOM2 horses.
Modelling of the DOM2 population with qpADM 17 , rotating 18 all combinations of 2, 3 or 4 population donors, eliminated the possibility of a contribution from the NEO-ANA population, but indicated possible formation within the WE population, including a genetic contribution of approximately 95% from C-PONT and TURG horses (Supplementary Table 3). This was consistent with OrientAGraph 19 modelling from nine lineages representing key ancestry combinations, which confirmed the absence of NEO-ANA genetic ancestry in DOM2 and confirmed DOM2 as a sister population to the C-PONT horses (Fig. 3b).
Identifying discrete populations and modelling admixture as single unidirectional pulses, however, was highly challenging given the extent of spatial genetic connectivity. Indeed, the typical DOM2 ancestry component was maximized in the C-PONT group, but declined sharply eastwards (TURG and Central Asia) in the third millennium bc as the proportion of NEO-ANA ancestry increased (Fig. 2a) Article cline of genetic connectivity east of the Western Eurasia steppes and Central Asia, ruling out DOM2 ancestors further east than the western lower Volga-Don and Turganik. A similar genetic cline characterized the region located west of C-PONT, where the typical DOM2 ancestry component declined steadily in the Dnieper steppes, Poland, Turkish Thrace and Hungary in the fifth to third millennia bc. This eliminates the possibility of DOM2 ancestors further west than C-PONT and the Dnieper steppes. Furthermore, patterns of spatial autocorrelations in the genetic data 20 indicated Western Eurasia steppes as the most likely geographic location of DOM2 ancestors (Fig. 3c). Combined, our results demonstrate that DOM2 ancestors lived in the Western Eurasia steppes, especially the lower Volga-Don, but not in Anatolia, during the late fourth and early third millennia bc.

Expansion of steppe-related pastoralism
Analyses of ancient human genomes have revealed a massive expansion from the Western Eurasia steppes into Central and Eastern Europe during the third millennium bc, associated with the Yamnaya culture 8,9,11,12,21 . This expansion contributed at least two thirds of steppe-related ancestry to populations of the Corded Ware complex (CWC) around 2900 to 2300 bc 8 . The role of horses in this expansion remained unclear, as oxen could have pulled Yamnaya heavy, solid-wheeled wagons 7,22 . The genetic profile of horses from CWC contexts, however, almost completely lacked the ancestry maximized in DOM2 and Yamnaya horses (TURG and Repin) (Figs. 1e, f, 2a, b) and showed no direct connection with the WE group, including both C-PONT and TURG, in OrientAGraph modelling (Fig. 3b, Extended Data Fig. 5).
The typical DOM2 ancestry was also limited in pre-CWC horses from Denmark, Poland and Czechia, associated with the Funnel Beaker and early Pitted Ware cultures (FB/PWC, FB/POL and ENEO-CZE, respectively). DOM2 ancestry reached a maximum 12.5% in one Hungarian horse dated to the mid-third millennium bc and associated with the Somogyvár-Vinkovci Culture (CAR05_Hun_m2458). qpAdm 17 modelling indicated that its DOM2 ancestry was acquired following gene flow from southern Thrace (Kan22_Tur_m2386), but not from the Dnieper steppes (Ukr11_Ukr_m4185) (Supplementary Table 3). Combined with the lack of increased horse dispersal during the early third millennium bc (Fig. 2b, Extended Data Fig. 3b), these results suggest that DOM2 horses did not accompany the steppe pastoralist expansion north of the Carpathians.
By around 2200-2000 bc, the typical DOM2 ancestry profile appeared outside the Western Eurasia steppes in Bohemia (Holubice), the lower Danube (Gordinesti II) and central Anatolia (Acemhöyük), spreading across Eurasia shortly afterwards, eventually replacing all pre-existing lineages (Fig 2c, Extended Data Fig. 3c). Eurasia became characterized by high genetic connectivity, supporting massive horse dispersal by the late third millennium and early second millennium bc. This process involved stallions and mares, indicated by autosomal and X-chromosomal variation (Extended Data Fig. 3d), and was sustained by explosive demographics apparent in both mitochondrial and Y-chromosomal variation (Extended Data Fig. 3e, f). Altogether, our genomic data uncover a high turnover of the horse population in which past breeders produced large stocks of DOM2 horses to supply increasing demands for horse-based mobility from around 2200 bc.
Of note, the DOM2 genetic profile was ubiquitous among horses buried in Sintashta kurgans together with the earliest spoke-wheeled chariots around 2000-1800 bc 7,9,23,24 (Extended Data Fig. 6). A typical DOM2 profile was also found in Central Anatolia (AC9016_Tur_m1900), concurrent with two-wheeled vehicle iconography from about 1900 bc 25,26 . However, the rise of such profiles in Holubice, Gordinesti II and Acemhöyük before the earliest evidence for chariots supports horseback riding fuelling the initial dispersal of DOM2 horses outside their core region, in line with Mesopotamian iconography during the late third and early second millennia bc 27 . Therefore, a combination of chariots and equestrianism is likely to have spread the DOM2 diaspora in a range of social contexts from urban states to dispersed decentralized societies 28 .

DOM2 biological adaptations
Human-induced DOM2 dispersal conceivably involved selection of phenotypic characteristics linked to horseback riding and chariotry. We therefore screened our data for genetic variants that are over-represented in DOM2 horses from the late third millennium bc (Extended Data Fig. 7). The first outstanding locus peaked immediately upstream of the GSDMC gene, where sequence coverage dropped at two L1 transposable elements in all lineages except DOM2. The presence of additional exons in other mammals suggests that independent L1 insertions remodelled the DOM2 gene structure. In humans, GSDMC is a strong marker for chronic back pain 29 and lumbar spinal stenosis, a syndrome causing vertebral disk hardening and painful walking 30 .
The second most differentiated locus extended over approximately 16 Mb on chromosome 3, with the ZFPM1 gene being closest to the selection peak. ZFPM1 is essential for the development of dorsal raphe Nature | Vol 598 | 28 October 2021 | 637 serotonergic neurons involved in mood regulation 31 and aggressive behaviour 32 . ZFPM1 inactivation in mice causes anxiety disorders and contextual fear memory 31 . Combined, early selection at GSDMC and ZFPM1 suggests shifting use toward horses that were more docile, more resilient to stress and involved in new locomotor exercise, including endurance running, weight bearing and/or warfare.

Evolutionary history and origins of tarpan horses
Our analyses elucidate the geographic, temporal and biological origins of DOM2 horses. This study features a diverse ancient horse genome dataset, revealing the presence of deep mitochondrial and/ or Y-chromosomal haplotypes in non-DOM2 horses ( Supplementary  Fig 1). This suggests that yet-unsampled divergent populations contributed to forming several lineages excluding DOM2. This was especially true in the Iberian group (IBE), where the expected genetic distance to the donkey was reduced (Extended Data Fig. 5f), but also in NEO-ANA according to OrientAGraph modelling (Fig 3b). Disentangling exact divergence and ancestry contributions of such unsampled lineages is difficult with the currently available data. It can, however, be stressed that Iberia and Anatolia represent two well-known refugia 33 , where populations could have survived and mixed during Ice Ages.

Article
Finally, our analyses have solved the mysterious origins of the tarpan horse, which became extinct in the early 20th century. The tarpan horse came about following admixture between horses native to Europe (modelled as having 28.8-34.2% and 32.2-33.2% CWC ancestry in Ori-entAGraph 19 and qpAdm 17 , respectively) and horses closely related to DOM2. This is consistent with LOCATOR 20 predicting ancestors in western Ukraine (Fig 3c) and refutes previous hypotheses depicting tarpans as the wild ancestor or a feral version of DOM2, or a hybrid with Przewalski's horses 34 .

Discussion
This work resolves longstanding debates about the origins and spread of domestic horses. Whereas horses living in the Western Eurasia steppes in the late fourth and early third millennia bc were the ancestors of DOM2 horses, there is no evidence that they facilitated the expansion of the human genetic steppe ancestry into Europe 8,9 as previously hypothesized 7 . Instead of horse-mounted warfare, declining populations during the European late Neolithic 35 may thus have opened up an opportunity for a westward expansion of steppe pastoralists. Yamnaya horses at Repin and Turganik carried more DOM2 genetic affinity than presumably wild horses from hunter-gatherer sites of the sixth millennium bc (NEO-NCAS, from approximately 5500-5200 bc), which may suggest early horse management and herding practices. Regardless, Yamnaya pastoralism did not spread horses far outside their native range, similar to the Botai horse domestication, which remained a localized practice within a sedentary settlement system 2,36 . The globalization stage started later, when DOM2 horses dispersed outside their core region, first reaching Anatolia, the lower Danube, Bohemia and Central Asia by approximately 2200 to 2000 bc, then Western Europe and Mongolia soon afterwards, ultimately replacing all local populations by around 1500 to 1000 bc. This process first involved horseback riding, as spoke-wheeled chariots represent later technological innovations, emerging around 2000 to 1800 bc in the Trans-Ural Sintashta culture 7 . The weaponry, warriors and fortified settlements associated with this culture may have arisen in response to increased aridity and competition for critical grazing lands, intensifying territoriality and hierarchy 37 . This may have provided the basis for the conquests over the subsequent centuries that resulted in an almost complete human and horse genetic turnover in Central Asian steppes 11,21 . The expansion to the Carpathian basin 38 , and possibly Anatolia and the Levant, involved a different scenario in which specialized horse trainers and chariot builders spread with the horse trade and riding. In both cases, horses with reduced back pathologies and enhanced docility would have facilitated Bronze Age elite long-distance trade demands and become a highly valued commodity and status symbol, resulting in rapid diaspora. We, however, acknowledge substantial spatiotemporal variability and evidential bias towards elite activities, so we do not discount additional, harder to evidence, factors in equine dispersal.
Our results also have important implications for mechanisms underpinning two major language dispersals. The expansion of the Indo-European language family from the Western Eurasia steppes has traditionally been associated with mounted pastoralism, with the CWC serving as a major stepping stone in Europe 39-41 . However, while there is overwhelming lexical evidence for horse domestication, horse-drawn chariots and derived mythologies in the Indo-Iranian branch of the Indo-European family, the linguistic indications of horse-keeping practices at the deeper Proto-Indo-European level are in fact ambiguous 42 (Supplementary Discussion) . The limited presence of horses in CWC assemblages 43 and the local genetic makeup of CWC specimens reject scenarios in which horses were the primary driving force behind the initial spread of Indo-European languages in Europe 44 . By contrast, DOM2 dispersal in Asia during the early-to-mid second millennium bc was concurrent with the spread of chariotry and Indo-Iranian languages, whose earliest speakers are linked to populations that directly preceded the Sintashta culture 11,12,45 . We thus conclude that the new package of chariotry and improved breed of horses, including chestnut coat colouration documented both linguistically (Supplementary Discussion) and genetically (Extended Data Fig. 8

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-04018-9.  Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Radiocarbon dating
A total of 170 new radiocarbon dates were obtained in this study. Dating was carried out at the Keck Carbon Cycle AMS Laboratory, UC Irvine following collagen extraction and ultra-filtration from approximately 1 g of osseous material. IntCal20 calibration 48 was performed using OxCalOnline 49 .

Genome sequencing
All samples were collected with permission from the organizations holding the collections and documented through official authorization letters for partially destructive sampling from local authorities. Samples were processed for DNA extraction, library construction and shallow sequencing in the ancient DNA facilities of the Centre for Anthropobiology and Genomics of Toulouse (CAGT), France. The overall methodology followed the work from Seguin-Orlando and colleagues 50 . It involved: (1) powdering a total of 100-590 mg of osseous material using the Mixel Mill MM200 (Retsch) Micro-dismembrator; (2) extracting DNA following the procedure Y2 from Gamba and colleagues 51 , tailored to facilitate the recovery of even the shortest DNA fragments; (3) treating DNA extracts with the USER (NEB) enzymatic cocktail to eliminate a fraction of post mortem DNA damage 13 ; (4) constructing from double-stranded DNA templates DNA libraries in which two internal indexes are added during adapter ligation and one external index is added during PCR amplification; and (5) amplification, purification and quantification of DNA libraries before pooling 20-50 DNA libraries for low-depth sequencing on the Illumina MiniSeq instrument (paired-end mode, 2 × 80). All three indexes of each library were unique in a given sequencing pool.
Raw fastQ files were demultiplexed, trimmed and collapsed when individual read pairs showed significant overlap using AdapterRemoval2 52 (version 2.3.0), disregarding reads shorter than 25 bp. Processed reads were then aligned against the nuclear and mitochondrial horse reference genomes 53,54 , and appended with the Y-chromosome contigs from 55 using the Paleomix bam_pipeline (version 1.2.13.2) with the mapping parameters recommended by Poullet and Orlando 56 . Sequencing reads representing PCR duplicates or showing a mapping quality below 25 were disregarded. DNA fragmentation and nucleotide misincorporation patterns were assessed on the basis of 100,000 random mapped reads using mapDam-age2 57 (version 2.0.8). Paleomix returned provisional estimates of endogenous DNA content and clonality, as defined by the fraction of retained reads mapping uniquely against the horse reference genomes and those mapping at the same genomic coordinates, respectively. These numbers guided further experimental decisions, including (1) the sequencing effort to be performed per individual library; (2) the preparation of additional libraries from left-over aliquots of USER-treated DNA extracts, or following treatment of DNA extract aliquots with the USER enzymatic cocktail; and (3) the preparation of additional DNA extracts. After initial screening for library content, sequencing was carried out on the Illumina HiSeq4000 instruments from Genoscope (paired-end mode, 2 × 76; France Génomique), except for four samples (BPTDG1_Fra_m11800, Closeau3_Fra_m10400, Novoil1_Kaz_m1832 and Novoil2_Kaz_m1832), for which sequencing was done at Novogene Europe on an Illumina NovaSeq 6000 instrument (S4 lanes, paired-end mode, 2 × 150). Overall, we obtained sequence data for a total of 264 novel ancient horse specimens and 1,029 DNA libraries (980 new), summing up to 31.86 billion sequencing read pairs and 100.82 billion collapsed read pairs, which was sufficient to characterize 226 novel ancient genomes showing a genomic depth-of-coverage of at least 1× (median 2.80-fold, maximum 25.76-fold) (Supplementary Table 1).
Error rate estimates ranged between 0.000337 and 0.003966 errors per site and revealed that nucleotide C→T and G→A misincorporation rates were still inflated relative to their reciprocal substitution types (T→C and A→G), despite USER treatment. Therefore, individual BAM alignment files were processed to further reduce nucleotide misincorporation rates. To achieve this, we used PMDtools 61 (version 0.60) to bin apart reads likely containing post mortem DNA damage (--threshold 1; DAM) from those that did not (--upperthreshold 1; NODAM). NODAM-aligned reads were then directly trimmed by 5 bp at their ends, where individual base qualities generally drop. The base quality of aligned DAM reads was first rescaled using mapDam-age2 57 (version 2.0.8), penalizing all instances of potential derivatives of post mortem cytosine deamination, then further trimmed by 10 bp at both ends. The resulting NODAM and DAM aligned reads were merged again to obtain final BAM sequence alignments. Final error rate estimates ranged between 0.000080 and 0.000933 errors per site (Supplementary Table 1).

Uniparentally inherited markers and coat colouration
Mitochondrial genomes for the 264 newly sequenced samples were characterized from quality-filtered BAM alignment files (minMapQ=25, minQ=30), using a majority rule requiring at least five individual reads per position. Their resulting complete mitochondrial genome sequences were aligned together with a total of 193 sequences previously characterized 3,5,14,15,58,62,63 using mafft 64 (version 7.407). Sequence alignments were split into six partitions, following previous work 5 , including the control region, all tRNAs, both rRNAs and each codon position considered separately. Maximum-likelihood phylogenetic reconstruction was performed using RAxML 65 (version 8.2.11) with default parameters, and assessing node support from a total of 100 bootstrap pseudo-replicates. The same partitions were provided as input for BEAST 66 (version 2.5.1), together with calibrated radiocarbon years (Supplementary Table 1). Specimens lacking direct radiocarbon dates or identified as not belonging to the DOM2 cluster were disregarded (Supplementary Table 1). While the former ensured precise tip-calibration for molecular clock estimation (assuming uncorrelated log-normal relaxed model), the latter prevented misinterpreting spatial variation in the population structure as changes in the effective population size 67 . The best substitution model was selected from ModelGenerator 68 (version 0.85) and Bayesian Skyline plots 69 were retrieved following 1,000,000,000 generations, sampling 1 every 1,000 and disregarding the first 30% as burn-in. Convergence was visually checked in Tracer 70 (version 1.7.2).
The Y-chromosome maximum-likelihood tree was constructed calling individual haplotypes from trimmed and rescaled BAM sequence alignments against the contigs described by Felkel and colleagues 55 , filtered for single copy MSY regions. The final multifasta sequence alignment included sites covered in at least 20% of the specimens, pseudo-haploidizing each position and filtering out transitions, as done with autosomal data. It was further restricted to specimens showing at least 20% of the final set of positions covered. This represented a total of 3,195 nucleotide transversions for 142 specimens. The final tree was computed using IQtree (version 1.6.12), following AICc selection of the best substitution model and 1,000 ultrafast bootstrap approximation for assessing node support 71,72 . The Y-chromosome Bayesian Article skyline plot was obtained following the same procedure as above. Maximum-likelihood trees and Bayesian skyline plots are shown in Supplementary Fig 1 and Extended Data Fig. 3e, f, respectively. The presence of alleles associated with or causative for a diversity of coat colouration changes was investigated using individual BAM read alignments. For a total of 43 genomic locations representing biallelic SNPs, we simply counted the proportion of reads supporting the associated or causative allele. Results were summarized in the heat map shown in Extended Data Fig. 8, with respect to the sample ordering displayed in the neighbour-joining phylogenetic reconstruction, and limited to those 13 loci that were polymorphic in our horse panel for clarity.

Neighbour-joining phylogeny, genetic continuity and population modelling
Phylogenetic affinities were first estimated by performing a BioNJ tree reconstruction with FastME 73 (version 2.1.4), based on the pairwise matrix of genetic distances inferred from the bed2diffs_v1 program 16 . Node supports were assessed using a total of 100 bootstrap pseudo-replicates. The 'goodness-of-fit' of the neighbour-joining tree to the data was evaluated by comparing the patristic distances and raw pairwise distances. Patristic distances were obtained from the ape 74 R package (version 5.5) and their ratios to raw pairwise distances were averaged for each given individual (Fig 1c). Averaged ratios equal to one support perfect phylogenetic placement for the specimen considered.
Genetic continuity between each individual specimen predating about 2200 bc and DOM2 horses was tested following the methodology from Schraiber 75 , which implements a likelihood-ratio test to compare the statistical support for placing DOM2 and the ancient specimen in a direct line of ancestry or as two sister groups. This methodology relies on exact allele frequency estimates within DOM2 and read counts for putatively ancestral ancient samples. To exclude residual sequencing errors within DOM2 horses, we, thus, conditioned these analyses on variants segregating at least as doubletons in positions covered in at least 75% of the DOM2 samples. Linked variation was pruned using Plink 76 (version v.1.9), with the following parameters, --indep-pairwise 50 10 0.2, which provided a panel of about 1.4 million transversions. Allele frequencies were polarized considering the outgroup genome used for measuring error rates. Results from direct ancestry tests are summarized in Supplementary Table 2.
The complex genetic makeup of some individuals (CAR05_Hun_ m2458 and Kan22_Tur_m2386) and/or group of individuals (DOM2) was investigated using the f 4 -statistics-based ancestry decomposition approach implemented in qpAdm 17 (version 7.0), in which one particular (group of) individual(s) is modelled as a linear, additive combination of candidate population sources ('left' populations). We followed the rotating strategy recommended by Harney and colleagues 18 Table 3).
We selected a total of nine horse lineages representing the main phylogenetic clusters, and carrying genetic ancestry profiles representative of the complete dataset, to model the population evolutionary history using OrientAGraph 19 (version 1.0). By implementing a network orientation subroutine that enables throughout exploration of the graph space, OrientAGraph constitutes a marked advancement in the automated inference of admixture graphs. We considered scenarios from zero to five migration pulses (M = 0 to 5; Extended Data Fig. 5a-e), and the population model assuming M = 3 is represented in Fig 3b. This analysis was conditioned on sites covered at least in one specimen of each population group. This filter yielded a set of 7,936,493 fully orthologous nucleotide transversions.

Struct-f4, ancestry components and multi-dimensional scaling
We extended the Struct-f4 package so as to assess individual genetic affinities within a panel of genomes, and to decompose them into K genetic ancestries. Struct-f4, thus, achieves similar objectives to other clustering methods, such as ADMIXTURE 77 and Ohana 78 , but does not assume Hardy-Weinberg equilibrium. The latter assumption is known to cause misinterpretation of highly drifted samples as ancestral homogeneous groups instead of highly derived mixtures from multiple populations, as thoroughly described elsewhere 79 . To circumvent this, Struct-f4 relies on the calculation of the widely used f 4 statistics, which were originally devised not only to test for admixture, but also to quantify the drift between the internal nodes of a population tree. The latter provides a direct representation of the true ancestral populations. Overall, Struct-f4 thus implements a more natural and robust (model-free) approach than other clustering alternatives.
Struct-f4 is based on a mixture model that parametrizes the drift that occurred between a given number of K pre-defined ancestral populations, and the mixing coefficient of each individual. Model parameters are estimated using an adaptive Metropolis-Hastings Markov chain Monte Carlo integration, identifying optimal numerical solutions for parameters by means of likelihood maximization. Struct-f4 was validated following extensive coalescent simulations with fastsimcoal2 80 (version 2.6.0.3). An example of such simulation designed to mimic the complex horse evolutionary history is provided in Extended Data Fig. 2, based on mutation and recombination rates of 2.3 × 10 −8 and 10 −8 events per generation and bp, respectively. Struct-f4 is implemented in Rcpp and only takes the full set of f 4 -statistics as input to automatically return individual ancestry coefficients, without requiring pre-defined, ad-hoc sets of reference and test populations.
Multi-dimensional scaling was carried out based on the co-ancestry semi-matrix summarizing the drift measured between each pair of individuals, as returned by Struct-f4, removing the domestic donkey outgroup prior to using the cmdscale R function.

Isolation by distance and spatial connectivity
Spatial barriers to gene flow prior to about 3000 bc, between about 3000 and 2000 bc and following about 2000 bc were run using EEMS 16 (built with Eigen version 3.2.2 and Boost version 1.57, and using rEEM-Splots version 0.0.0.9000) for 50 million iterations and considering a burn-in of 15 million iterations. Convergence was ensured from visual inspection of likelihood trajectories as well as by the strong correlation obtained between the observed and fitted genetic dissimilarities. Pie-charts depicting the ancestry proportions inferred by Struct-f4 were overlaid on the migration surfaces to facilitate tracking the geographic position of each excavation site, averaging ancestry proportions or using individual ancestry profiles if only one sample was characterized genetically at that location. Spatial pie-chart projection was carried out using the draw.pie R function from the mapplots package 81 (version 1.5.1). The size of each individual pie-chart was commensurate with the number of samples excavated at a given geographic location, provided that the number of samples was lower than 10, while set to a constant maximum radius otherwise.
Partial Mantel tests measuring the correlation between geographic and genomic distances over time were carried out using the ncf R package 82 (version 1.2.9). This test corrected for the time variation present within each window, similar to the approach described by Loog and colleagues 83 . Haversine geographic distances between pairs of ancient samples were computed using the geosphere package (version 1.5.10) in R 84 , from the corresponding longitude and latitude coordinates, while radiocarbon date ages were considered as point estimates (Supplementary Table 1). The matrix of pairwise genetic distances was obtained from the bed2diffs_v1 program provided together with the EEMS software 16 . The analysis was carried out for autosomes and the X chromosome separately, so as to investigate possible sex-bias in horse dispersal. Confidence intervals were calculated by sampling with replacement individuals within each time window.
Sliding time windows (step size = 250 years) were broadened forward in time until including at least ten specimens covering two-thirds of the total geographic area sampled in this study. The area delimited by a set or subset of GPS coordinates was calculated using the GeoRange R package 85 (version 0.1.0) and the age of the window was set to the average age amongst the samples included. Additionally, pairwise distances involving samples located less than 500 km away and separated by less than 500 years were masked in the corresponding matrices to estimate the patterns of isolation by distance between demes, instead of within demes. This whole scheme was designed to prevent regional effects, caused by the over-representation of particular regions in specific time intervals.
The LOCATOR 20 program (version 1.2) was run using a geolocated reference panel consisting of all non-DOM2 horses (n = 136), except the tarpan and the four Przewalski's horses present in our dataset, and considering nucleotide transversions covered at least in 75% of the samples, for a total of 3,194,008 SNPs. The geographic origin of each DOM2 horse was then estimated from the geographic structure defined by the populations present in the reference panel. Default parameters were used, except that the width of each neural layer was 512 (instead of 256). The best run was selected as the one showing the lowest validation error from a total of 50 independent runs. The analysis was repeated for the tarpan as well as the four Przewalski's horses present in our dataset.

Selection scans
To pinpoint genetic changes potentially underlying biological adaptation within DOM2 horses, we contrasted the frequency of each nucleotide transversion in our dataset (n = 10,205,277) in DOM2 (n = 141) and non-DOM2 horses (n = 142). The extensive number of samples represented provided unprecedented resolution into patterns of allele frequency differentiation, and encompassed the largest diversity of non-DOM2 horses characterized to date. Weir and Cockerham F ST index values between both groups were calculated using Plink 76 (version 1.9) and visualized using the GViz R package 86 (version 1.36.2), together with external genomic tracks provided by the gene models annotated for EquCab3 (Ensembl v0.102) and the interrupted repeats precomputed for the same assembly and stored in the UCSC browser.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All collapsed and paired-end sequence data for samples sequenced in this study are available in compressed fastq format through the European Nucleotide Archive under accession number PRJEB44430, together with rescaled and trimmed bam sequence alignments against both the nuclear and mitochondrial horse reference genomes. Previously published ancient data used in this study are available under accession numbers PRJEB7537, PRJEB10098, PRJEB10854, PRJEB22390 and PRJEB31613, and detailed in Supplementary Table 1. The genomes of ten modern horses, publicly available, were also accessed as indicated in their corresponding original publications 59,63,[87][88][89] .

Corresponding author(s): Ludovic ORLANDO
Last updated by author(s): Aug 12, 2021 Reporting Summary Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Data analysis
The Struct-f4 software is available without restriction on Bitbucket at https://bitbucket.org/plibradosanz/structf4/src/master/, together with a companion manual providing installation and running instructions. All other analyses relied on available software, fully referenced in the manuscript, including: -AdapterRemoval2 (version 2. For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A description of any restrictions on data availability -For clinical datasets or third party data, please ensure that the statement adheres to our policy All studies must disclose on these points even when the disclosure is negative.

Study description
We have sequenced 264 ancient horse genomes, including from regions and/or time periods that remained uncharacterized at the genetic level. Additionally, we complemented the data previously generated for nine ancient horses and included a total of ten modern horse genomes, selected to represent a whole diversity range of breeds/populations. We applied procedures aimed at minimizing the impact of post-mortem DNA damage on sequence quality and identified a total of 10M+ high-quality polymorphic sites in our data set. For sites of critical importance, ancient genome variation was characterized from more than a single genome, providing individual replicates of the signatures identified. Metadata considered in the analyses included the GPS coordinates of the excavation sites and the age of the samples analyzed, most often assessed from radiocarbon dating. The data set gathered helped solved long-standing controversies about horse domestication.

Research sample
Ancient horse remains (Equus ferus caballus) were collected and screened by sequencing following DNA extraction and nextgeneration DNA library preparation in state-of-the-art ancient DNA facilities. Shallow DNA sequencing helped measure DNA preservation levels so as to identify those specimens for which whole genome sequences could be characterized by means of shotgun DNA sequencing. We mainly focused on the time period spanning the first to fourth millennium BCE (Before Common Era), as it encompassed the whole time frame of horse domestication. We, however, also included older samples so as to characterize the pre-domestication population structure. Finally, we also sequenced the genome of one historical 'Tarpan' specimen, given the contentious status of this population regarding horse domestication. Archaeological sites, individual specimens as well as their respective radiocarbon dates and sex assignments (as inferred from DNA data) are provided in Supplementary Table 1.
Supplementary Methods provide additional information on each archaeological site.

Sampling strategy
Sampling was conditioned by the availability of ancient remains. In order to cover the whole temporal and geographic range relevant for horse domestication, we have gathered together an extensive team of archaeologists and curators in charge of material collection from across Eurasia and North-Africa, and with full authority to undertake research-based activity on such material. The resulting data represents a large collection of 264 ancient genomes. Combined with a selection of genomes previously published, both modern and ancient, they provided adequate data and statistical power for statistical testing. The robustness of our analyses was assessed through a range of appropriate statistical methods, including bootstrapping, Maximum Likelihood, replicates and formal statistical tests. Sampling procedures were aimed at minimizing destruction, and were focused, whenever possible, on osseous remains such as petrosal bones, that are generally associated with better average DNA preservation.

Data collection
A majority of the remains consisted of loose petrosal bones and teeth that lost connection with their original skulls. These were either directly shipped to Ludovic Orlando by the archaeologists and/or curators in charge, or delivered in person to him (as he visited his collaborators, or when his collaborators visited his laboratory). Sampling for DNA extraction and radiocarbon dating were performed in the ancient DNA facilities of the CAGT laboratory using appropriate drilling instruments under flow-hoods and in an environment with filtered, positive air pressure. Routine procedures at CAGT are aimed at minimizing both destruction and contamination.
Timing and spatial scale All archaeological remains investigated in this study are described in Supplementary Table 1 and have been collected, mostly from 2017 (although the earliest was collected in 1995). A study that was then ongoing in our laboratory suggested that horse husbandry at Botai did not give rise to modern domestic horses, which implied that another domestication centre was yet to be found. As the Botai culture is located in the second half of the third millennium BCE of central Asia, we mainly decided to focus our attention on the following 1000 years, but also extended sampling to other regions, including those previously described as potential domestication centres. Both due to the complexity of the horse population genetic structure at the time, and the relatively limited abundance of horse archaeological remains during the third millennium BCE, we extended sampling to prior the third millennium in order to increase resolution and gain insights on those lineages pre-dating domestication. Finally, the discovery that a massive population turnover took place from the late third to early second millennium BCE led us characterize the second millennium BCE more extensively at the genetic level. Overall, over 2,000 archaeological remains have been collected since 2017. They have been screened for DNA and subjected to shotgun DNA sequencing whenever showing sufficient DNA quality, following a routine data production pipeline at CAGT. Decisions on which locations and time periods should be given priority or added to our data set were made as the project made progress following preliminary analyses of the data collected, and re-assessed on approximately a monthly basis. The genetic data analyzed in this study correspond to a main data freeze that was done in July 2020, and supplemented with additional data from November 2020.

Data exclusions
No data generated in this study were excluded from our analyses. Following standard procedures in ancient DNA research, transition substitutions were masked from most analyses in order to limit the noise added by post-mortem DNA damage, largely inflating DNA sequencing rates. We limited our analyses to only a fraction of horse genomes previously reported in order to (1) avoid technical batch effects that may have structured the data due to the different DNA library construction and sequencing technologies used, (2) reduce computational burden and (3) only those time and/or geographic regions that were relevant for the present study.