Abstract
Horses revolutionized human history with fast mobility1. However, the timeline between their domestication and their widespread integration as a means of transport remains contentious2,3,4. Here we assemble a collection of 475 ancient horse genomes to assess the period when these animals were first reshaped by human agency in Eurasia. We find that reproductive control of the modern domestic lineage emerged around 2200 bce, through close-kin mating and shortened generation times. Reproductive control emerged following a severe domestication bottleneck starting no earlier than approximately 2700 bce, and coincided with a sudden expansion across Eurasia that ultimately resulted in the replacement of nearly every local horse lineage. This expansion marked the rise of widespread horse-based mobility in human history, which refutes the commonly held narrative of large horse herds accompanying the massive migration of steppe peoples across Europe around 3000 bce and earlier3,5. Finally, we detect significantly shortened generation times at Botai around 3500 bce, a settlement from central Asia associated with corrals and a subsistence economy centred on horses6,7. This supports local horse husbandry before the rise of modern domestic bloodlines.
Main
The genetic make-up of modern domestic horses (hereafter, DOM2) emerged in the western Eurasian steppes during the third millennium bce2. The spread of DOM2 horses, alongside the development of Sintashta spoke-wheeled chariots in Asia (around 2200–1800 bce) and the apparently limited DOM2 genetic influence in Europe before that time, has indicated that long-distance horse-based mobility developed no earlier than the late third millennium bce. This chronology implies that the spread of steppe-related ancestry that reshaped the human genetic landscape of nearly all regions of central and western Europe over the course of the third millennium bce8,9 was not driven by DOM2 horseback riding.
However, recent population models have claimed significant DOM2 genetic ancestry in European horses affiliated with the Corded Ware complex (CWC), a culture that developed from roughly 3000 bce against the backdrop of the Yamnaya steppe migration4. Bone pathologies potentially resulting from regular horseback riding also occur in about 5% of the human skeletons from the Carpathian Basin, mainly in steppe-related8 Yamnaya individuals, but also in pre-Yamnaya people, as far back as the fifth millennium bce5. Moreover, horse-related terminology commonly shared across Indo-European languages is often considered indicative of established equestrianism in the steppes, among Yamnaya-related proto-Indo-European speakers3. These findings have revived theories associating horseback riding with the Yamnaya expansion3, and possibly with earlier human steppe migrations into the Carpathian Basin after about 4500 bce10.
Whether or not rapid mobility was the only incentive for horse domestication is also a matter of controversy. Equine milk peptides were reported in Yamnaya human dental calculus from around 3300–2600 bce11, but further work has shown that western steppe pastoral practices shifted from sheep and cattle dairying to horse milking no earlier than around 1000 bce12. Archaeological evidence for pre-Yamnaya horse milking and harnessing6,7 exists further east in central Asia, in the 5,500-year-old Botai culture, which developed a subsistence economy almost entirely focused on horses13. At this site, evidence for horse milk consumption is supported by residue analysis of fatty acids absorbed into pottery shards (n = 5), but this is not corroborated by the palaeoproteomic analysis of human dental calculus (n = 2)6,11,14.
Furthermore, the unusual pattern of dental attrition on Botai horse teeth was initially identified as bit wear15, but this interpretation has since been challenged16. Unchanged sex ratios in pre-Botai and Botai bone assemblages have also been taken as evidence against the emergence of new horse management practices at Botai17,18. Considering that DOM2 and Botai horses originate from two genetically distinct lineages7, new evidence is needed to assess the exact part played by horses in Botai society, and, more generally, how domestic horses contributed to the steppe migrations and the possibly concurrent spread of Indo-European languages (although see ref. 19).
Datasets and experimental design
To address the context in which horse husbandry developed in the fourth and third millennia bce, we analysed 475 ancient horse genomes (Fig. 1a), combined with 77 publicly available modern horse genomes, including 40 worldwide domestic breeds and 6 endangered Przewalski’s horses (Supplementary Table 1 and Extended Data Figs. 1 and 2). The 124 newly generated genomes show a median coverage of 1.40-fold (minimum 0.29; maximum 10.92) and span Eurasian archaeological contexts dating to more than 50,000 years ago, including in the Carpathian Basin, where bioanthropological evidence for horseback riding was reported5,20. Together with 401 radiocarbon dates, 140 of which are new, our dataset provides an unprecedented genome time series spanning the whole domestication process.
In this study, we investigate three possible markers of horse husbandry. First, we examine changes in the genomic make-up of horses across central and eastern Europe to test whether they accompanied the humans who moved from the steppe. Second, we reconstruct horse demographic trajectories to evaluate the existence, timing and severity of domestication bottlenecks. This shows when horses were bred in significant numbers to sustain large-scale mobility. Third, we track evidence for controlled reproduction of horses, in the form of close-kin mating and accelerated generation times.
Spread of DOM2 horses across Europe
Assuming that steppe humans and horses moved together implies parallel shifts of genetic ancestry in both species. Such concurrent shifts were supported by the population graphs presented by Maier et al.4, who identified horses excavated from a CWC context in Germany with roughly 20% DOM2 ancestry, loosely mirroring the approximately 70% Yamnaya-related steppe ancestry observed in humans8. However, Locator21 analyses predict that the geographic origin of CWC horses is exclusively within central Europe (Extended Data Fig. 3c,d). We also identify population graphs fitting published data significantly better than those previously proposed2,4 (P < 10⁻⁵; Extended Data Fig. 3b), and refining our understanding of the connectivity between the steppes and the rest of Europe by including four extra population groups (Extended Data Fig. 4). No such graphs support DOM2 genetic contribution to CWC horses (Extended Data Figs. 3a,b and 4), with the most comprehensive placing CWC horses close to pre-Yamnaya populations from central Europe (ENEOCZE, around 3364–3102 bce, and NEOPOL, around 5210–5006 bce). That a central European horse lineage remained isolated from the steppe is also supported by adjacent positioning in multidimensional scaling analysis (Extended Data Fig. 5), distinctive ancestry profiles sharing the main genetic component of CWC horses (Fig. 1b,c and Extended Data Fig. 6) and qpAdm modelling (Supplementary Table 2). qpAdm models including two population sources depict CWC horses as a mixture between ENEOCZE (32.4%) and northern European horses (FBPWC, around 3050–2950 bce; 67.6%), whereas allowing for a third source returns negligible steppe contribution (less than or equal to 1.7%).
Combined, these analyses uncover a distinct cline of genetic ancestry peaking in CWC horses and declining both westwards (LPNFR, around 13969–12090 bce) and eastwards across central Europe (ENEOCZE and NEOPOL), the Carpathian and Transylvanian Basins (HUNG, around 3364–1971 bce, and ENEOROM, around 4494–3658 bce) and Anatolia (NEOANA, around 6396–4456 bce) (Fig. 1b,c).
A substantial proportion of the CWC-related ancestry survives in wild European horses called ‘tarpans’ (about 45.1%) until roughly 1868 ce in our dataset (and possibly later in the last surviving captive or free-ranging tarpans22), but is at best residual in the genetic make-up of modern domestic horses (Fig. 1b). In fact, it vanishes with the expansion of the typical DOM2 ancestry profile outside the steppe (Fig. 1c). Our extended time-stamped panel of ancient genomes from the Carpathian Basin provided increased temporal resolution regarding the arrival of DOM2 horses and the replacement of the local lineage found there (HUNG). This is pivotal for clarifying the role of horses in human migrations from the steppe. The date for the first typical DOM2 horse in the Carpathian Basin is approximately 1822 bce (1895–1749 bce), whereas that for the last horse with a typical local HUNG genetic profile is around 2033 bce (2120–1945 bce). Considering individual archaeological sites, rather than the whole region, indicates similar chronologies (at Budapest-Királyok Útja: about 1822 bce (1895–1749 bce) versus about 2211 bce (2284–2138 bce); at Százhalombatta-Földvár: about 1822 bce (1893–1751 bce) versus about 2033 bce (2120–1945 bce)) (Supplementary Table 1). Combined, these findings narrow down the time for the genomic turnover accompanying the arrival of DOM2 horses in the Carpathian Basin to roughly 2033–1822 bce, that is, the interval between the last local HUNG horse and the first typical DOM2 horse. This timeline is consistent with the first evidence of DOM2 horses outside the steppe, reported by Librado et al.2, in Moldavia around 2063 bce (2140–1985 bce), Anatolia around 2125 bce (2205–2044 bce) and Czechia around 2037 bce (2137–1936 bce), post-dating the arrival of human steppe-related ancestry in the respective regions by at least 600 years10,23. Yamnaya-related steppe migrations and the spread of DOM2 horses are, thus, chronologically incompatible.
However, humans may have migrated from the steppe using horses other than DOM2. To investigate this, we mapped the genetic ancestry identified by Struct-f4 (ref. 24) as characteristic of horse populations living across the steppe before the expansion of DOM2 (CPONT, TURG and NEONCAS; roughly 5616–2636 bce; Fig. 1b). Around 17.2% of this ancestry was present in the Carpathian Basin during the fourth and third millennia bce (around 3364–1971 bce). However, we find it also in Austria about 3300 bce (28.9%, KT46), and in the Transylvanian Basin about 4200 bce (54.5%, ENEOROM), at the Pietrele site where the genomic make-up of human populations suggests no steppe contact10. In fact, the steppe-related genetic ancestry is found in even earlier horse populations spanning a broad geographic range, including Poland (NEOPOL, around 5210–5006 bce), Anatolia (NEOANA, around 6396–4456 bce) and Iberia (IBE, around 5299–1900 bce), and as far back in time as in the Upper Palaeolithic of France (LPNFR, around 13969–12090 bce; LPSFR, around 21909–14646 bce). This is consistent with the best-fitting population graph showing ENEOROM horses receiving steppe genetic material from an ancestor that also contributed to LPSFR populations (Extended Data Fig. 4). Therefore, the spread of steppe-related horse genetic ancestry into Europe must predate about 14646 bce, which is considerably earlier than any claimed evidence for horse husbandry3, and, thus, occurred through natural contacts between wild populations, most probably dispersing in the aftermath of the Last Glacial Maximum (roughly 24000–17500 bce)25. Combined, the genomic make-up of ancient European horses does not endorse widespread horse-driven mobility before the end of the third millennium bce. It thus dismisses any substantial involvement of horses in the Yamnaya-related or earlier human migrations from the steppe.
DOM2 demographic history
To time precisely the rise of widespread horse-based mobility, we next estimated the period when DOM2 horses were bred in sufficiently large numbers to sustain their global spread. Specifically, we tracked changes in the DOM2 effective population size (Ne) during the 200 generations preceding about 1864 bce, which is the average date of the earliest 24 DOM2 horses in our dataset with sufficient sequence data (Fig. 2a). Crucially, linkage disequilibrium-based demographic reconstructions26 indicate a sharp demographic burst of about 13.7-fold increase within the 30 generations preceding that period. Matching those 30 generations with the Yamnaya-related steppe expansion, which had already reached central Europe by about 2750 bce at the latest8, would require unrealistic average generation times of roughly 27 years, largely exceeding horse life expectancy under modern intensive veterinarian care27,28. Assuming instead the commonly accepted generation time of 8 (7–12) years29,30,31,32 leads to about 2190 (2310–2160) bce for the rise of widespread horse-based mobility. Restricting analyses to horses from Sintashta contexts, which are associated with the spread of spoke-wheeled chariots in Asia, returns similar demographic profiles and time estimates (about 2100 bce (2200–2075 bce); Extended Data Fig. 7a). These timelines coincide not only with the radiocarbon dating of the earliest DOM2 horses outside the steppe, but also with the earliest horse images in Akkadian art33,34, and with major evidence of conflicts, crises and political disruption, from the Balkans to Egypt and the Indus valley35,36.
Our demographic reconstructions also provide evidence for a strong domestication bottleneck in horses during the 75 generations preceding the DOM2 expansion (Fig. 2a). The interval associated with minimal effective sizes (Ne ≈ 500 diploid individuals) starts about 2664 (3064–2564) bce. Therefore, the time when steppe people migrated did not coincide with expanding, but rather plummeting, availability of DOM2 reproductive horses, which aligns with horses not driving Yamnaya-related steppe migrations. Interestingly, the first evidence for horses carrying long runs of homozygosity (ROHs) only (greater than or equal to 15 cM), which is indicative of close-kin mating, is found in some of the earliest DOM2 sequenced (Fig. 2c), including in the steppes of central Asia and Anatolia. This indicates that the reproductive control underlying early DOM2 spread involved some levels of inbreeding, which is avoided in the wild, but is a common practice when breeding animals for desirable traits37.
DOM2 generation time contracted around 2200 bce
In addition to the practice of close-kin mating, early DOM2 breeders may have aimed to produce more animals every year to meet the explosive demand for horses in the late third millennium bce. To test whether breeders used younger animals for reproduction, we developed two complementary proxies measuring generation times from single pseudo-haploid time-stamped genomes. The first quantifies the number of generations required for a genome to accumulate an observed number of mutations post divergence from outgroup(s) (mutation clock; Supplementary Methods and Extended Data Fig. 8a). The second leverages recombination patterns to estimate the number of generations elapsed since the most recent common ancestor (MRCA) of the sampled specimens (recombination clock; Supplementary Methods and Extended Data Fig. 9a,b). We validate the performance of our methodology through coalescent simulations across various inbreeding levels and demographic trajectories (Extended Data Fig. 10), and apply it to all of our radiocarbon-dated horse genomes to estimate roughly 7.4 years as the average time between two consecutive generations in the past 15,000 years (Fig. 3b and Supplementary Information).
Our analyses also show that horse generation times did not remain constant, but accelerated around 1.8-fold (approximately 4.1 years) during the past approximately 200 years, as could be expected given the development of modern breeding practices, optimized for animal production (Fig. 3a). Racing Quarter Horses and Thoroughbreds exemplify breeds with the least accelerated generation time, possibly due to the extended reproductive lifespan imposed on sport champions (Fig. 3a). No equivalent changes were detected backwards in time until about 2200–2100 bce, which coincides with a roughly 2.1-fold acceleration of the generation time, relative to the average of about 7.4 years (to about 3.5 years; Fig. 3b). This acceleration did not affect any of the DOM2 relatives, including those with individuals affiliated with Yamnaya, Turganik and Steppe Maykop contexts (CPONT and TURG; Fig. 3 and Extended Data Fig. 7c), or the older horses living in the steppe (NEONCAS) or in the Carpathian and Transylvanian Basins (HUNG and ENEOROM; Extended Data Fig. 7c). This shows that new practices of DOM2 reproductive control, aimed at faster productivity, emerged by the late third millennium bce, and were a prerequisite to early DOM2 breeding and adoption of widespread horse-based mobility.
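The fold-changes quoted above can be checked against the roughly 7.4-year baseline with a few lines of arithmetic. The following is a minimal sketch; the function name is illustrative and not part of the study's software.

```python
def accelerated_generation_time(baseline_years: float, fold: float) -> float:
    """Generation time after a given fold-acceleration of a baseline value."""
    return baseline_years / fold


BASELINE_YEARS = 7.4  # average generation time reported over the past 15,000 years

# About 1.8-fold acceleration during the past approximately 200 years
modern = accelerated_generation_time(BASELINE_YEARS, 1.8)

# About 2.1-fold acceleration around 2200-2100 bce
early_dom2 = accelerated_generation_time(BASELINE_YEARS, 2.1)

print(f"modern: {modern:.1f} years; early DOM2: {early_dom2:.1f} years")
```

Both values round to the figures given in the text (about 4.1 and about 3.5 years).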
New evidence of horse husbandry at Botai
Earlier research established minimal connectivity between horse populations during the fourth millennium bce2. As this encompasses the timeline of the Botai settlement (around 3500 bce), where controversial evidence for horse domestication was found, the incentive for domestication at Botai, if any, could not be long-distance horseback riding. In the 36 horses from the Botai site analysed, we found no evidence for close-kin mating, but we did find shortened generation times, an acceleration comparable in magnitude to that accompanying DOM2 breeding (Fig. 3). This trend is specific to the Botai and to a group descending directly from the Botai (Borly4, around 3000 bce; Fig. 3 and Extended Data Fig. 7d)7, and remains unprecedented in scale throughout the Ice Age to the Eneolithic. Notably, the Botai horse population experienced a 2.4-fold demographic expansion starting roughly 80 generations before settlement (Fig. 2b), that is, about 4140 (4460–4060) bce, assuming average generation times of 8 (7–12) years. This largely concurs with paleoclimatic data suggesting more humid conditions, and pollen records indicating no forest encroachment on the steppes38. These favourable conditions for horses may have encouraged humans to settle and develop a subsistence economy almost entirely focused on horses39, suggested to have been initially established through hunting40. However, our demographic reconstructions indicate that this once thriving resource progressively declined during the last 20 generations of Botai (that is, in 140–240 years; Fig. 2b). In response to declining food resources, Botai peoples may have exercised husbandry practices involving corralling and horse reproductive control through shortened generation times, in line with the prey domestication pathway6,41.
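The conversion from generation counts to calendar dates used for the Botai expansion is simple enough to reproduce. The sketch below assumes only the figures stated in the text (settlement around 3500 bce, 80 generations, generation times of 8 (7–12) years); the function name is hypothetical.

```python
def generations_to_bce(anchor_bce: float, n_generations: int, generation_time: float) -> float:
    """Calendar date (years bce) lying n_generations before an anchor date."""
    return anchor_bce + n_generations * generation_time


BOTAI_SETTLEMENT_BCE = 3500

# Onset of the Botai demographic expansion, about 80 generations before settlement
point = generations_to_bce(BOTAI_SETTLEMENT_BCE, 80, 8)     # central estimate
oldest = generations_to_bce(BOTAI_SETTLEMENT_BCE, 80, 12)   # longest generation time
youngest = generations_to_bce(BOTAI_SETTLEMENT_BCE, 80, 7)  # shortest generation time
print(f"about {point:.0f} ({oldest:.0f}-{youngest:.0f}) bce")

# Duration of the decline over the last 20 generations of Botai
decline_years = (20 * 7, 20 * 12)
print(f"decline over {decline_years[0]}-{decline_years[1]} years")
```

The same function applies to any other generation count and anchor date, provided the anchor is the date of the sampled genomes.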
Discussion
This study tackles crucial debates regarding horse domestication, with major implications for both horse and human history. It shows that the horse genomic make-up remained entirely local in central Europe and in the Carpathian and Transylvanian Basins until the end of the third millennium bce. This timeline post-dates the period of steppe contact in the Carpathian and Transylvanian Basins starting around 4500 bce10, as well as the migrations potentially spreading proto-Indo-European languages into Europe with the Yamnaya phenomenon about 3000 bce. The pronounced spread of DOM2 horses immediately followed the foundation of this new bloodline, and marked a new era of widespread horse-based mobility from about 2200 bce, ushering in a monumental increase in connectivity and trade. It mirrors the archaeological record, which witnesses a massive spread of horses in the Near East and Asia during the transition between the third and second millennium bce2,42,43. Intensified herding practices12, growing aridity (the ‘4.2 ka BP aridification event’44) and/or increased exploitation of the steppe may have heightened the demand for expanding grazing areas, potentially facilitated by horse-mediated mobility. Domestic horses and spoke-wheeled chariots3,42 may also have aided the conquest and defence of larger geographic areas in the face of rising violence and social conflicts35,36.
Our work does not reject the possibility of equestrianism developing in the Pontic steppe or the Carpathian Basin before 2200 bce. However, in such a scenario, the associated breeding practices would not have involved close-kin mating or accelerated generation times. The phenomenon would also have remained confined in scale, both demographically and geographically, excluding long-distance fast mobility as the primary domestication incentive. Our research strengthens the case for recognizing Botai as one such location in the central Asian steppe where horse husbandry developed before large-scale horse-based mobility. There, the domestication process did not aim at global production, but remained regional. It is aligned with the expectations of the prey pathway41, in which a settled group of humans developed husbandry through corralling and reproductive control, in the form of shortened generation times, but not close-kin mating, to ensure access to an otherwise depleting meat resource13.
Manipulating the animal life cycle by forcing earlier reproduction offers breeders enhanced productivity, especially for species with long gestational periods and/or small litter sizes. Our research demonstrates that this practice was integral to the array of breeding techniques developed to sustain the massive global demand for horses from the Early Bronze Age. The pressure for accelerated production relaxed quickly after around 1000 bce, as a large enough horse breeding pool became available across extensive geographic areas. However, the development of modern breeds required the fast production of specific bloodlines from limited foundational stocks, which again shortened the horse generation time over the past few centuries. Apparently, this process affected Asian breeds more than racehorses (Fig. 3a), especially Thoroughbreds, for which artificial insemination is forbidden. These findings align with stud book pedigrees recording increasingly faster generation times during the past three centuries, especially in coldblood horses45.
Our methodological framework for measuring generation times expands the bioarchaeological toolkit to detect molecular evidence of reproductive control. Together with close-kin mating, it may prove instrumental in clarifying the timing and context(s) into which human groups first developed animal husbandry, not only in horses, especially as early domestication processes may not always leave obvious skeletal modifications and marked foundational bottlenecks. Beyond domestic animals, our approach could be applied to measure the long-term generation times of ancient hominin groups, including Neanderthals and Denisovans, and their potential shifts in the face of major lifestyle transitions, such as following the out-of-Africa dispersal, during the Ice Age46 and during the Neolithic revolution47,48. For now, our analyses suggest that the last Ice Age may have affected horse generation times, although to a lesser extent than domestication (Fig. 3). Our work opens the way for a new line of research investigating the possible consequences of past and present environmental and epidemiological crises on the reproduction of both human groups and other species.
Methods
Archaeological samples and radiocarbon dating
We have gathered an extensive collection of 475 ancient horse remains spread across 230 sites in 41 countries. Sampling of archaeological horse remains was undertaken in collaboration with co-authors responsible for the curation and description of underlying contexts, and with the approval of the relevant institutions responsible for the archaeological remains, as detailed in the Reporting Summary. A total of 105 of the 124 newly sequenced specimens originate from archaeological sites for which no ancient horse genomes were characterized previously. Their underlying archaeological contexts are described in the Supplementary Information. A total of 140 new radiocarbon dates were obtained in this study, at the Keck Carbon Cycle Accelerator Mass Spectrometer Laboratory, University of California, Irvine (Supplementary Table 1). Collagen was extracted and ultra-filtered following mechanical cleaning of about 200 mg of cortical bone. Radiocarbon dates were calibrated using OxCalOnline49 and the IntCal20 calibration curve50. Samples were named with reference to their original internal label, followed by a three-letter country code and their associated age in calendar years bce or ce, all separated by underscores, with the age prefixed by ‘m’ if bce (for example, KT46_Aus_m3240 refers to sample KT46, originating from the Kittsee site in Austria, which showed a midpoint radiocarbon date of 3240 bce).
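The naming convention can be parsed mechanically. The following is a minimal sketch, not the authors' code: the `SampleName` type and `parse_sample_name` function are hypothetical, and the handling of the `Modern` age tag is an assumption based on the sample name P5782_Ice_Modern quoted elsewhere in the Methods.

```python
from typing import NamedTuple, Optional


class SampleName(NamedTuple):
    label: str
    country: str
    year: Optional[int]  # negative for bce, positive for ce, None for modern samples


def parse_sample_name(name: str) -> SampleName:
    """Split 'label_country_age' into its parts; an 'm' prefix on the age means bce."""
    # rsplit keeps any underscore inside the internal label intact
    label, country, age = name.rsplit("_", 2)
    if age == "Modern":  # assumption: modern genomes carry this tag instead of a date
        return SampleName(label, country, None)
    if age.startswith("m"):
        return SampleName(label, country, -int(age[1:]))
    return SampleName(label, country, int(age))


example = parse_sample_name("KT46_Aus_m3240")
print(example)
```

Encoding bce dates as negative integers keeps samples sortable along a single time axis.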
Genome sequencing
Osseous samples were processed for DNA extraction, library construction and shallow sequencing in the ancient DNA facilities of the Centre for Anthropobiology and Genomics of Toulouse (Centre national de la recherche scientifique (CNRS) and University Paul Sabatier), France. The overall methodology followed the work from ref. 2, including: (1) powdering with the Mixer Mill MM200 (Retsch) Micro-dismembrator; (2) DNA extraction according to the procedure Y2 from Gamba et al.51; (3) USER (NEB) enzymatic treatment30; (4) construction of DNA libraries from double-stranded DNA templates, in which two internal indexes are added during adaptor ligation and one external index is added during polymerase chain reaction (PCR) amplification; and (5) PCR amplification, purification and quantification on the TapeStation 4200 (D1000 HS) instrument before pooling for Illumina DNA sequencing on MiniSeq, NovaSeq and/or HiSeq4000 instruments (paired-end mode). Sequencing pools were prepared to represent each of the three individual indexes only once.
Demultiplexing, trimming and collapsing of FASTQ sequencing reads were carried out using AdapterRemoval2 (v.2.3.0)52, disregarding reads shorter than 25 bp. The resulting collapsed and uncollapsed read pairs were processed through the Paleomix bam_pipeline (v.1.2.13.2)53 for Bowtie2 (ref. 54) alignment against the nuclear and mitochondrial horse reference genomes55,56, appended with the 751 Y-chromosome contigs from ref. 45, using the parameters recommended in ref. 57, removing PCR duplicates and requiring minimal mapping quality scores of 25. The presence of DNA fragmentation and nucleotide misincorporation patterns indicative of post-mortem DNA damage was assessed on the basis of 100,000 random mapped reads using mapDamage2 (v.2.0.8)58. Overall, we obtained sequence data from 390 DNA libraries for a total of 124 ancient horse specimens, resulting in genome characterization at average depths of coverage ranging from 0.288- to 10.925-fold (median 1.40-fold; Supplementary Table 1), as estimated using Paleomix coverage (--ignore-readgroups). The sequence data from 352 ancient and 81 modern genomes were processed following the same procedures to provide a comparative genome panel that included 4 donkeys59, 2 Equus ovodovi60 and 2 Late Pleistocene North American horses61 that were used as outgroups, plus 550 horses representing all lineages previously characterized at the genome level (Supplementary Table 1).
Genome rescaling and trimming, error rates and single nucleotide polymorphism variation
Sequencing errors and nucleotide misincorporations resulting from post-mortem DNA damage were reduced by subjecting alignments to a five-step procedure: (1) PMDtools (v.0.60)62 identification and separation of those reads affected (--threshold 1; DAM) or not (--upperthreshold 1; NODAM) by post-mortem DNA damage, (2) 5 bp end-trimming of NODAM-aligned reads, (3) rescaling of DAM read alignments using mapDamage2 with default parameters (v.2.0.8)58, (4) 10 bp trimming of rescaled read alignments and (5) merging of processed NODAM and DAM categories to obtain final Binary Alignment Map (BAM) sequence alignments. Error rates were estimated following Librado et al.2 as the excess of private mutations, relative to a high-quality modern genome considered to be error-free (P5782_Ice_Modern; Supplementary Table 1). Single nucleotide polymorphisms (SNPs) were identified following the procedures from ref. 2, entailing data pseudo-haploidization with ANGSD (v.0.917)63 for those sites covered by two reads or more (base quality scores greater than or equal to 30), and disregarding sites uncovered in 30% or more of the samples. A further filter included the random selection of one transversion SNP only, in cases where two successive transversions occurred in adjacent genomic positions. Overall, our final dataset retained 9,099,487 high-quality nucleotide transversions spread across the 31 horse autosomes. Alleles were polarized considering the allele common to the three outgroup lineages as ancestral. A second dataset of 7,092,366 variants was generated to mitigate for possible bias introduced by uneven sequencing depths by repeating the procedure described above, but following the downsampling of BAM alignment files to the median value of the average depth-of-coverage values found across all specimens (that is, 2.02-fold). Subsequent analyses were replicated on both variant datasets.
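Two of the site filters described above (the missingness cut-off and the random selection of one of two adjacent transversions) can be illustrated as follows. This is a sketch under stated assumptions rather than the pipeline used in the study: the input layout (a dictionary of per-sample pseudo-haploid calls keyed by position), the function name and the treatment of runs longer than two adjacent sites are all hypothetical.

```python
import random


def filter_sites(calls, max_missing=0.30, rng=None):
    """Return retained positions from {position: [allele or None per sample]}.

    Sites uncovered in max_missing or more of the samples are dropped; when two
    retained sites are adjacent, one of the two is kept at random.
    """
    rng = rng or random.Random(0)
    covered = [
        pos
        for pos, alleles in sorted(calls.items())
        if sum(a is None for a in alleles) / len(alleles) < max_missing
    ]
    kept, i = [], 0
    while i < len(covered):
        if i + 1 < len(covered) and covered[i + 1] == covered[i] + 1:
            kept.append(rng.choice((covered[i], covered[i + 1])))
            i += 2  # skip the partner of the adjacent pair
        else:
            kept.append(covered[i])
            i += 1
    return kept


calls = {
    10: ["A"] * 10,               # fully covered
    11: ["C"] * 10,               # adjacent to position 10
    50: ["A"] * 7 + [None] * 3,   # uncovered in 30% of samples: dropped
    90: ["G"] * 8 + [None] * 2,   # uncovered in 20% of samples: kept
}
retained = filter_sites(calls)
print(retained)
```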
Population graph modelling and population structure
Population graph modelling was carried out using the Markov chain Monte Carlo (MCMC) framework implemented in AdmixtureBayes64, and in Admixtools2 (ref. 4), considering a pre-selection of 14 and 10 genetically homogeneous population groups, respectively, all represented by a minimum of two specimens. This was key for Admixtools2 analyses4, to avoid biasing f3-statistics4 in the presence of population groups comprising a single pseudo-haploid genome. AdmixtureBayes analyses involved three independent runs, each containing 163 MCMC chains recording 200 million iterations. The final space of population graphs was obtained using a burn-in of 90% and retaining one in every 40 iterations (thinning). The genomic make-up of CWC horses was further investigated through the qpAdm rotating scheme65 (Supplementary Table 2), using a threshold of 0.01 for statistical significance. The geographic origins of CWC horses were also predicted using the Locator methodological framework based on deep neural networks21. To achieve this, we considered genomic window sizes of 10 Mb and the panel of 148 ancient horses predating the radiocarbon date of CWC horses. Genetic ancestry decomposition and multidimensional scaling were carried out using the Struct-f4 package24, grouping together 272 ancient and modern DOM2 horses to decrease computational costs. The first analytical step (assuming no admixture) consisted of 100 million MCMC iterations, whereas the second one (assuming admixture) involved 500 million iterations, until strict convergence. Default parameters were used otherwise, and the analyses were repeated assuming K = 8 to K = 10 admixture edges.
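The burn-in and thinning applied to the AdmixtureBayes chains amount to discarding the first 90% of recorded iterations and then keeping one in every 40. A minimal sketch of that post-processing on a generic list of samples follows; it is not AdmixtureBayes code, and the function name is hypothetical.

```python
def burn_in_and_thin(samples, burn_in_percent=90, thin_every=40):
    """Discard the first burn_in_percent of samples, then keep one in every thin_every."""
    start = len(samples) * burn_in_percent // 100  # integer arithmetic avoids rounding issues
    return samples[start::thin_every]


# Toy chain of 1,000 recorded iterations
chain = list(range(1000))
kept = burn_in_and_thin(chain)
print(kept)

# Under this reading, a chain of 200 million iterations would retain this many samples
retained_per_chain = 200_000_000 * (100 - 90) // 100 // 40
print(retained_per_chain)
```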
Inbreeding
Per-genome inbreeding levels were estimated applying the methodology from ref. 59 to individual BAM alignment files. This methodology does not require prior knowledge of population allele frequencies; it involves instead the random sampling of two reads per nucleotide transversion position and considering the density of sites within 1-cM-long genomic windows where the same allele was sampled twice (pseudo-homozygosity), versus two different alleles (pseudo-heterozygosity). Physical distances were converted into genetic distances using the recombination map from ref. 66, interpolating recombination rates linearly between two successive positions on the map. Windows showing pseudo-heterozygosity rates lower than 0.005 were considered to represent ROHs, with their cumulative span providing an inbreeding proxy. Close-kin mating was assessed through the total genome span encompassing long ROHs (that is, greater than or equal to 15 cM).
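The two computational steps of this procedure, linear interpolation on a recombination map and the summation of ROH windows, can be sketched as below. This is an illustration only: the function names are hypothetical, the merging of consecutive low-heterozygosity windows into a single ROH is an assumption about how long ROHs are delimited, and the long-ROH threshold follows the 15 cM quoted in the main text.

```python
from bisect import bisect_right


def interpolate_cm(position_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) between flanking map markers."""
    if position_bp <= map_bp[0]:
        return map_cm[0]
    if position_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, position_bp)
    fraction = (position_bp - map_bp[i - 1]) / (map_bp[i] - map_bp[i - 1])
    return map_cm[i - 1] + fraction * (map_cm[i] - map_cm[i - 1])


def roh_segments(window_het, threshold=0.005, window_cm=1.0):
    """Lengths (cM) of runs of consecutive windows with pseudo-heterozygosity below threshold."""
    segments, run = [], 0
    for het in window_het:
        if het < threshold:
            run += 1
            continue
        if run:
            segments.append(run * window_cm)
        run = 0
    if run:
        segments.append(run * window_cm)
    return segments


def inbreeding_summary(window_het, long_roh_cm=15.0):
    """Total ROH span and span within long ROHs (both in cM)."""
    segments = roh_segments(window_het)
    return sum(segments), sum(s for s in segments if s >= long_roh_cm)


# Toy chromosome of 1-cM windows: one 16-cM ROH and one 4-cM ROH
window_het = [0.02] * 3 + [0.001] * 16 + [0.03] + [0.0] * 4
total_roh, long_roh = inbreeding_summary(window_het)
print(total_roh, long_roh)
print(interpolate_cm(150, [100, 200], [0.0, 1.0]))
```

In this toy example only the 16-cM run counts towards the close-kin mating proxy, while both runs contribute to the overall inbreeding proxy.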
Demographic trajectories
A total of 28 genomes from unrelated Botai horses were pseudo-haploidized for transversion sites, all with a maximum missingness of 10%. Demographic dynamics were reconstructed using GONE26 from patterns of linkage disequilibrium along all autosomes except chromosomes 7, 11, 12 and 20. The PHASE parameter was set to 0 to account for pseudo-haploid data; default parameters were applied otherwise. Confidence intervals for effective size variation were estimated from 500 bootstrap pseudo-replicates. The same procedure was repeated considering a selection of 24 ancient horse genomes dating back to an average of about 1850 bce, which represents the earliest high-quality set of DOM2 genomes characterized.
Generation times
Generation times and their potential variation were measured from the temporal accumulation of mutations present in a given genome relative to an ancestral sequence (reconstructed based on three outgroup species; that is, mutation clock) and from the linkage disequilibrium between pairs of derived mutations (that is, recombination clock). The proportion of derived mutations present in a given genome provided a direct proxy for the distance separating the sample considered from the ancestral sequence. This proportion was converted into an estimate of number of generations, assuming the mutation rate from ref. 29, rescaled for transversions, which provided our mutation clock estimate of generations elapsed from the ancestral sequence.
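The mutation clock conversion can be sketched as a single division, shown here in Python. This is an illustrative reading of the paragraph above rather than the authors' code: the function name is hypothetical, the transversion fraction is a placeholder argument, and the rate of 2.3 × 10⁻⁸ used in the example is the value quoted for the simulations in this section, not necessarily the rescaled rate from ref. 29.

```python
def mutation_clock_generations(n_derived, n_sites, mu_per_site_per_generation,
                               transversion_fraction):
    """Convert the proportion of derived transversions in a genome into
    a number of generations from the ancestral sequence.

    `transversion_fraction` (assumed input) rescales the overall
    mutation rate so that it applies to transversions only.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    if not 0.0 < transversion_fraction <= 1.0:
        raise ValueError("transversion_fraction must be in (0, 1]")
    derived_proportion = n_derived / n_sites
    transversion_rate = mu_per_site_per_generation * transversion_fraction
    return derived_proportion / transversion_rate


# Toy numbers: 23 derived sites per million, no rescaling.
print(mutation_clock_generations(23, 1_000_000, 2.3e-8, 1.0))
```

Halving the transversion fraction doubles the estimated number of generations, which shows how sensitive the absolute estimate is to the assumed rate; comparisons between genomes are less affected because the same rate applies to all of them.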
Our ‘recombination clock’ estimate is based on the average probability to find, in a given genome, a pair of SNPs separated by milliMorgans, and both carrying a derived allele. This probability was normalized by the proportion of derived mutations detected in the genome considered to mitigate potential bias resulting from depth-of-coverage and/or error rate variations across individuals, providing a direct measurement of the number of generations from the MRCA to all Eurasian horses present in our dataset. The ‘mutation clock’-based estimate was derived from all 31 autosomes, whereas chromosomes 7, 11, 12 and 20 were masked to obtain the ‘recombination clock’ estimate, owing to limitations in the recombination map currently available for horses in relation to unaccounted structural variation, local misassemblies and the presence of neocentromeres. The ‘recombination clock’ estimate depends on three unknown parameters that were optimized through least-squares optimization: T, the total genealogical length in the whole sample set averaged across loci; ti, the genealogical length from the MRCA to horse specimen i averaged across its loci; and a constant pi capturing sample-specific variation in demography and haplotype sizes.
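The raw statistic underlying the recombination clock can be illustrated with a short Python sketch. It only covers the pair-counting and normalization step as literally described above; the least-squares fit of T, ti and pi is not reproduced, and the function name, the distance and tolerance arguments and the division by the derived proportion are assumptions. The released GenerationTime software remains the reference implementation.

```python
def derived_pair_statistic(sites, distance, tolerance):
    """`sites` is a position-sorted list of (position_mM, carries_derived).

    Returns the fraction of SNP pairs separated by about `distance`
    milliMorgans in which both SNPs carry a derived allele, divided by
    the genome-wide proportion of derived alleles. Returns None when no
    pair or no derived allele is available.
    """
    n = len(sites)
    if n == 0:
        return None
    derived_proportion = sum(1 for _, derived in sites if derived) / n
    pairs = both_derived = 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = sites[j][0] - sites[i][0]
            if gap > distance + tolerance:
                break  # sites are sorted, so later pairs are even farther apart
            if gap >= distance - tolerance:
                pairs += 1
                both_derived += int(sites[i][1] and sites[j][1])
    if pairs == 0 or derived_proportion == 0:
        return None
    return (both_derived / pairs) / derived_proportion
```

The quadratic loop is kept for readability; a genome-scale implementation would bin pairs by distance in a single sliding pass.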
Our methodology was validated using the serial coalescent simulation framework provided by fastsimcoal v.2.702 (ref. 67) and considering 10 demographic scenarios, consisting of constant population sizes, population contractions and population expansion of various magnitudes and times, followed or not by population recovery (Extended Data Fig. 10). Individual genomes were simulated as 31 autosomes of 75 Mb each, using 10⁻⁸ recombination events and 2.3 × 10⁻⁸ mutation events per base pair and generation, respectively. A total of 20 simulated individuals were sampled along the genealogy every 100 generations, starting 900 generations ago, to cover the entire temporal range of horse domestication. Simulated as haploid, the 20 individuals sampled in each time bin, except the most recent, were then randomly paired to simulate diploid data under random mating, and were further subjected to pseudo-haploidization to mimic the data processing carried out on real data. The 20 individuals sampled for the most recent time period were paired with themselves before pseudo-haploidization to account for the increased inbreeding levels found in modern horse populations68.
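The post-processing of simulated haploid genomes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: genomes are represented as lists of 0/1 alleles (0 ancestral, 1 derived), all function names are hypothetical, and the fastsimcoal run itself is not shown.

```python
import random


def pair_haploids(haploids, rng):
    """Randomly pair haploid genomes into diploids (random mating)."""
    shuffled = haploids[:]
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1])
            for i in range(0, len(shuffled) - 1, 2)]


def pair_with_self(haploids):
    """Pair each haploid genome with itself, giving fully homozygous
    diploids, as done for the most recent time bin."""
    return [(h, h) for h in haploids]


def pseudo_haploidize(diploid, rng):
    """Pick one of the two alleles at random at every site."""
    first, second = diploid
    return [rng.choice((a, b)) for a, b in zip(first, second)]


rng = random.Random(42)
haploids = [[rng.randint(0, 1) for _ in range(50)] for _ in range(20)]
ancient_bin = [pseudo_haploidize(d, rng) for d in pair_haploids(haploids, rng)]
recent_bin = [pseudo_haploidize(d, rng) for d in pair_with_self(haploids)]
```

Random pairing of 20 haploid genomes yields 10 diploid samples per time bin, whereas self-pairing returns each haploid genome unchanged after pseudo-haploidization.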
The real genome dataset was filtered to exclude the IBE, LPSFR, ELEN and Vert311 population groups, which contain significant ancestry affinities with Late Pleistocene specimens from North America (LPNAMR). This prevented biasing the generation time estimates as a result of DNA introgression from divergent population groups, related to lineages used to polarize alleles as ancestral or derived. Ancient specimens not associated with direct radiocarbon dating were also disregarded, except at Botai, where the archaeological context is similar across all samples. This left 483 specimens delivering both ‘mutation clock’ and ‘recombination clock’ estimates for the number of generations elapsed from the ancestral sequence and since the time to the MRCA of Eurasian horses, respectively. Temporal shifts in generation times were identified on the basis of the downsampled dataset (Fig. 3), using a generalized additive model (GAM), as implemented in the R mgcv package. The model covariates were the radiocarbon dates, the first five coordinates of the Struct-f4 multidimensional scaling analysis (to capture the underlying population structure) and a parameter, pi, controlling for the depth of coverage of each individual genome. Standard errors for the dependent variable were calculated by jackknifing, leaving one chromosome out at a time, and the inverses of the resulting variances were used as regression weights. Regression models in which radiocarbon dates were linearly related to the number of generations received significantly lower support than those relaxing linearity through cubic spline transformation of radiocarbon dates (adjusted R² (adj. R²) = 0.803 for the linear versus 0.894 for the GAM regression; analysis of variance P < 2.2 × 10⁻¹⁶). Finally, we used the derivative function of the R gratia package and time bins of 1,000 years to measure temporal changes in generation times.
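The leave-one-chromosome-out jackknife and the resulting regression weights can be sketched in Python. This is an illustrative sketch only: it uses the standard unweighted delete-one jackknife variance and a simple ratio estimator (derived sites over all sites) as a stand-in for the per-genome estimate, both of which are assumptions, and the function name is hypothetical. The GAM itself was fitted in R and is not reproduced here.

```python
def jackknife_weight(per_chromosome):
    """`per_chromosome` holds one (derived_count, site_count) tuple per
    chromosome. Returns (estimate, jackknife_variance, weight), where
    weight is the inverse of the jackknife variance."""
    n = len(per_chromosome)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    total_derived = sum(d for d, _ in per_chromosome)
    total_sites = sum(s for _, s in per_chromosome)
    estimate = total_derived / total_sites
    # Re-estimate the statistic with each chromosome left out in turn.
    leave_one_out = [(total_derived - d) / (total_sites - s)
                     for d, s in per_chromosome]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    weight = 1.0 / variance if variance > 0 else float("inf")
    return estimate, variance, weight


# Toy input with four equally sized chromosomes.
print(jackknife_weight([(1, 1), (2, 1), (3, 1), (4, 1)]))
```

Genomes whose estimates vary little across chromosomes receive large weights and therefore contribute more to the fitted trend, whereas noisy, low-coverage genomes are down-weighted.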
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All collapsed and paired-end sequence data for samples sequenced in this study are available in compressed FASTQ format through the European Nucleotide Archive under accession number PRJEB71445, together with rescaled and trimmed BAM sequence alignments against the nuclear horse reference genomes. Previously published ancient data used in this study are available under accession numbers PRJEB7537, PRJEB10098, PRJEB10854, PRJEB22390, PRJEB31613 and PRJEB44430, and detailed in Supplementary Table 1. The genomes of 78 modern horses, publicly available, were also accessed as indicated in their corresponding original publications, and in Supplementary Table 1. The maps presented in Fig. 1 were generated using QGIS 3.36 software (available at https://www.qgis.org/en/site/) and using free raster images obtained from Natural Earth (https://www.naturalearthdata.com/). The maps in Extended Data Fig. 3c,d were automatically generated through the R scripts embedded in the Locator software package (https://github.com/kr-colab/locator).
Code availability
The software to calculate generation time changes based on the recombination clock is available without restriction at Bitbucket (https://bitbucket.org/plibradosanz/generationtime/src/master/) and Zenodo (https://doi.org/10.5281/zenodo.10842666 or https://zenodo.org/records/10842666)69.
References
Kelekna, P. The Horse in Human History (Cambridge Univ. Press, 2009).
Librado, P. et al. The origins and spread of domestic horses from the Western Eurasian steppes. Nature 598, 634–640 (2021).
Anthony, D. W. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (Princeton Univ. Press, 2007).
Maier, R. et al. On the limits of fitting complex models of population history to f-statistics. eLife 12, e85492 (2023).
Trautmann, M. et al. First bioanthropological evidence for Yamnaya horsemanship. Sci. Adv. https://doi.org/10.1126/sciadv.ade2451 (2023).
Outram, A. K. et al. The earliest horse harnessing and milking. Science 323, 1332–1335 (2009).
Gaunitz, C. et al. Ancient genomes revisit the ancestry of domestic and Przewalski’s horses. Science 360, 111–114 (2018).
Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
Penske, S. et al. Early contact between late farming and pastoralist societies in southeastern Europe. Nature 620, 358–365 (2023).
Wilkin, S. et al. Dairying enabled Early Bronze Age Yamnaya steppe expansions. Nature 598, 629–633 (2021).
Scott, A. et al. Emergence and intensification of dairying in the Caucasus and Eurasian steppes. Nat. Ecol. Evol. 6, 813–822 (2022).
Outram, A. K. Horse domestication as a multi-centered, multi-stage process: Botai and the role of specialized Eneolithic horse pastoralism in the development of human-equine relationships. Front. Environ. Archaeol. 2, 1134068 (2023).
Casanova, E. et al. Direct 14C dating of equine products preserved in archaeological pottery vessels from Botai and Bestamak, Kazakhstan. Archaeol. Anthropol. Sci. 14, 175 (2022).
Outram, A., Bendrey, R., Evershed, R. P., Orlando, L. & Zaibert, V. F. Rebuttal of Taylor and Barrón-Ortiz 2021: Rethinking the evidence for early horse domestication at Botai. Zenodo https://doi.org/10.5281/zenodo.5142604 (2021).
Taylor, W. T. T. & Barrón-Ortiz, C. I. Rethinking the evidence for early horse domestication at Botai. Sci. Rep. 11, 7440 (2021).
Chechushkov, I. V. & Kosintsev, P. A. The Botai horse practices represent the neolithization process in the central Eurasian steppes: important findings from a new study on ancient horse DNA. J. Archaeol. Sci. Rep. 32, 102426 (2020).
Fages, A., Seguin-Orlando, A., Germonpré, M. & Orlando, L. Horse males became over-represented in archaeological assemblages during the Bronze Age. J. Archaeol. Sci. Rep. 31, 102364 (2020).
Heggarty, P. et al. Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages. Science 381, eabg0818 (2023).
Kanne, K. Riding, ruling, and resistance: equestrianism and political authority in the Hungarian Bronze Age. Curr. Anthropol. 63, 289–329 (2022).
Battey, C., Ralph, P. L. & Kern, A. D. Predicting geographic location from genetic variation with deep neural networks. eLife 9, e54507 (2020).
Kyselý, R. & Peške, L. New discoveries change existing views on the domestication of the horse and specify its role in human prehistory and history – a review. Archeologické rozhledy 74, 299–345 (2022).
Lazaridis, I. et al. The genetic history of the Southern Arc: a bridge between West Asia and Europe. Science 377, eabm4247 (2022).
Librado, P. & Orlando, L. Struct-f4: a Rcpp package for ancestry profile and population structure inference from f4-statistics. Bioinformatics 38, 2070–2071 (2022).
Clark, P. U. et al. The Last Glacial Maximum. Science 325, 710–714 (2009).
Santiago, E. et al. Recent demographic history inferred by high-resolution analysis of linkage disequilibrium. Mol. Biol. Evol. 37, 3642–3653 (2020).
Cozzi, B., Ballarin, C., Mantovani, R. & Rota, A. Aging and veterinary care of cats, dogs, and horses through the records of three university veterinary hospitals. Front. Vet. Sci. 4, 14 (2017).
Miller, M. A., Moore, G. E., Bertin, F. R. & Kritchevsky, J. E. What’s new in old horses? Postmortem diagnoses in mature and aged equids. Vet. Pathol. 53, 390–398 (2016).
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Fages, A. et al. Tracking five millennia of horse management with extensive ancient genome time series. Cell 177, 1419–1435.e31 (2019).
Warmuth, V. et al. Reconstructing the origin and spread of horse domestication in the Eurasian steppe. Proc. Natl Acad. Sci. USA 109, 8202–8206 (2012).
Thiruvenkadan, A. K., Kandasamy, N. & Panneerselvam, S. Inheritance of racing performance of Thoroughbred horses. Livest. Sci. 121, 308–326 (2009).
Oates, J. in Prehistoric Steppe Adaptation and the Horse (eds Levine, M., Renfrew, C. & Boyle, K.) 115–138 (McDonald Institute for Archaeological Research, 2003).
Vila, E. Data on equids from late fourth and third millennium sites in northern Syria. In Proc. 9th Conference of the International Council of Archaeozoology (ed. Mashkour, M.) 101–123 (Oxbow Books, 2006).
Schwartz, G. M. & Nichols, J. J. After Collapse: The Regeneration of Complex Societies (Univ. of Arizona Press, 2010).
Butzer, K. W. in Third Millennium BC Climate Change and Old World Collapse (eds Dalfes, H. N., Kukla, G. & Weiss, H.) 245–296 (Springer, 1997).
Kristensen, T. N. & Sørensen, A. C. Inbreeding – lessons from animal breeding, evolutionary biology and conservation genetics. Anim. Sci. 80, 121–133 (2005).
Zaibert, V. Botaiskaya Kultura (KazAkparat, 2009).
Outram, A. K. & Bogaard, A. Subsistence and Society in Prehistory: New Directions in Economic Archaeology (Cambridge Univ. Press, 2019).
Levine, M. Botai and the origins of horse domestication. J. Anthropol. Archaeol. 18, 29–78 (1999).
Zeder, M. A. in Biodiversity in Agriculture: Domestication, Evolution, and Sustainability (eds Damania, A. B. et al.) 227–259 (Cambridge Univ. Press, 2012).
Kristiansen, K., Lindkvist, T. & Myrdal, J. Trade and Civilisation: Economic Networks and Cultural Ties, from Prehistory to the Early Modern Era (Cambridge Univ. Press, 2018).
Guimaraes, S. et al. Ancient DNA shows domestic horses were introduced in the southern Caucasus and Anatolia during the Bronze Age. Sci. Adv. 6, eabb0030 (2020).
Walker, M. et al. Formal subdivision of the Holocene series/epoch: a summary. J. Geol. Soc. India 93, 135–141 (2019).
Felkel, S. et al. The horse Y chromosome as an informative marker for tracing sire lines. Sci. Rep. 9, 6095 (2019).
Posth, C. et al. Palaeogenomics of Upper Palaeolithic to Neolithic European hunter-gatherers. Nature 615, 117–126 (2023).
Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302–310 (2017).
Bergström, A., Stringer, C., Hajdinjak, M., Scerri, E. M. L. & Skoglund, P. Origins of modern human ancestry. Nature 590, 229–237 (2021).
Ramsey, C. B. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Reimer, P. J. et al. The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020).
Gamba, C. et al. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing. Mol. Ecol. Resour. 16, 459–469 (2016).
Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, 88 (2016).
Schubert, M. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056–1082 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Kalbfleisch, T. S. et al. Improved reference genome for the domestic horse increases assembly contiguity and composition. Commun. Biol. 1, 197 (2018).
Xu, X. & Arnason, U. The complete mitochondrial DNA sequence of the horse, Equus caballus: extensive heteroplasmy of the control region. Gene 148, 357–362 (1994).
Poullet, M. & Orlando, L. Assessing DNA sequence alignment methods for characterizing ancient genomes and methylomes. Front. Ecol. Evol. 8, 105 (2020).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
Todd, E. T. et al. The genomic history and global expansion of domestic donkeys. Science 377, 1172–1180 (2022).
Cai, D. et al. Radiocarbon and genomic evidence for the survival of Equus Sussemionus until the late Holocene. eLife 11, e73346 (2022).
Vershinina, A. O. et al. Ancient horse genomes reveal the timing and extent of dispersals across the Bering Land Bridge. Mol. Ecol. 30, 6144–6161 (2021).
Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinf. 15, 356 (2014).
Nielsen, S. V. et al. Bayesian inference of admixture graphs on Native American and Arctic populations. PLoS Genet. 19, e1010410 (2023).
Harney, É., Patterson, N., Reich, D. & Wakeley, J. Assessing the performance of qpAdm: a statistical tool for studying population admixture. Genetics 217, iyaa045 (2021).
Beeson, S. K., Mickelson, J. R. & McCue, M. E. Equine recombination map updated to EquCab3.0. Anim. Genet. 51, 341–342 (2020).
Excoffier, L. et al. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics 37, 4882–4885 (2021).
Todd, E. T. et al. Imputed genomes of historical horses provide insights into modern breeding. iScience 26, 107104 (2023).
Librado, P. GenerationTime. Zenodo https://doi.org/10.5281/zenodo.10842666 (2024).
Acknowledgements
We thank A. Fromentier and C. Gamba for preliminary ancient DNA work; all members of the Archaeology, Genomics, Evolution and Societies group at the Centre for Anthropobiology and Genomics of Toulouse for support with laboratory organization and discussions; J. Karlsson and the Archaeological Collections at the National Historical Museums in Stockholm, Sweden, for advice and access to their collections; E. Willerslev for facilitating the sampling of ancient horse remains; A. Abedi, H. Davoudi, A. Mohased, J. Nokandeh and H. Omrani for facilitating access to archaeological material from Iran; and A. Choyke and P. Csippán at the Aquincum Museum, and J. Dani of the Deri Museum for access to, and assistance with, the collections in Hungary. We thank A. Marangoni for initiating collaborative work with some of our Italian partner institutions, and V. Zaibert, who dedicated his life to excavating at Botai, for access to material from the site. This work was supported by France Génomique National infrastructure, funded as part of the ‘Investissement d’avenir’ programme managed by Agence Nationale de la Recherche (ANR-10-INBS-09); RYC2021-031607 funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/Recovery, Transformation and Resilience Plan (PRTR); the consolidated research group SGR2021-00337-Seminari d’Estudis i Recerques Prehistòriques (SERP-UB) and SGR2021-00501- Archaeology of Social Dynamics (ASD, IMF-CSIC); the Ministerio de Economía y Competitividad of Spain (project HAR2017-87695-P); the Swiss National Science Foundation (project 178834); the Departament de Cultura of the ‘Generalitat de Catalunya’ (project ARQ001SOL-178-2022); the ‘Ginnerup and the End of Northern Europe’s First Farming Culture’ project, funded by the Aage og Johanne Louis-Hansen Foundation and the Augustinus Foundation; the ‘Cultural and economic centres of the Late Bronze Age of the southern part of the Middle Irtysh region’ (AP23488815); the Innovation Fund of the Austrian Academy of Sciences (ÖAW) (grant agreement IF_2015_17); the Russian Science Foundation (project 22-18-00470); the CNRS International Research Project AnimalFarm; the University Paul Sabatier IDEX Chaire d’Excellence (OURASI); and the France Génomique Appel à Grand Projet (ANR-10-INBS-09-08, BUCEPHALE and MARENGO projects). C.A. and A. Outram were funded by the Arts and Humanities Research Council, UK (AH/S000380/1). Y.R.H.C. was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie (grant agreement 890702-MethylRIDE). K.K. was supported by a National Science Foundation Doctoral Dissertation Improvement Grant (no. 0833106) and a Wenner-Gren Foundation Dissertation Fieldwork Grant. A.K.K. is supported by government research (project FMZF-2022-0013). R. Kyselý is supported by the Czech Academy of Sciences (RVO:67985912). P.F.K. is supported by the Russian Science Foundation (project RSF 22-18-00194). P.M. is supported by the National Science Centre (NSC), Poland, under the grant number 2023/49/B/HS3/00825. I.M. and V.M. are supported by the Ministry of Science and Education of the Republic of Kazakhstan (project BR18574223). V.V.P. is supported by the Russian Science Foundation (projects 21-18-00457P RNF and 24-68-00031 RNF) and government research (project FMZF-2022-0012). M.V.S. is supported by grant agreement 075-15-2021-1069. M. Szeliga is supported by the NSC, Poland, under grant number 2015/19/B/HS3/01720. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreements 834616-ARCHCAUCASUS, 101117101-anthropYXX, 681605-PEGASUS and 101071707-Horsepower).
Author information
Contributions
P.L. and L.O. conceptualized the project. K.M.G., M.D.J., K.C., L.C.H., M.P., E.P., H.V., M.N., A.R., P.T., S. Reiter, G.B., C.S., E.B., C.R., C. Degueurce, L.K.H., L.K., U.R., J.K., N.N.J., D.M., P.M., M. Szeliga, V.I., V.R., J.R., V.E.M., M. Verdugo, D.G.B., J.L.C., M.J.V., M.T.A., C.A., R.T., A. Ludwig, M. Marzullo, O.P., G.B.G., U.T., J.G., A.S., S.D.E., M.S.M., N. Boulbes, A.G., C.M., H.J.D., M. Vicze, P.A.K., R. Kyselý, L.P., T.O.C., E.A., I. Shevnina, A. Logvin, A.A.K., T.O.I., M.V.S., P.K.D., A.S.G., I.M., V.M., A.K.K., V.V.P., V.O., A. Öztan, B.S.A., H.M., G.R., R. Khaskhanov, S. Demidenko, A. Kadieva, B.A., M. Sundqvist, G.L., F.J.L.C., S.A., T.T.V., A.R.P., M.B., P.R.S., J.W., D.A.V., F.C., C.G.D., J.M.D.L., J.P., G.D.P., J. Sanmartí, N. Kallala, J.R.T., B.M.T., M.C.B.F., S.V.L., A.Z., S.L., S. Duchesne, A.A., J.B., J.L.H., N. Bayarkhuu, T.T., E.C., I. Shingiray, M. Mashkour, N.Y.B., D.S.K., A.B., A. Kalmykov, J.P.D., S. Reinhold, S.H., B.W., N.R., P.F.K., A.A.T., K.K., A. Outram and L.O. were responsible for sample collection. G.T., L.C., A.F., N. Khan, S.S., L.C.T., M.A.K., C.G., X.L., S.W., C.D.S., A.S.O., A.P., J.M.A., J. Southon, B.S., O.B., C. Donnadieu, Y.R.H.C., P.W. and L.O. were responsible for data generation. P.L. and L.O. carried out data analyses. P.L. developed the method. Y.R.H.C., A.S.O., L.K., P.M., M. Szeliga, R. Kyselý, M.V.S., I.M., V.M., A.K.K., V.V.P., F.J.L.C., S.A., S.H., B.W., P.F.K., A.A.T., K.K., A. Outram and L.O. acquired funding. L.O. co-ordinated the project. P.L. and L.O. wrote the original draft. L.O. reviewed and edited the paper, with input from all co-authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Wolfgang Haak, Lukas Wacker and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 QC filtering.
a) Histogram showing the distance between adjacent nucleotide transversions, if separated by less than 1 kb. This revealed an excess of mutations at contiguous genomic positions (i.e. 1 bp away). Although these could correspond to true single nucleotide polymorphisms (SNPs) or multiple nucleotide variants (MNVs), they could also be enriched for spurious variants resulting from mis-mapping around small DNA insertions and deletions. b) Proportion of mutations within pre-defined MAF bins (Minor Allele Frequency), as a function of missingness across the specimens. Pre-defined MAF bins range from low- (pink) to high-frequency variants (green). The dashed line delimits the positions included (left) or excluded (right) from the analyses. The identifiability of low-frequency variants decreases with greater missingness, as expected. c) Same as panel a), for the ~7.1 M nucleotide transversions of the downsampled data set. d) Same as panel b), for the ~7.1 M nucleotide transversions of the downsampled data set.
Extended Data Fig. 2 Relative error rates.
Missing mutations per site in a test genome (y-axis), relative to a modern Icelandic horse (P5782_Ice_Modern) used as high-quality reference. a) For the full data set and SNP_pval 0. b) For the downsampled data set and SNP_pval 0.
Extended Data Fig. 3 On the origins of CWC horses.
a) Consensus admixture graph generated from the posterior distribution of AdmixtureBayes64, when applied to the same horse populations considered in Extended Data Fig. 4. The values between brackets summarize the proportion of graphs sampled from the posterior distribution that support a split or admixture node. Admixture from unsampled (ghost) populations is not represented, in contrast to Extended Data Fig. 4. b) Best Admixtools2 (ref. 4) population model assuming 8 migration edges. The drift and admixture estimates are based on our extended dataset. c) Reference panel used for modeling pre-CWC clines of genetic diversity. d) Geospatial projection of the six CWC horse genomes analyzed in this study, in 10-Mb-long windows.
Extended Data Fig. 4 Most supported population graph.
This graph summarizes the evolutionary history of pre- and post-domestication horse lineages, with CWC horses not receiving any direct genetic contribution from the steppe. The model is split into 2 panels for clarity. The numbers reported within boxes reflect the admixture contributions from the nodes specified, while those adjacent to arrows indicate the amount of genetic drift leading to individual nodes. Population groups are detailed in Table S1 and colors are according to Fig. 1a.
Extended Data Fig. 5 Visual embedding of Struct-f4 affinities.
a) The first two dimensions of a Metric MultiDimensional Scaling (MDS) analysis, summarizing the genomic affinities between horses, based on Struct-f4. To improve visualization, this excludes the five outgroup specimens. Samples are color-coded following Fig. 1a, and population groups are labelled accordingly. Horses projecting intermediate to large population groups reflect ancient clines of ancestry, stretching from the East (closer to Botai) to the West (closer to Europe). CPONT individuals, from the Central Steppe, are the closest to DOM2 horses. b) Same as a) for the downsampled dataset. c) First and third dimensions of the same MDS analysis, which reveal CWC horses as the most distant European horses to DOM2 horses. d) Same as c) for the downsampled dataset.
Extended Data Fig. 6 Struct-f4 ancestry profiles.
Ancestry proportions for the 558 individuals considered in this study, assuming from K = 8 (left) to K = 10 (right) components. A total of 272 horses previously identified as DOM2 were merged into a single population (DOM2), including all modern breeds, to reduce computational costs. CWC horses show the typical ancestry profile of pre-domestication Europe.
Extended Data Fig. 7 GONE demographic reconstruction.
a) Effective population size (Ne) estimated from the patterns of linkage disequilibrium (LD) present in a nearly contemporaneous population of 14 horses affiliated to the Sintashta culture, up to 200 generations before their existence. b) Example of local ancestry for a TURG horse genome (LR18x15_Rus_m2763), modeled with Admixfrog as a mixture of Botai and early DOM2 horses. c) Raw generation time estimates for ancient horses from the steppe, the Carpathian and Transylvanian Basins, without correcting for population structure and uneven sequencing depths (Supplementary Information). TURG* represents the group of TURG horses, after masking their genomes for tracts introgressed from Botai horses. d) Same for Botai horses, which involved more generations than past and contemporaneous horses from the region, with the exception of BORL and Przewalski’s horses (PRZW), previously inferred to descend from Botai and saved from extinction through captive management. The dates reported correspond to rounded means of the different samples present in each group.
Extended Data Fig. 8 Mutation clock estimates.
a) Relationship of the ingroup Eurasian horses to the outgroups considered in this study, including non-caballine equids (E. ovodovi and the donkey) and ancient horses from North America (LP_NAMR). Leveraging this topology, we counted the number of mutations (represented as stars) that occurred in the branch leading to every single Eurasian horse. Following pseudohaploidization, positions that are truly heterozygous in Eurasian horses become ancestral or derived, and both outcomes are expected at equal probabilities. This approach is, thus, insensitive to the underlying heterozygosity of the sample, and, hence, to their demographic history. b) Estimates of the number of generations evolved from the outgroups, based on the full data set. c) Estimates based on the downsampled dataset.
Extended Data Fig. 9 Recombination clock estimates.
a) Schematic representation that illustrates the expectation that the variance along the genome is greater in an older specimen (left) as the result of more generations of evolution and, hence, more recombination events than in younger specimens with regards to the time to the most recent common ancestor (MRCA) of the whole sample set. It is thus expected that the distribution of mutations (stars) is less even in the younger specimen (right), which underwent fewer recombination events, and thus carries longer haplotype blocks, in which mutations are equally likely to have occurred or not. b) Schematic visualization of the ti (time to the MRCA) and T (total length of the genealogy) parameters constituting the recombination clock model, for an illustrative sample of four genomes. c) Number of generations evolved from the MRCA, as estimated by applying the recombination clock model to the full data set.
Extended Data Fig. 10 Coalescent simulations to validate both methods.
a) Illustration of the 10 simulated scenarios (A-J), together with their underlying parameters. b) Each boxplot summarizes the estimates obtained from n = 10 diploid samples, when using the method relying on the recombination clock (in generations of evolution from the MRCA). Boxplots comprise their corresponding centres (median), box boundaries (interquartile ranges) and whiskers (1.5 times the interquartile ranges). The estimated age of the samples perfectly correlates with the simulated age of sampling (Pearson correlation; r = 0.999; two-tailed p-value = 0). c) Same as b) for the mutation clock (Pearson correlation; r = 0.999; two-tailed p-value = 0).
Supplementary information
Supplementary Information
This Supplementary Information file contains the following sections: Section 1. Archaeological Contexts and Sample Information; Section 2. Radiocarbon Dating; Section 3. Genome Analyses; Section 4. Measuring temporal variations in the horse generation time; and references.
Supplementary Tables
This file contains Supplementary Tables 1 and 2.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Librado, P., Tressières, G., Chauvey, L. et al. Widespread horse-based mobility arose around 2200 bce in Eurasia. Nature 631, 819–825 (2024). https://doi.org/10.1038/s41586-024-07597-5
DOI: https://doi.org/10.1038/s41586-024-07597-5