Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse


The rich fossil record of equids has made them a model for evolutionary processes1. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560–780 thousand years before present (kyr bp)2,3. Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr bp), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski’s horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0–4.5 million years before present (Myr bp), twice the conventionally accepted time to the most recent common ancestor of the genus Equus4,5. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski’s and domestic horse populations diverged 38–72 kyr bp, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski’s horse investigated. This supports the contention that Przewalski’s horses represent the last surviving wild horse population6. We find similar levels of genetic variation among Przewalski’s and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski’s horse. Such regions could correspond to loci selected early during domestication.


Figure 1: The early Middle Pleistocene horse metapodial from Thistle Creek (TC).
Figure 2: Amino acid, protein and DNA preservation of the Thistle Creek horse bone.
Figure 3: Horse phylogenetic relationships and population divergence times.
Figure 4: Horse demographic history.

Data deposits

All sequence data have been submitted to Sequence Read Archive under accession number SRA082086 and are available for download, together with final BAM and VCF files, de novo donkey scaffolds, and proteomic data at


  1. Franzen, J. L. The Rise of Horses: 55 Million Years of Evolution (Johns Hopkins Univ. Press, 2010)

  2. Froese, D. G., Westgate, J. A., Reyes, A. V., Enkin, R. J. & Preece, S. J. Ancient permafrost and a future, warmer Arctic. Science 321, 1648 (2008)

  3. Westgate, J. A. et al. Gold Run tephra: A Middle Pleistocene stratigraphic and paleoenvironmental marker across west-central Yukon Territory, Canada. Can. J. Earth Sci. 46, 465–478 (2009)

  4. Eisenmann, V. Origins, dispersals, and migrations of Equus (Mammalia, Perissodactyla). Courier Forschungsinstitut Senckenberg 153, 161–170 (1992)

  5. Forsten, A. Mitochondrial-DNA timetable and the evolution of Equus: Comparison of molecular and paleontological evidence. Ann. Zool. Fenn. 28, 301–309 (1992)

  6. Goto, H. et al. A massively parallel sequencing approach uncovers ancient origins and high genetic variability of endangered Przewalski’s horses. Genome Biol. Evol. 3, 1096–1106 (2011)

  7. Reyes, A. V., Froese, D. G. & Jensen, B. J. Response of permafrost to last interglacial warming: field evidence from non-glaciated Yukon and Alaska. Quat. Sci. Rev. 29, 3256–3274 (2010)

  8. Orlando, L. et al. True single-molecule DNA sequencing of a Pleistocene horse bone. Genome Res. 21, 1705–1719 (2011)

  9. Lindahl, T. Instability and decay of the primary structure of DNA. Nature 362, 709–715 (1993)

  10. Willerslev, E. et al. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114 (2007)

  11. Miller, W. et al. Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proc. Natl Acad. Sci. USA 109, E2382–E2390 (2012)

  12. Cappellini, E. et al. Proteomic analysis of a Pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012)

  13. Ginolhac, A. et al. Improving the performance of True Single Molecule Sequencing for ancient DNA. BMC Genomics 13, 177 (2012)

  14. Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010)

  15. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)

  16. van Doorn, N. L., Wilson, J., Hollund, H., Soressi, M. & Collins, M. J. Site-specific deamidation of glutamine: a new marker of bone collagen deterioration. Rapid Commun. Mass Spectrom. 26, 2319–2327 (2012)

  17. Vilstrup, J. T. et al. Mitochondrial phylogenomics of modern and ancient equids. PLoS ONE 8, e55950 (2013)

  18. MacFadden, B. J. & Carranza-Castaneda, O. Cranium of Dinohippus mexicanus (Mammalia Equidae) from the early Pliocene (latest Hemphillian) of central Mexico and the origin of Equus. Bull. Florida Museum Nat. History 43, 163–185 (2002)

  19. Weinstock, J. et al. Evolution, systematics, and phylogeography of Pleistocene horses in the new world: a molecular perspective. PLoS Biol. 3, e241 (2005)

  20. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010)

  21. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011)

  22. Lorenzen, E. D. et al. Species-specific responses of Late Quaternary megafauna to climate and humans. Nature 479, 359–364 (2011)

  23. International Union for Conservation of Nature. IUCN Red List of Threatened Species, Version 2010.1, (downloaded 11 March 2010)

  24. Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010)

  25. Bowling, A. T. et al. Genetic variation in Przewalski’s horses, with special focus on the last wild caught mare, 231 Orlitza III. Cytogenet. Genome Res. 102, 226–234 (2003)

  26. Wade, C. M. et al. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326, 865–867 (2009)

  27. Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. Lond. B 279, 4724–4733 (2012)

  28. Kelstrup, C. D., Young, C., Lavallee, R., Nielsen, M. L. & Olsen, J. V. Optimized fast and sensitive acquisition methods for shotgun proteomics on a quadrupole orbitrap mass spectrometer. J. Proteome Res. 11, 3487–3497 (2012)

  29. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)

  30. Orlando, L. et al. Revising the recent evolutionary history of equids using ancient DNA. Proc. Natl Acad. Sci. USA 106, 21754–21759 (2009)

  31. Rohland, N. & Hofreiter, M. Ancient DNA extraction from bones and teeth. Nature Protocols 2, 1756–1762 (2007)

  32. Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 6, (2010)

  33. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18 (2012)

  34. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004)

  35. Carlton, J. M. et al. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315, 207–212 (2007)

  36. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010)

  37. Li, H. et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079 (2009)

  38. McCue, M. E. et al. A high density SNP array for the domestic horse and extant Perissodactyla: utility for association mapping, genetic diversity, and phylogeny studies. PLoS Genet. 8, e1002451 (2012)

  39. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007)

  40. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006)

  41. R Development Core Team. A language and environment for statistical computing, (R Foundation for Statistical Computing, 2011)

  42. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010)

  43. Smith, C. I., Chamberlain, A. T., Riley, M. S., Stringer, C. & Collins, M. J. The thermal history of human fossils and the likelihood of successful DNA amplification. J. Hum. Evol. 45, 203–217 (2003)

  44. Ginolhac, A., Rasmussen, M., Gilbert, M. T. P., Willerslev, E. & Orlando, L. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27, 2153–2155 (2011)

  45. Briggs, A. W. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl Acad. Sci. USA 104, 14616–14621 (2007)

  46. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnol. 26, 1367–1372 (2008)

  47. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011)

  48. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002)

  49. Katoh, K. & Toh, H. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9, 286–298 (2008)

  50. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)

  51. Stamatakis, A. et al. RAxML-Light: a tool for computing Terabyte phylogenies. Bioinformatics 28, 2064–2066 (2012)

  52. Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003)

  53. Shimodaira, H. & Hasegawa, M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247 (2001)

  54. Lippold, S., Matzke, N. J., Reissmann, M. & Hofreiter, M. Whole mitochondrial genome sequencing of domestic horses reveals incorporation of extensive wild horse diversity during domestication. BMC Evol. Biol. 11, 328 (2011)

  55. Achilli, A. et al. Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication. Proc. Natl Acad. Sci. USA 109, 2449–2454 (2012)

  56. Warmuth, V. et al. Reconstructing the origin and spread of horse domestication in the Eurasian steppe. Proc. Natl Acad. Sci. USA 109, 8202–8206 (2012)

  57. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)

  58. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012)

  59. Rambaut, A. & Drummond, A. J. Tracer v1.5, (2009)

  60. Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002)

  61. Yang, Z. Computational Molecular Evolution (Oxford Univ. Press, 2006)

  62. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protocols 4, 44–57 (2009)

  63. Nielsen, R. Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218 (2005)

  64. Busing, F. M. T. A., Meijer, E. & Van Der Leeden, R. Delete-m Jackknife for Unequal m. Stat. Comput. 9, 3–8 (1999)

  65. Keane, T. M., Creevey, C. J., Pentony, M. M., Naughton, T. J. & McInerney, J. O. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol. Biol. 6, 29 (2006)

  66. Guindon, S. et al. New algorithms and methods to estimate Maximum-Likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010)


We thank T. Brand, the laboratory technicians at the Danish National High-throughput DNA Sequencing Centre and the Illumina sequencing platform at SciLifeLab-Uppsala for technical assistance; J. Clausen for help with the donkey samples; S. Rasmussen for computational assistance; J. N. MacLeod and T. Kalbfleisch for discussions involving the re-sequencing of the horse reference genome; and S. Sawyer for providing published ancient horse data. This work was supported by the Danish Council for Independent Research, Natural Sciences (FNU); the Danish National Research Foundation; the Novo Nordisk Foundation; the Lundbeck Foundation (R52-A5062); a Marie-Curie Career Integration grant (FP7 CIG-293845); the National Science Foundation ARC-0909456; National Science Foundation DBI-0906041; the Searle Scholars Program; King Saud University Distinguished Scientist Fellowship Program (DSFP); the Natural Sciences and Engineering Research Council of Canada; the US National Science Foundation DMR-0923096; and a grant RC2 HG005598 from the National Human Genome Research Institute (NHGRI). A.G. was supported by a Marie-Curie Intra-European Fellowship (FP7 IEF-299176). M.F. was supported by an EMBO Long-Term Post-doctoral Fellowship (ALTF 229-2011). A.-S.M. was supported by a fellowship from the Swiss National Science Foundation (SNSF). Mi.S. was supported by the Lundbeck Foundation (R82-5062).

Author information




L.O. and E.W. initially conceived and headed the project; G.Z. and Ju.W. headed research at BGI; L.O. and E.W. designed the experimental research project set-up, with input from B.S. and R.N.; D.F. and G.D.Z. provided the Thistle Creek specimen, stratigraphic context and morphological information, with input from K.K.; K.H.R., B.S., K.G., D.C.M., D.F.A., K.A.S.A.-R. and M.F.B. provided samples; L.O., J.T.V., Ma.R., M.H., C.M. and J.S. did ancient and modern DNA extractions and constructed Illumina DNA libraries for shotgun sequencing; Ja.W. did the independent replication in Oxford; Ma.S. did ancient DNA extractions and generated target enrichment sequence data; Ji.M. and X.W. did Illumina libraries on donkey extracts; K.M., C.M. and A.S.-O. performed Illumina sequencing for the Middle Pleistocene and the 43-kyr-old horse genomes, the five domestic horse genomes and the Przewalski’s horse genome at Copenhagen, with input from Mo.R.; Ji.M. and X.W. performed Illumina sequencing for the Middle Pleistocene and the donkey genomes at BGI; J.F.T. headed true Single DNA Molecule Sequencing of the Middle Pleistocene genome; A.G., B.P. and Mi.S. did the mapping analyses and generated genome alignments, with input from L.O. and A.K.; Jo.V. and T.S.-P. did the metagenomic analyses, with input from A.G., B.P., S.B. and L.O.; Jo.V. and T.S.-P. did the ab initio prediction of the donkey genes and the identification of the Y chromosome scaffolds, with input from A.G. and Mi.S.; L.O., A.G. and P.L.F.J. did the damage analyses, with input from I.M.; A.G. did the functional SNP assignment; A.M.V.V. and L.O. did the PCA analyses, with input from O.R.; B.S. did the phylogenetic and Bayesian skyline reconstructions on mitochondrial data; Mi.S. did the phylogenetic and divergence dating based on nuclear data, with input from L.O.; A.G. did the PSMC analyses using data generated by C.J.R. and L.A.; L.O. and A.G. did the population divergence analyses, with input from J.C., R.N. and M.F.; L.O., A.G. and T.K. did the selection scans, with input from A.-S.M. and R.N.; A.A., I.M. and M.F. did the admixture analyses, with input from R.N.; L.O. and A.G. did the analysis of paralogues and structural variation; Ja.V. and A.D. did the amino-acid composition analyses; E.C., C.D.K., D.S., L.J.J. and J.V.O. did the proteomic analyses, with input from M.T.P.G. and A.M.V.V.; L.O. and V.E. performed the morphological analyses, with input from D.F. and G.D.Z.; L.O. and E.W. wrote the manuscript, with critical input from M.H., B.S., Jo.M. and all remaining authors.

Corresponding authors

Correspondence to Ludovic Orlando or Jun Wang or Eske Willerslev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, Supplementary Figures, Supplementary Tables and additional references (see Contents for details). This file was updated on 3 July 2013 to correctly display figure S1.3 (PDF 20068 kb)

Supplementary Figures

This file contains Supplementary Figures S6.8-S6.38, which show DNA fragmentation and nucleotide misincorporation patterns for mitochondrial reads from other ancient samples analyzed in this study. (PDF 2191 kb)

Supplementary Tables

This zipped file contains Supplementary Tables 4.2, 4.3, 4.4, 5.9, 11.3, 11.4, 11.7 and 12.8. (ZIP 10146 kb)

About this article

Cite this article

Orlando, L., Ginolhac, A., Zhang, G. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
