
Complete mitogenomes document substantial genetic contribution from the Eurasian Steppe into northern Pakistani Indo-Iranian speakers


To elucidate whether Bronze Age population dispersals from the Eurasian Steppe to South Asia contributed to the gene pool of Indo-Iranian-speaking groups, we analyzed 19,568 mitochondrial DNA (mtDNA) sequences from northern Pakistani and surrounding populations, including 213 newly generated mitochondrial genomes (mitogenomes) from Iranian and Dardic groups, both of which speak languages of the ancient Indo-Iranian branch in northern Pakistan. Our results showed that 23% of mtDNA lineages of west Eurasian origin arose in situ in northern Pakistan since ~5 thousand years ago (kya), a time depth close to that of the documented Indo-European dispersals into South Asia during the Bronze Age. Together with ancient mitogenomes from western Eurasia dating from the Neolithic onward, we identified five haplogroups (~8.4% of the maternal gene pool) with roots in the Steppe region and subbranches arising ~5–2 kya in northern Pakistan as genetic legacies of Indo-Iranian speakers. Some of these haplogroups, such as W3a1b, which has been found in ancient individuals from the late Bronze Age to the Iron Age in the Swat Valley, northern Pakistan, even have sub-lineages (aged ~4 kya) in the southern subcontinent, consistent with the southward spread of Indo-Iranian languages. By showing that substantial genetic components of Indo-Iranian speakers in northern Pakistan can be traced to the Bronze Age in the Steppe region, our study suggests a demographic link with the spread of Indo-Iranian languages and further highlights the role of northern Pakistan as a corridor in the southward dispersal of Indo-Iranian-speaking groups.


Fig. 1: Native speakers of Indo-Iranian subgroups and sampling locations.
Fig. 2: PCA plot based on haplogroup frequencies for 6528 complete mitogenomes included in the present study (Table S5).
Fig. 3: Phylogeographic presentations of genetic legacy of IE speakers from the Steppe during the Bronze Age.
Fig. 4: Bayesian skyline plot (BSP) analysis of mitogenomes in South Asia.

Data availability

The 213 complete mitochondrial DNA sequences were deposited in GenBank under accession numbers MN595684–MN595896.
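As a quick consistency check on the deposit, the accession range can be expanded into individual identifiers and counted. The sketch below is only an illustration: the helper name `expand_accession_range` is hypothetical (it is not part of any NCBI tool), and it assumes the range is contiguous with a shared letter prefix and a fixed-width numeric suffix.

```python
def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range into individual identifiers.

    Hypothetical helper: assumes both accessions share the same letter
    prefix and use the same number of digits in the numeric suffix.
    """
    # Separate the alphabetic prefix from the trailing digits.
    prefix_first = first.rstrip("0123456789")
    prefix_last = last.rstrip("0123456789")
    digits_first = first[len(prefix_first):]
    digits_last = last[len(prefix_last):]

    if prefix_first != prefix_last or len(digits_first) != len(digits_last):
        raise ValueError("accessions must share a prefix and digit width")

    start, stop = int(digits_first), int(digits_last)
    if start > stop:
        raise ValueError("first accession must not come after the last")

    # Zero-pad so every identifier keeps the original digit width.
    width = len(digits_first)
    return [f"{prefix_first}{n:0{width}d}" for n in range(start, stop + 1)]


# Range quoted in the data availability statement.
accessions = expand_accession_range("MN595684", "MN595896")
print(len(accessions))  # 213
```

The count of 213 agrees with the number of newly generated mitogenomes reported in the abstract.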




Acknowledgements

We would like to thank Dr. Christine Watts for help in refining the paper. We also extend our gratitude to all sample donors for making this study possible. Z-UR is grateful to Dr. Jan Alam and Abdul Hameed from Hazara University, Pakistan, for discussions on the anthropological and archeological perspectives of the study. Z-UR is also grateful to former colleagues from Dr. Obaid's lab at the University of Karachi, Pakistan, for their support. We are also thankful to the four reviewers for providing helpful comments and suggestions.


Funding

This work was supported by the Strategic Priority Research Program (Grant No. XDA20040102), the Second Tibetan Plateau Scientific Expedition and Research (STEP) (Grant No. 2019QZKK0607), the National Natural Science Foundation of China (31620103907, 31601017), the Chinese Academy of Sciences (QYZDB-SSW-SMC020), and the Yunnan Applied Basic Research Project (2017FB044).

Author information




Contributions

Q-PK and Y-CL designed the research; Z-UR collected samples; Z-UR and J-YT collected the data; Z-UR, J-YT, B-YY, and L-QY performed the experiments; Z-UR, Y-CL, and J-YT analyzed the data; Z-LG, H-TW, and W-XX assisted in data analysis and discussed the results; Z-UR, Y-CL, and Q-PK wrote the paper.

Corresponding authors

Correspondence to Yu-Chun Li or Qing-Peng Kong.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Rahman, Z.U., Tian, JY., Gao, ZL. et al. Complete mitogenomes document substantial genetic contribution from the Eurasian Steppe into northern Pakistani Indo-Iranian speakers. Eur J Hum Genet (2021).
