To elucidate whether Bronze Age population dispersals from the Eurasian Steppe to South Asia contributed to the gene pool of Indo-Iranian-speaking groups, we analyzed 19,568 mitochondrial DNA (mtDNA) sequences from northern Pakistani and surrounding populations, including 213 newly generated mitochondrial genomes (mitogenomes) from Iranian and Dardic groups, both of which speak languages of the ancient Indo-Iranian branch in northern Pakistan. Our results showed that 23% of mtDNA lineages of west Eurasian origin arose in situ in northern Pakistan from ~5 thousand years ago (kya) onward, a time depth very close to the documented Indo-European dispersals into South Asia during the Bronze Age. By incorporating ancient mitogenomes from western Eurasia dating from the Neolithic onward, we identified five haplogroups (~8.4% of the maternal gene pool) with roots in the Steppe region and subbranches that arose in northern Pakistan (~5–2 kya) as genetic legacies of Indo-Iranian speakers. Some of these haplogroups, such as W3a1b, which has been found in ancient individuals from the Swat Valley of northern Pakistan dating from the late Bronze Age to the Iron Age, even have sub-lineages (~4 kya) in the southern subcontinent, consistent with the southward spread of Indo-Iranian languages. By showing that substantial genetic components of Indo-Iranian speakers in northern Pakistan can be traced to the Bronze Age Steppe region, our study suggests a demographic link with the spread of Indo-Iranian languages and further highlights the role of northern Pakistan as a corridor in the southward dispersal of Indo-Iranian-speaking groups.
The 213 complete mitochondrial DNA sequences were deposited in GenBank under accession numbers MN595684–MN595896.
We would like to thank Dr. Christine Watts for help in refining the paper. We also extend our gratitude to all sample donors for making this study possible. Z-UR is grateful to Dr. Jan Alam and Abdul Hameed from Hazara University, Pakistan, for discussions on the anthropological and archeological perspectives of the study. Z-UR is also grateful to former colleagues from Dr. Obaid's lab at the University of Karachi, Pakistan, for their support. We also thank the four reviewers for their helpful comments and suggestions.
This work was supported by the Strategic Priority Research Program (Grant No. XDA20040102), the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (Grant No. 2019QZKK0607), the National Natural Science Foundation of China (31620103907, 31601017), the Chinese Academy of Sciences (QYZDB-SSW-SMC020), and the Yunnan Applied Basic Research Project (2017FB044).
Conflict of interest
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rahman, Z.U., Tian, JY., Gao, ZL. et al. Complete mitogenomes document substantial genetic contribution from the Eurasian Steppe into northern Pakistani Indo-Iranian speakers. Eur J Hum Genet (2021). https://doi.org/10.1038/s41431-021-00829-6