Original Article

Journal of Human Genetics (2016) 61, 181–191; doi:10.1038/jhg.2015.132; published online 26 November 2015

Coevolution of genes and languages and high levels of population structure among the highland populations of Daghestan

Tatiana M Karafet¹, Kazima B Bulayeva², Johanna Nichols³, Oleg A Bulayev², Farida Gurgenova², Jamilia Omarova², Levon Yepiskoposyan⁴,⁵, Olga V Savina¹, Barry H Rodrigue⁶ and Michael F Hammer¹

  1. ARL Division of Biotechnology, University of Arizona, Tucson, AZ, USA
  2. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
  3. Department of Slavic Languages and Literatures, University of California, Berkeley, CA, USA
  4. Institute of Molecular Biology, National Academy of Sciences, Yerevan, Armenia
  5. Russian-Armenian University, Yerevan, Armenia
  6. Institute of Oriental Studies, Russian Academy of Sciences, Moscow, Russia

Correspondence: Professor MF Hammer, ARL Division of Biotechnology, University of Arizona, Biosciences West, Tucson, AZ 85721, USA. E-mail: mfh@email.arizona.edu

Received 6 July 2015; Revised 11 August 2015; Accepted 8 October 2015
Advance online publication 26 November 2015

Abstract

Because of their great linguistic and cultural diversity, the highland populations of Daghestan present an excellent opportunity to test the hypothesis of language–gene coevolution at a fine geographic scale. However, previous genetic studies have generally been restricted to uniparental markers and have not included many of the key populations of the region. To improve our understanding of the genetic structure of Daghestani populations and to investigate possible correlations between genetic and linguistic variation, we analyzed ~550,000 autosomal single nucleotide polymorphisms, phylogenetically informative Y chromosome markers and mtDNA haplotypes in 21 ethnic Daghestani groups. We found high levels of population structure in Daghestan, consistent with the hypothesis of long-term isolation among populations of the highland Caucasus. Highland Daghestani populations exhibit extremely high levels of between-population diversity for all genetic systems tested, leading to some of the highest FST values observed for any region of the world. In addition, we found a significant positive correlation between gene and language diversity, suggesting that these two aspects of human diversity have coevolved as a result of historical patterns of social interaction among highland farmers at the community level. Finally, our data are consistent with the hypothesis that most Daghestanian-speaking groups descend from a common ancestral population (~6000–6500 years ago) that spread to the Caucasus by demic diffusion, followed by population fragmentation and low levels of gene flow.