South Asia is home to more than a fifth of the world’s population and is thought, on genetic grounds, to have been the first main reservoir in the dispersal of modern humans out of Africa.1, 2 Additionally, a high level of endogamy within and between various castes, together with the influence of several evolutionary forces and the long-term effective population size, has shaped the complex demographic history of the subcontinent.3 The peopling of South Asia is therefore a question of fundamental importance in archaeogenetics, linguistics and the historical disciplines, and it is not surprising that the number and timing of migrations into and out of South Asia are still vigorously debated.2, 3, 4 Research from various disciplines has focused on testing the hypothesis that several separate migrations entered the subcontinent, each associated with a different tool technology and with distinct linguistic and genetic characteristics.2, 3, 5, 6 The mtDNA (mitochondrial DNA) data suggest deep autochthonous diversity with minor sharing with East and West Eurasians,3 whereas recent autosomal data show substantial genomic similarity between South Asians and populations of the Caucasus and West Asia.4, 6 However, at the current resolution, it is unclear whether this sharing is extremely ancient or arose with the arrival of new languages and farming.7

Among South Asian countries, the genetic structure of India and Pakistan is somewhat better understood, whereas genetic information from Nepal, Bangladesh, Bhutan, Sri Lanka and the Maldives is either published only at the level of forensic data or restricted to a few populations. Lying off the southernmost tip of South Asia and along the proposed southern migration route, the island of Sri Lanka has long been settled by various ethnic groups and may offer a unique insight into the initial peopling of the subcontinent. Most importantly, it is the only place in South Asia to have yielded early modern human fossils dated to 37 000 years ago.8, 9 It is therefore important to study the prehistoric human settlement of Sri Lanka and the relationship of its populations to those of adjoining regions.

Although every population of South Asia is unique in its genetic structure, dialect and specific rituals, there are a few language isolates whose speakers have been hypothesized to be remnants of the ancient settlers and may provide an insight into the initial peopling of the subcontinent.10 Hence, it is important to study the genome structure of these groups. However, it is also important to note that language isolates are not always genetic isolates. Except for the Andaman islanders, none of the South Asian language isolates studied so far has been found to be a true genetic isolate.1, 11, 12 The linguistic isolate of Sri Lanka is known as Vedda (also known as Vadda).8, 10 The Vedda are a small hunter-gatherer tribe living in the Northwest province of Sri Lanka. They are regarded as the aboriginal people of Sri Lanka and have been suggested to represent the indigenous population of the entire subcontinent.8, 10 The first genetic study of the Vedda, carried out along with other Asian populations, suggested a long period of isolation.13 However, the analysis of alpha-2-HS-glycoprotein allele frequencies supports the view that the Veddas are biologically most closely related to the Sinhalese.14 Until now, high-resolution genetic data were not available for this population, and its affinity with other populations of Eurasia remained obscure.

In a previous issue, Ranaweera and colleagues9 addressed this gap by generating novel data from the HVS-I and HVS-II regions of mtDNA for the Vedda and for the other major ethnic groups of Sri Lanka. Through a well-covered sampling strategy, they filled a major geographic ‘white spot’ inhabited by ∼20 million people. Their analysis led to a more precise identification of South Asian-specific indigenous mtDNA haplogroups and a better understanding of the extent of East and West Eurasian admixture among Sri Lankan populations. In addition, by generating mtDNA data on the relic Vedda populations for the first time, this study was able to confirm that they exhibit low genetic variability, which is consistent with their small population size and the strong effect of genetic drift.
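Low genetic variability of this kind is often summarized with Nei's haplotype diversity, h = n/(n − 1) × (1 − Σ p_i²), where n is the number of sampled sequences and p_i is the frequency of haplotype i. The following minimal sketch shows the calculation on made-up haplotype labels; the function name and the two example samples are illustrative assumptions, not data or methods taken from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability that two draws match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples: labels are placeholders, not haplotypes from the study.
drifted = ["H1"] * 7 + ["H2"] * 2 + ["H3"]       # one haplotype dominates
diverse = ["H%d" % i for i in range(1, 11)]      # every sequence is distinct

print("drifted sample:", round(haplotype_diversity(drifted), 3))
print("diverse sample:", round(haplotype_diversity(diverse), 3))
```

A sample dominated by one or two haplotypes, as expected after a founder event, gives a value well below that of a sample in which nearly every sequence is distinct.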

The haplogroup distribution and sharing among the different ethnic groups are intriguing.9 The majority of mtDNA haplogroups belong to South Asian-specific clades (Figure 1). However, the Sri Lankan populations (except the Indian Tamils sampled in Sri Lanka) have significantly higher West Eurasian ancestry than any of the Southern Indian states (Figure 1). The most common West Eurasian haplogroups observed were U1 and U7. The majority of haplotypes in the studied populations have a complete or nearly complete match with South Indian variants, whereas only haplogroups M2, M6, M33 and R5 are common to all the studied groups and share haplotypes across them.

Figure 1

(a) The sharing of maternal ancestry of Sri Lankan populations in comparison with different states of Southern India. (b) The most parsimonious tree of haplogroup R30 complete mtDNA sequences, showing the most recent common clade of the Vedda (R30b2). Coalescent times were calculated by a calibration method described elsewhere.16 The 16182C, 16183C and 16519 polymorphisms were omitted. Suffixes A, C, G and T indicate transversions; recurrent mutations are underlined. Synonymous (s) mutations are distinguished. Sequences were taken from published sources and from our unpublished data.17, 18, 19 A full color version of this figure is available at the Journal of Human Genetics online.

Another important element of this study was the generation of maternal haplogroup-level data for the Vedda, a language isolate of South Asia that had so far remained virtually unstudied genetically.9 The study reports that the Vedda are the most distinct of all the ethnic groups, which is probably due to the elevated frequency of haplogroups R30, U1 and U7, which together represent 64% of Vedda maternal lineages. The haplotype distribution of these haplogroups clearly points to founder effect(s) due to random genetic drift. Owing to this unique haplogroup structure, the Vedda stand out as an exceptional tribal population of South Asia, with fewer than 30% of individuals carrying haplogroup M. A majority of their individuals share a branch of haplogroup R30 (that is, R30b2), which is widespread mainly in the coastal region of South Asia and shows an expansion time between 8 and 24 kya (Figure 1b). Haplogroup U1 haplotypes, which have also been reported from Southern India, were found in 12% of Vedda individuals. Haplogroup U7 haplotypes were shared with both North and South Indian populations. A future phylogeographic study of these haplogroups should be able to identify the population sharing the closest common ancestry with the Vedda.

From this study it is apparent that (1) a considerable number of maternal lineages of Sri Lanka are shared with India, more precisely with the southern part of India; (2) the maternal genetic structure is shaped by both ethnicity and geography; and (3) the language isolate Vedda is unlikely to be a genetic isolate and shares its lineages with its neighbors. As this study lacks the highest level of mtDNA resolution, it is hard to establish any timeline for the presence of these haplogroups (most importantly the West Eurasian-specific haplogroups) in this region, although the archeological record suggests the presence of modern humans since Upper Paleolithic times.8 Moreover, mtDNA is highly prone to genetic drift, especially in small tribal populations like the Vedda, so one or more founding mtDNA haplogroups can easily be lost among them. Therefore, one cannot dismiss the possibility, for instance, that the Vedda once had a wider geographical distribution and deeper connections with other language isolates than those observed in modern populations.10
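The point that founding mtDNA lineages are easily lost in small populations can be illustrated with a simple haploid Wright-Fisher simulation, in which each daughter inherits her mtDNA from a randomly chosen mother of the previous generation. This is only a sketch under stated assumptions: the population sizes, the number of starting lineages and the number of generations are arbitrary illustrative values, not estimates for the Vedda or any other population, and the function name is hypothetical.

```python
import random


def surviving_lineages(pop_size, n_lineages, generations, seed=0):
    """Count maternal lineages left after haploid Wright-Fisher drift."""
    rng = random.Random(seed)
    # Start with the lineages spread as evenly as possible among the females.
    population = [i % n_lineages for i in range(pop_size)]
    for _ in range(generations):
        # Each daughter copies the lineage of a randomly drawn mother.
        population = [rng.choice(population) for _ in range(pop_size)]
    return len(set(population))


# Illustrative values only: 10 starting lineages followed for 100 generations.
for size in (50, 2000):
    runs = [surviving_lineages(size, 10, 100, seed=s) for s in range(10)]
    print("females:", size, "mean surviving lineages:", sum(runs) / len(runs))
```

In runs of this kind the small population typically retains far fewer of its starting lineages than the large one, which is why the absence of a haplogroup in a small group today is weak evidence that it was never present.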

The demographic history of a population cannot be established by studying a single locus. Therefore, to detect the signatures of ancient as well as recent admixture and to reconstruct the demographic history of the Sri Lankan populations (including the linguistic isolate Vedda), complete mtDNA sequences, high-resolution Y-chromosomal data and high-coverage whole-genome resequencing data are essential. In the era of genomics and cutting-edge technology, it can be expected that, in the near future, a complete understanding of the Sri Lankan gene pool will contribute significantly to the knowledge of genetic variation of modern humans in South Asia.15