Introduction

Lakshadweep is an archipelago of about thirty-five islands scattered over approximately 78,000 square km of the Arabian Sea, 200–440 km off the south-western coast of India1,2. The literal meaning of Lakshadweep is “one hundred thousand islands” in Sanskrit. The total population of the archipelago is approximately 65,000; however, it lacks an aboriginal population1,3. At present, the majority of the population follows Islam and is ethnically similar to the Malayali people of Kerala state2. The timing of the first human settlement of the archipelago is not clear; nevertheless, the islands have been under the power of various regimes. Amini, Kalpeni, Andrott, Kavaratti and Agatti are the oldest inhabited islands. The islands have been known to sailors since ancient times2. The Jataka stories of Buddhism mention these islands and document the spread of Buddhism to the islands during the 6th century BC1,4. Local stories suggest that Islam arrived with Arabs in 661 AD. Later, the Cholas ruled the islands in the 11th century, the Portuguese in the 16th, the Ali Rajahs in the 17th and Tipu Sultan in the 18th, and finally the islands came under the British Raj in the 19th century1,2,3,4,5,6. The Arab traveller Ibn Batuta mentioned these islands in many of his stories2.

Due to their geographical location, the present-day Lakshadweep populations may offer a unique insight into historic migration events. The exposure of these islands to people of diverse ethnicities from around the world might have created a mosaic-like pattern in their genomic ancestry; however, no genetic study has so far been done on the populations living on these islands. Therefore, for the first time, we carried out extensive DNA sampling on several major islands of Lakshadweep (Supplementary Fig. 1) and performed a high-resolution analysis of haploid markers. We looked specifically at the composition of the various haplogroups (hg) present on these islands and at their relationships within the archipelago and with the surrounding populations.

Methods

Sampling

Blood samples of approximately 5–10 ml were collected, with informed written consent, from 557 individuals belonging to eight major islands of Lakshadweep (Agatti, Andrott, Bitra, Chetlat, Kadmat, Kalpeni, Kiltan and Minicoy) (Supplementary Fig. 1). Using a detailed interview about family pedigree, we excluded samples from individuals who were related within at least three generations. This project was approved and permitted by the Institutional Human Ethical Committee, Mangalore University, Karnataka, India and the Institutional Ethical Committee of the CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India. All methods were performed in accordance with the relevant guidelines and regulations.

Genotyping

We used Sanger sequencing with the 23 F and 23 R markers described elsewhere7 to sequence the hypervariable segment I (HVS-I) of mtDNA. We sequenced both strands to minimize errors. Variations were recorded against the r-CRS8. To assign the samples to haplogroups, we subsequently genotyped diagnostic coding-region mutations and used this combined information for the assignment (Table 1, Supplementary Tables 1 and 2). The assignments were further verified against the data published to date in PhyloTree build 169. For the Y chromosome, we genotyped 166 samples for 44 biallelic markers published recently10 to assign the Y chromosomal haplogroups (Supplementary Table 2). All Y chromosome markers were typed by Sanger sequencing (ABI 3700, Applied Biosystems), and the sequences were compared with the reference to record the variations.
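Recording variations against a reference sequence amounts to a position-by-position comparison. The following is a minimal sketch of that step, not the procedure actually used in this study. It assumes the read is already aligned to the reference with no insertions or deletions, and it uses a made-up toy fragment rather than the real reference sequence. The function name `list_substitutions` and the `start` coordinate are illustrative assumptions.

```python
def list_substitutions(reference, sample, start=1):
    """Return substitutions as strings such as 'C103T' (reference base, position, sample base).

    Assumes both sequences are pre-aligned, equal in length, and free of indels.
    `start` is the coordinate of the first base in the reference numbering.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference.upper(), sample.upper())):
        if obs_base == "N":  # skip uncalled bases rather than report them as variants
            continue
        if ref_base != obs_base:
            variants.append(f"{ref_base}{start + offset}{obs_base}")
    return variants


# Toy fragment for illustration only; this is NOT the real reference sequence.
toy_reference = "ACCTGACTAG"
toy_sample = "ACTTGACNAA"
print(list_substitutions(toy_reference, toy_sample, start=101))
```

Sequencing both strands, as described above, would in practice mean calling variants on each strand and keeping only those on which the two reads agree.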

Table 1 The distribution of the various maternal haplogroups across the islands of Lakshadweep.

Statistical analysis

For statistical analysis, we first plotted PC1 vs PC2, estimated using POPSTR (kindly provided by H. Harpending), to infer the relationships within and between populations. We plotted the regional haplogroup affiliation of all the islands over their geographical co-ordinates. The classification of haplogroups as East Eurasian, West Eurasian or South Asian was done manually, considering a few major points, e.g. their exclusive presence in a region and their high frequency and diversity in a region, and based on the classifications used in previous publications11,12. Where the geographical origin could not be determined at the current resolution of the data, we treated the haplogroup as being of ‘unknown’ origin. We also took special care with haplogroup R1a-M17, which accounts for a substantial amount of Indian paternal ancestry13. As its origin is debated14,15, we used it as such. The haplotype diversity and AMOVA tests were performed using Arlequin 3.5 software16. Median-joining and reduced median networks were reconstructed with the NETWORK program (version 5)17. The reduced median and median-joining procedures were applied sequentially, and they showed a similar network structure. For comparison, we used various datasets published elsewhere11,18,19,20,21,22,23,24,25,26,27,28,29.

Results and Discussion

The Lakshadweep islands have been well known to maritime sailors since ancient times, presumably because they served as a staging point between different continents1,2. This extensive maritime history gave various ethnic groups access to the islands; it is therefore likely that their genetic landscape has been shaped in a unique way. However, this assumption has not been investigated at the molecular level in a way that simultaneously examines the ancestry composition of the different islands and their genetic relationships, which might also co-vary with distance from neighbouring regions. To test these assumptions, we analysed 557 samples for mitochondrial DNA (mtDNA) and 166 samples for Y chromosome markers and compared them with the neighbouring regions.

Our mtDNA analysis revealed that nearly 56% of the maternal lineages of the Lakshadweep islands belong to three major haplogroups (M2b2, M30 and R30) (Table 1 and Supplementary Fig. 2). Interestingly, within each of these major haplogroups, most individuals carried a single haplotype; we therefore refer to them as founder haplogroups (Supplementary Table 1). These founder haplogroups are present in virtually all the studied islands. Notably, haplogroup R30, which has previously been reported in coastal South Asia including Sri Lanka21,30, is the most frequent haplogroup in many of the Lakshadweep islands (Table 1 and Supplementary Table 1). To gain a spatial understanding of the various maternal lineages present in Lakshadweep, we grouped the haplogroups into regions and plotted them with their geographical co-ordinates (Fig. 1). All the islands (except Minicoy) showed a major component of maternal ancestry associated with South Asia. The southernmost islands, Kalpeni and Minicoy, show substantial West Eurasian-specific maternal ancestry, which is largely similar to that of the Maldives and Sri Lanka. Additionally, Minicoy carried East Eurasian-specific haplogroups, which are otherwise virtually absent in this region.

Figure 1

The spatial distribution of regional maternal components in Lakshadweep islanders in comparison with other neighbouring populations. South Asian haplogroups: M2–6, N5, M33–65, R5–8 and R31–32. East Eurasian haplogroups: A–G, M7–12, M66, R22 and N9. West Eurasian haplogroups: H–K, U1, U4, U7 and U9. Unresolved haplogroups: M*, R* and N*, including other lineages, for example M31 and M32.
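The regional grouping listed in this caption can be written as a simple lookup. The sketch below is illustrative only: it encodes just the ranges printed in the caption, so any lineage the caption does not list falls into the unresolved class, which may differ from the manual assignment made in the study. The names `NUMBERED_RANGES` and `classify_mtdna` are hypothetical.

```python
import re

# Numbered sub-clades, copied from the ranges printed in the caption.
# Each entry is (lowest number, highest number, region).
NUMBERED_RANGES = {
    "M": [(2, 6, "South Asian"), (33, 65, "South Asian"),
          (7, 12, "East Eurasian"), (66, 66, "East Eurasian")],
    "N": [(5, 5, "South Asian"), (9, 9, "East Eurasian")],
    "R": [(5, 8, "South Asian"), (31, 32, "South Asian"),
          (22, 22, "East Eurasian")],
    "U": [(1, 1, "West Eurasian"), (4, 4, "West Eurasian"),
          (7, 7, "West Eurasian"), (9, 9, "West Eurasian")],
}


def classify_mtdna(haplogroup):
    """Map an mtDNA haplogroup label such as 'M2b2' to a caption region."""
    match = re.match(r"([A-Z])(\d+)?", haplogroup.strip().upper())
    if match is None:
        return "Unresolved"
    letter, number = match.group(1), match.group(2)
    if letter in NUMBERED_RANGES:
        if number is not None:
            for low, high, region in NUMBERED_RANGES[letter]:
                if low <= int(number) <= high:
                    return region
        return "Unresolved"  # e.g. M*, or a numbered clade the caption omits
    if "A" <= letter <= "G":
        return "East Eurasian"
    if "H" <= letter <= "K":
        return "West Eurasian"
    return "Unresolved"


for hg in ["M2b2", "M66", "U9", "M31"]:
    print(hg, classify_mtdna(hg))
```

Counting the returned regions per island would give the regional proportions that are plotted over the geographical co-ordinates.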

The principal component analysis (PCA) of maternal ancestry, including the major South Asian regions, showed a distinct cline of Lakshadweep populations, which is likely due to drift created by the strong founder effect seen in their maternal ancestry (Fig. 2). The cluster closest to many of these islands comprises the populations of Kerala, with whom they share their most common haplogroups, i.e. M2 and R30 (Supplementary Table 2).

Figure 2

The scatter pattern of Lakshadweep island groups in the PC1 vs. PC2 analysis for mtDNA among other regional South Asian populations.
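A PC1 vs. PC2 plot of this kind can be derived from a table of haplogroup frequencies per population. The sketch below is a generic PCA by power iteration on the covariance matrix, using only the standard library. It is not POPSTR, whose internals are not described here, and the frequencies are invented for illustration. The function name `pca_scores` is hypothetical.

```python
import math


def pca_scores(rows, n_components=2, iterations=500):
    """Project rows (populations x haplogroup frequencies) onto the leading PCs."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the haplogroup-frequency columns.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    axes = []
    for _component in range(n_components):
        # Non-uniform start vector: a constant vector is annihilated by the
        # covariance of frequencies that sum to 1 in every row.
        vec = [float(j + 1) for j in range(p)]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
        for _step in range(iterations):
            nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:  # no variance left to explain
                break
            vec = [x / norm for x in nxt]
        eigenvalue = sum(vec[a] * cov[a][b] * vec[b]
                         for a in range(p) for b in range(p))
        # Deflate so the next pass converges to the following component.
        for a in range(p):
            for b in range(p):
                cov[a][b] -= eigenvalue * vec[a] * vec[b]
        axes.append(vec)
    return [[sum(c[j] * axis[j] for j in range(p)) for axis in axes]
            for c in centered]


# Invented frequencies of three haplogroups in four populations.
toy_freqs = [
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
]
for label, score in zip("ABCD", pca_scores(toy_freqs)):
    print(label, [round(s, 3) for s in score])
```

The sign of each axis is arbitrary, so only the relative positions of the populations are meaningful. Populations with similar haplogroup frequencies fall close together on PC1.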

We measured the haplotype diversity of the island populations relative to the mainland South Asian populations (Supplementary Fig. 2a). Our data indicate a significant (two-tailed p value < 0.001) reduction in the haplotype diversity of Lakshadweep populations. The most likely explanation for this reduction is that only a few individuals were introduced to these islands, creating a ‘bottleneck followed by founder effect’-like scenario.
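Haplotype diversity is commonly computed with Nei's unbiased estimator, H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n samples. The sketch below applies that formula to invented data to show how a dominant founder haplotype lowers the value. It does not reproduce the Arlequin run or its significance test, and the function name `haplotype_diversity` is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_squared_freqs = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Invented samples: one dominated by a single founder haplotype, one fully diverse.
founder_like = ["h1"] * 8 + ["h2", "h3"]
diverse = [f"h{i}" for i in range(10)]

print(round(haplotype_diversity(founder_like), 3))
print(round(haplotype_diversity(diverse), 3))
```

The estimator is 0 when every individual carries the same haplotype and 1 when every sampled haplotype is unique.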

To better understand the founder lineages present on the different islands of Lakshadweep, we reconstructed a median-joining network of the maternal haplogroups identified there (Fig. 3). The network analysis clearly showed a high prevalence of specific haplotypes on the islands, likely stemming from a single founder within each haplogroup background. Apart from the major common founder maternal lineages, such as hg R30 followed by hg M2 (the top two maternal founder lineages of several islands), some of the islands carry specific founder lineages. For example, Minicoy carries founder haplogroups M7 and M66 (East Eurasian-specific) as well as hg U9 (West Eurasian-specific). Similarly, Kalpeni harbours hg U4 (West Eurasian-specific) (Fig. 3 and Supplementary Fig. 1). The sharing of many maternal lineages among the islands of Lakshadweep suggests a high level of maternal gene flow among the islands. At the inter-population level, they are closest to the Kerala populations, with whom they share a common language (Supplementary Table 2). We performed AMOVA to compare the genetic relatedness of Lakshadweep with the mainland South Asian populations. The analysis suggested a close genetic affinity between the Lakshadweep populations and the South Indian Kerala populations (Table 2).

Figure 3

The median-joining network of 557 mtDNAs belonging to Lakshadweep islands. Each sample represented on the diagram has been sequenced for the HVS-I region and genotyped for the coding region mutations that are indicated in Supplementary Table 1. Circle sizes are proportional to the number of mtDNAs with that haplotype.

Table 2 The population differentiation analysis by AMOVA comparing Lakshadweep populations with the other South Asian mainland and island populations.

To assess the paternal ancestry, we further analysed Y chromosome markers in 166 individuals from Lakshadweep (Supplementary Tables 2 and 3). Since the numbers of samples genotyped for Y chromosomal markers were uneven across islands, and five out of nine islands had fewer than 20 genotyped individuals, we pooled all the Y chromosomal data and treated them as a single Lakshadweep dataset (Supplementary Tables 2 and 3). The paternal affinity shows a pattern distinct from that of the maternal ancestry. In the PCA, Lakshadweep inclines towards the Maldives population (Fig. 4), which is likely due to hg J2-M172. In the region-wise comparison, the paternal ancestry components of Lakshadweep are more similar to those of the Maldives, North India and Pakistan than to those of Kerala (Fig. 5 and Table 2). This suggests gene flow to Lakshadweep from all of these regions. However, as with the maternal founders, we also see a reduction in haplotype diversity due to a strong founder effect in the paternal lineages (Supplementary Fig. 3b). Three haplogroups, J2a-M172, R1a1-M17 and R2a-M124, account for approximately 85% of the paternal ancestry of Lakshadweep. The most frequent haplogroup was hg R2a-M124, followed by hg R1a1-M17 (Supplementary Table 2). In comparison with mainland India, the Maldives and Sri Lanka, Lakshadweep shows reduced paternal diversity and a stronger founder effect (Supplementary Figs 1 and 3b). However, in the absence of Y-STR data, we could not determine whether all the individuals belonging to one Y chromosomal haplogroup are derived from a single founder haplotype or from multiple founder haplotypes.

Figure 4

The genetic affinity of Lakshadweep island populations with other regional South Asian populations in the PC1 vs. PC2 analysis obtained from haplogroup frequencies for the Y chromosome.

Figure 5

The spatial distribution of regional paternal components in Lakshadweep islanders in comparison with other neighbouring populations. South Asian haplogroups: C5, F, H, L and R2. Southeast Asian haplogroups: C2, C3, D and M–O. West Eurasian haplogroups: G, I and J. Unresolved haplogroups: K, P, Q and R1a.

In conclusion, the Lakshadweep populations share a large proportion of their ancestry across the islands (except Minicoy). Although these islands served as a maritime crossroads, we observed a strong founder effect with reduced diversity for both maternal and paternal ancestries. The maternal ancestry of the islands is largely derived from Southern India, whereas the paternal ancestry shows higher affinity with the populations of the Maldives or the northern or western parts of South Asia. The most diverse island is Minicoy, which might have served as a popular destination for maritime sailors and thus received gene flow from various directions.