Introduction

The linguistic landscape of India comprises four major language families and a number of language isolates, and is largely associated with non-overlapping geographical divisions. The majority of the population speaks Indo-European languages, which cover a large geographical area including northern and western India.1, 2 Dravidian languages are spoken primarily in southern India, with some exceptions, for example, Brahui in Pakistan, Kurukh–Malto in eastern India and Gondi–Manda languages in central India. Austroasiatic language speakers are scattered in pockets, mainly towards the eastern and central regions, whereas Tibeto-Burman language speakers are found along the Himalayan fringe and in the Northeast of the subcontinent.1, 2 The genetic ancestry of Austroasiatic and Tibeto-Burman speakers in the subcontinent correlates strongly with language. Among Indo-European and Dravidian speakers, however, geography is the stronger correlate of genetic ancestry.3, 4

The geographical distribution of languages in India is largely non-overlapping.5 However, eastern central India presents an amalgam of three major language groups.6, 7 This region is home to more than 30% of South Asia’s tribal populations, some of whom still practise hunting and gathering subsistence strategies.8, 9 Geographically, the rivers Narmada and Tapti act as abundant water sources, and the mountain ranges Vindhya and Satpura act as a significant geographical barrier to casual interaction with adjoining regions. The complexity of the geography and the fact that this area has historically lain outside of the main thoroughfares of commercial and cultural exchange between the subcontinent’s major Hochkulturen have rendered this region a fringe area, where from Neolithic and Chalcolithic times the local material cultures, as preserved in the archaeological record, were comparatively less developed.10, 11, 12 The combination of the more rudimentary technological level of development of the resident populations and geographical remoteness may have facilitated the gradual admixture and assimilation of incursive populations willing to adapt to the subsistence strategies practised locally, while impeding the bearers of technologically more advanced cultural assemblages.10 Previous studies have reported language shift among many populations living in this region (viz. Bathudi, Bhuiyan, Kanwar, Pando and Mushar) and referred to them as Transitional.13, 14 Nevertheless, these studies have also indicated that the process of language shift did not always greatly alter the genetic make-up of the local populations. 
The picture that is beginning to emerge from various genetic studies is that resident populations practising hunting and foraging and speaking now lost tongues adopted cultural influences and adapted linguistically as well as technologically to more advanced populations from other parts of South and Southeast Asia.4, 7, 15 In a similar vein, the linguistic assimilation of the local Munda populations in adjacent areas to the Austroasiatic language family provides a stunning case of language shift correlated with an exclusively male-biased linguistic intrusion from an area with a technologically more advanced level of cultural development.16

Unlike the caste populations in India, very few tribes have total population sizes in the millions. Among all the central Indian tribes, the Gond are the most populous and have a well-defined clan structure.8 With a population size of over 12 million, they are mainly found in eastern central India (Supplementary Figure 1). How long the Gond have been present in the subcontinent is not known with certainty. However, they are mentioned in the epic Ramayana, and four of their kingdoms are dated to between 1300 and 1600 AD.17 By the medieval period, these kingdoms had assimilated so much religious and cultural influence from neighbouring Hindu culture that the Gond societies had become a more hierarchically structured tribal population.

Different groups of the extended Gond population speak Gondi, Konda, Kui, Kuvi, Pengo and Manda, all languages of the South Central branch of the Dravidian language family.8, 17 Linguistically, the Gondi–Manda subgroup shares its most recent common ancestry with Telugu, which is mainly spoken in the state of Andhra Pradesh, including Telangana.18, 19 Ethnographical studies by Robert von Heine-Geldern20, 21 had suggested that a subset of Dravidian populations, represented by the various Gond linguistic communities, together with the local ancestral component of the Munda populations, collectively represent an older layer of peopling of the Indian subcontinent. This theory was adopted by Grigson,22 who proposed that the Gonds were an originally ‘pre-Dravidian’ or, as he called it, ‘proto-Australoid’ population that had been modified by a considerable Dravidian element. Christoph von Fürer-Haimendorf23, 24, 25, 26 conducted studies on the Gond and their closely allied Dravidian linguistic communities, which led him to view these peoples as remnants of an earlier primordial population that had been linguistically assimilated.

Work on the mitochondrial DNA of Gond population groups has shown that the majority of their maternal gene pool falls into South Asian-specific clades, with a few haplotypes belonging to the haplogroups M2, R7, M40 and M45 shared with the Austroasiatic populations.4, 7, 27, 28 Y-chromosomal and autosomal studies have suggested a deeply rooted South Asian ancestry.29, 30 However, previous genetic studies either relied on low-resolution data or studied only a single Gond group.4, 7, 27, 28, 29, 30 Therefore, in the present study, we extracted genome-wide SNP data (>95 K SNPs) for 18 Gond samples from two recent publications.31, 32 These 18 samples represent four distinct geographical locations, spanning three Indian states: three samples each of Gond1 and Gond3 from Madhya Pradesh, and five samples each of Gond2 from Chhattisgarh and Gond4 from Uttar Pradesh (Supplementary Figure 1). We first explored the relation of the different Gond groups with respect to a wider Eurasian context and then evaluated their genomic diversity at the intra- and inter-population levels. Furthermore, we evaluated the population interaction and gene flow across the overlapping linguistic phyla in this region.

Materials and methods

The present analyses were performed on merged data published in various genome-wide studies16, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 (Supplementary Table 1). This study was approved by the ethical committee of the CSIR-CCMB, India. The tribal and caste populations were grouped according to their linguistic affiliation. We renamed the four Gond groups as Gond1, Gond2, Gond3 and Gond4 (Supplementary Figure 1). Gond1 and Gond3 are from Madhya Pradesh, Gond2 is from Chhattisgarh and Gond4 is from Uttar Pradesh (Supplementary Figure 1). We grouped populations that were known to have undergone language shift in recent times as Transitional.14 PLINK 1.9 was used for data curation, management and IBS (identity-by-state) calculations.41 To remove background linkage disequilibrium (LD), which can affect both principal component analysis (PCA)42 and ADMIXTURE,43 we thinned the data set by removing one SNP of any pair in strong LD (r² > 0.4) within a window of 200 SNPs, sliding the window by 25 SNPs at a time.
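The window-based thinning described above can be illustrated with a minimal sketch. This is not the PLINK implementation: the function name `ld_prune`, the genotype matrix layout (individuals in rows, SNPs in columns, coded 0/1/2) and the rule of always dropping the later SNP of a correlated pair are assumptions made for illustration only.

```python
import numpy as np

def ld_prune(genotypes, window=200, step=25, r2_threshold=0.4):
    """Greedy sliding-window LD pruning (illustrative sketch, not PLINK).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns the indices of the SNPs that are kept.
    Assumption: of each pair with r^2 above the threshold, the later SNP
    is dropped. Monomorphic SNPs yield NaN correlations and are never
    flagged here, so they should be filtered out beforehand.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    n_snps = genotypes.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        stop = min(start + window, n_snps)
        # Only SNPs that survived earlier windows are compared again.
        idx = [i for i in range(start, stop) if keep[i]]
        if len(idx) < 2:
            continue
        # Squared Pearson correlation between genotype columns.
        r2 = np.corrcoef(genotypes[:, idx], rowvar=False) ** 2
        for a in range(len(idx)):
            if not keep[idx[a]]:
                continue
            for b in range(a + 1, len(idx)):
                if keep[idx[b]] and r2[a, b] > r2_threshold:
                    keep[idx[b]] = False
    return np.flatnonzero(keep)

# Toy input: SNP 1 duplicates SNP 0; SNP 2 is only weakly correlated.
toy = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1],
                [0, 0, 1], [1, 1, 2], [2, 2, 0]])
kept = ld_prune(toy, window=3, step=1)
```

In the toy matrix the duplicated SNP is removed, while the third SNP (r² = 0.25 with the first) is retained because it falls below the 0.4 threshold.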

We performed PC analysis using the smartpca programme of the EIGENSOFT package with the default settings44 to capture the genetic variability described by the first five components. In the final settings, we ran ADMIXTURE43 on the LD-pruned data set 25 times, each with a randomly generated seed, from K=2 to K=12. We used the methods described earlier16, 31 and found K=9 to be the best K. Given the results of the PC and ADMIXTURE analyses, we removed one outlier sample each from the Gond1 and Gond2 groups for further population-based analysis. The outgroup f3 statistics44 were calculated as f3(Gond1/Gond2/Gond3/Gond4, X; Yoruba), where X was another Indian population. To plot the allele sharing of Gonds and other Indian populations with Dravidian vs Austroasiatic groups, we took the Paniya population as a representative of Dravidian and the Bonda (South Munda) population as a representative of Indian Austroasiatic. The selection of these populations was based on their outlier position and highest ASI (Ancestral South Indian) ancestry. To investigate the gene flow among different Indian populations, D statistics were used, taking African Yorubas as the outgroup.44 We constructed the maximum likelihood (ML) tree of Indian populations considering four migration events using treemix45 with the -k4 flag; 25 replicates were made to assure convergence. For haplotype-based analysis (fineSTRUCTURE),46 samples were phased with Beagle 3.3.2.47 A co-ancestry matrix was constructed using ChromoPainter,46 and fineSTRUCTURE was used to perform an MCMC iteration using a burn-in runtime of 10 m and 100 000 MCMC samples. The numbers of samples and SNPs used for each of the analyses are listed in Supplementary Table 1.
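As an illustration of what the outgroup f3 statistic measures, the following minimal sketch computes the plain estimator, the mean over SNPs of (p_out − p_A)(p_out − p_B), from per-population allele frequencies. The function name `outgroup_f3` and the toy frequencies are hypothetical. The sketch omits the finite-sample correction and the block jackknife standard errors that dedicated software provides, so it should not be read as the procedure used in this study.

```python
import numpy as np

def outgroup_f3(p_a, p_b, p_out):
    """Plain outgroup f3 estimator from allele frequencies.

    p_a, p_b, p_out: per-SNP reference-allele frequencies (same length)
    for populations A and B and for the outgroup (Yoruba in the text).
    Larger values indicate more genetic drift shared by A and B since
    their split from the outgroup.
    Assumption: no finite-sample bias correction, no standard errors.
    """
    p_a, p_b, p_out = (np.asarray(p, dtype=float) for p in (p_a, p_b, p_out))
    return float(np.mean((p_out - p_a) * (p_out - p_b)))

# Toy frequencies (not data from this study).
f3_value = outgroup_f3([0.5, 1.0], [0.5, 0.0], [0.0, 0.0])
```

With the toy values the two SNPs contribute 0.25 and 0, giving a mean of 0.125.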

Results and discussion

To explore the variability and visualise the genetic structure of the four Gond groups, we first performed PCA. The majority of the Gond samples were shifted away from the Indo-European-Dravidian cline31, 37 (Figure 1a). The Gond groups showed a gradient of affinity with Austroasiatic (Munda) populations, with Gond2 being closest and Gond1 furthest from them, whereas Gond3 and Gond4 clustered together in between (Figure 1a).

Figure 1

(a) Principal component analysis (PCA) of the combined autosomal SNP data of individuals from Eurasia. The inset shows the plot of mean eigenvalues of the Gond and their genetic neighbours. (b) Plot of population-wise unsupervised ADMIXTURE analysis (K = 9) of world populations with a zoom-in of various Indian populations including Gonds. The colour codes of the Indian populations are assigned according to their linguistic affiliation, as shown in Figure 1a. Bhil_GUJ, Bhils from Gujarat; Bhil_MP, Bhils from Madhya Pradesh; Munda_N, North Munda group; Munda_S, South Munda group.

ADMIXTURE43 was applied to the pruned data set to visualise the multicomponent genetic structure of the Gond (Figure 1b). At the best-supported31 clustering (K=9), ADMIXTURE showed k4 (dark green) as the predominant component among the Gond groups (Figure 1b). The k7 (light green) component was negligible in the Gond compared with any Indo-European or non-Gondi Dravidian population. Consistent with the PCA, one sample each from Gond1 and Gond2 deviated from the general pattern of genetic structure among the Gond. It is striking that the Gond groups were more similar in their ancestry component composition to the North Munda group than to their linguistic neighbours (Figure 1b). The ADMIXTURE analysis therefore suggests an overwhelming North Munda (Austroasiatic) affinity of all four Gond groups, as well as gene flow between Gond1 and Dravidian and/or Indo-European speakers. Compared with many central Indian indigenous populations (Bhil, Kol), the proportion of the Austroasiatic-specific component is significantly higher (two-tailed P-value <0.0001) in each of the Gond groups. Such observations point to a significant difference between the admixture process of the Munda and Gond groups and the admixture of Kol, Bhil,48 Nihali and others with the Munda groups.

To gain a better understanding of the genome sharing of the Gonds with other extant Indian populations, we applied the haplotype-based analysis fineSTRUCTURE.46 This programme generates a co-ancestry matrix using ChromoPainter46 and compares the haplotypes of every individual with those of every other individual. On the basis of haplotype sharing among the studied groups, we compared the mean chunk counts donated by Eurasian populations to the various Gond groups (Figure 2a). Consistent with the PCA and ADMIXTURE analyses, the two outlier samples showed a different pattern and were therefore excluded from any population-based comparison. As expected from the PCA and ADMIXTURE analyses, all Gonds received the majority of their chunks from South Asian populations when compared with other Eurasians. Among the South Asians, the Munda, the Transitional group and the Gond themselves were the major chunk contributors (Figure 2a). It is interesting to note that the Gond populations received a significantly lower number of chunks from Dravidians (two-tailed P-value <0.0001) than from the Munda groups. This conclusion holds even in comparison with the Telugu speakers, who are closest to them linguistically. The excess allele sharing between Gond and Munda populations is also evident in the IBS analysis (Supplementary Figure 2) as well as in the outgroup f3 statistics (Figure 2b). We also estimated the D-values.44 When we filtered the top 10 D-values of gene flow for each of the Gond sets, we found similar results supporting extensive gene flow between Gond and Munda groups (Table 1).

Figure 2

(a) Plot of mean chunk counts donated by Eurasian populations to the Gonds; (b) plot of shared drift from the outgroup f3 analysis. The values were calculated as f3(X, Y; Yoruba), where X was another Indian population and Y a different Gond group. The colour codes of the populations follow their linguistic affiliation. Indian_TB, Indian Tibeto-Burman; SEA, Southeast Asian.

Table 1 The top 10 values of D statistics showing the gene flow between Gonds and other Indian populations

The striking genetic affinity of the Gond with Austroasiatic (Munda) populations is consistent across all our analyses (Figures 1 and 2 and Table 1). One reason for such closeness could be the process of language shift, which is common and has been reported among several populations of this region.13, 14 However, it is noteworthy that the populations reported to have undergone language shift are numerically smaller and do not cover a vast geographical area such as that of the Gond. Moreover, the total number of Gond is equal to the number of Austroasiatic speakers of India.49 To test the language-shift hypothesis, we modelled a scenario in which the Gond were originally an Austroasiatic population that recently changed its language to Dravidian. In this case, we would expect a distant outlier Austroasiatic population (Bonda) to donate largely similar numbers of chunks to the Gonds and to their present Austroasiatic (both North and South Munda) neighbours. However, this was not the case in our analysis: we observed significantly more Bonda chunks among the North and South Munda neighbours than in any Gond group (Table 2). This weakens the case for any recent language shift of the Gond from Austroasiatic and suggests a distinct genetic identity of the Gonds.

Table 2 The result of two-tailed P-values when counting the donated chunks from Bonda (South Munda) to Gonds vs their neighbouring North and South Munda populations

To compare the gene flow of Gond with Munda and Dravidian populations, two outlier populations, one from each group, Bonda (South Munda) and Paniya (Dravidian), were selected as distinct representatives of these language groups (see Materials and Methods section). The D statistics showed a significant level of gene flow between Gond and Munda groups when compared with Telugu speakers (Table 3 and Supplementary Table 2). However, the Gond showed largely similar levels of gene flow from both North and South Munda groups. Conversely, gene flow between North Munda and South Munda was significantly higher when compared with the Gond groups (Table 3).
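For readers unfamiliar with the D statistic, the sketch below computes it from allele frequencies using the normalised form given by Patterson et al.,44 D = Σ(w − x)(y − o) / Σ(w + x − 2wx)(y + o − 2yo). The function name `d_statistic`, the argument order and the toy frequencies are illustrative assumptions, and sign conventions differ between software packages. Significance in practice is assessed with a block jackknife over the genome, which is omitted here.

```python
import numpy as np

def d_statistic(p_w, p_x, p_y, p_o):
    """D(W, X; Y, O) from per-SNP allele frequencies (illustrative sketch).

    With this sign convention, D > 0 indicates excess allele sharing
    between W and Y relative to X and Y; D < 0 indicates the reverse.
    Assumption: no standard error or Z-score is computed.
    """
    w, x, y, o = (np.asarray(p, dtype=float) for p in (p_w, p_x, p_y, p_o))
    numerator = np.sum((w - x) * (y - o))
    denominator = np.sum((w + x - 2.0 * w * x) * (y + o - 2.0 * y * o))
    if denominator == 0:
        raise ValueError("no informative SNPs for this population quartet")
    return float(numerator / denominator)

# Toy frequencies (not data from this study): W and Y share the derived
# allele at the first SNP, while X and the outgroup O do not.
d_value = d_statistic([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0])
```

Swapping the first two populations flips the sign, which is why the ordering of W and X must be reported alongside any D-value.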

Table 3 The level of significance of gene flow between Gond, Telugu and Munda populations

We plotted the shared drift values (calculated via f3 statistics) of extant Indian populations with respect to the outlier Bonda (South Munda) vs Paniya (Dravidian) populations (Supplementary Figure 3). As both of these populations carry high amounts of ASI ancestry, we would expect the populations to fall along a linear trend. An excess of Paniya- or Bonda-related alleles in a particular population would place it towards the corresponding axis, away from the central line. We observed a deviation of the Gond groups from their linguistic neighbours in the direction of the Austroasiatic populations (Supplementary Figure 3). The deviation of the Tharu is also evident here, supporting our previous conclusion that up to one half of their genome would be East Asian specific.50 Interestingly, the f3 statistics plot also revealed a clear-cut distinction of the Gond from their neighbours, which include the Transitional and Nihali populations, in the proportions of Munda and Dravidian alleles shared (Supplementary Figure 3).
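One way to quantify the deviation from the central line described above is to fit a line through a set of reference populations and measure each target population's signed vertical distance from it. The following is a minimal sketch under that assumption; the text does not state that a regression line was fitted, and the function name `drift_residuals` and all numbers are hypothetical toy values, not results.

```python
import numpy as np

def drift_residuals(x_ref, y_ref, x_target, y_target):
    """Signed distance of target populations from the reference trend.

    x_*: shared drift (f3) with one representative (e.g. Paniya).
    y_*: shared drift (f3) with the other representative (e.g. Bonda).
    A least-squares line is fitted to the reference populations; a
    positive residual means a target shares more drift with the y-axis
    representative than the trend predicts.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    slope, intercept = np.polyfit(x_ref, y_ref, 1)
    predicted = slope * np.asarray(x_target, dtype=float) + intercept
    return np.asarray(y_target, dtype=float) - predicted

# Toy values: references lie exactly on y = x; the target sits above it.
residuals = drift_residuals([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [1.0], [1.5])
```

In this toy case the target lies 0.5 units above the fitted line, that is, shifted towards the y-axis representative.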

To visualise the affinity of the Gond with other Indian populations and infer potential migration events, we drew an ML tree using the method implemented in treemix.45 In the ML tree, all the Gond groups cluster with the western side of the Austroasiatic cluster (Supplementary Figure 4a), in consonance with the trend observed in the PCA plot (Figure 1a). With four migration events, substantial gene flow is revealed among the populations living in the central Indian region, including the Gonds (Supplementary Figure 4b), supporting the notion that the central Indian region served as a selective melting pot for various populations speaking different tongues. The effect of geography, language or ethnicity, which are major factors in other geographical regions, is minimised by the fact that eastern central India has acted as a marginal sink area. In this respect, eastern central India differs from regions such as Central Asia, where the genetic landscape was significantly shaped by the intrusion of Turkic nomads,51 with a contrasting example of the Caucasus region.35

In conclusion, our extensive analysis of genome-wide genetic diversity in various Gond groups has revealed that all the Gond groups share extensive portions of their genomes within the group as well as with the North and South Munda groups. The distinctive gene flow patterns observed suggest a population history of the Gond groups different from that of their neighbouring populations. Within the overall South Asian landscape, the eastern central Indian region, with its multiple language groups, is exceptional in that geography is not the major determinant correlating with genetic variation. Hence, our wide-ranging investigation of the Gond and their neighbours living in central India has shown population interaction and gene flow between various language groups that transgress the linguistic barrier, through the linguistic assimilation of resident populations to small but technologically more developed incursive groups.