Introduction

Thailand, one of the biggest Buddhist countries, is in the geographical heart of Southeast Asia. The country is divided into six regions: the North, the Northeast, the East, the West, the Central and the South. Stretching down the Malay Peninsula, the South of Thailand borders the Gulf of Thailand to the East, the Andaman Sea to the West, and Malaysia to the South. On the basis of the most recent census of 65.9 million Thais,1 the vast majority practices Buddhism (93.6% or 61.7 million) and the largest minority practices Islam (4.9% or 3.2 million).2 The percentages flip in Southern Thailand.3 Over 80% of the 3.2 million Muslims in Thailand live in the five deep Southern Provinces of Narathiwat, Pattani, Yala, Satun and Songkhla (Figure 1).

Figure 1

Southeast Asian map showing the locations of studied populations (represented by ) and compared populations (represented by ) in Thailand, Malaysia and the Philippines. The sample size for each population varies from 46 to 304 unrelated individuals. The references for the compared populations are given in Table 1.

On the basis of historical and archeological evidence, prehistoric people inhabited the territory of deep Southern Thailand, and the area has been nurtured by contact with other cultures since the first century. Hindus from India migrated to occupy this area during the Indianization period (third to fifth centuries). The Buddhist Mon people then inhabited this area between the fifth and eighth centuries. Subsequently, the Malays populated the area, founded the Hindu–Buddhist kingdom of Srivijaya, and dominated the peninsula from the eighth to eleventh centuries. The Hindu–Buddhist Malay kingdom of the Patani Sultanate was established as early as the ninth century. The Khmer then governed the area from the eleventh to thirteenth centuries, followed by the southward movements of the Thai people from Siam.4 Islamization of the Patani Sultanate occurred in the fourteenth century. This territory became a vassal state under the Ayutthaya Kingdom in the early sixteenth century.4 Throughout history, Siam (Ayutthaya Kingdom, Thonburi Kingdom and then Thailand) attacked the Patani Sultanate many times over several hundred years.5 Finally, it was annexed as part of Siam under the Anglo-Siamese Treaty of 1909.6

Muslims in Thailand can be categorized into two groups: the Sunni-Malay Muslims (henceforth known as Thai-Malay Muslims (MUS)) and Shiite-Muslims (non-Malay Muslims). The non-Malay Muslims descended from Arabs and Persians, and they are fewer in number than the MUS, who are of Malay ancestry. The non-Malay Muslims speak Thai as their main language and assimilate well into Thai society. In contrast, the MUS speak ‘Yawee’, a local dialect of Bahasa Melayu (a member of the Austronesian language family). They follow the Malay traditions, use Malay names and listen to Malay music. All of these hinder population admixture between Thai Buddhists (BUD) and MUS.3,7 The modern history of Southern Thailand is marred by various separatist movements carried out by locals who resist the assimilation of their culture with the Bangkok/Buddhist-centric government. In other words, the ethnocentric policies of Thailand alienate the MUS.8 Therefore, MUS and BUD have remained culturally, linguistically, religiously and politically distant from each other. However, the genetic structure of these two populations had not been compared before this study.

Microsatellites, or short tandem repeats (STRs), have proven to be powerful tools for investigating phylogenetic relationships and for forensic identification, as these loci are distributed throughout the human genome, have a high mutation rate, and are highly polymorphic.9,10 Thousands of microsatellites have been mapped, but <20 (the CODIS and the ENFSI loci) are currently in use for individualization by forensic and population geneticists worldwide.11 In Thailand, studies on forensic STRs and their utilization for population genetics are quite limited.12, 13, 14, 15, 16

The present study used 15 forensic autosomal microsatellites to investigate the genetic variation and structure of two Southern Thai populations from the three deep Southern Provinces of Thailand. The genetic relationships between the two studied populations and other populations residing in Thailand, Malaysia and the Philippines from earlier studies13,17,18 were also evaluated.

Materials and methods

Samples

Buccal swabs were collected, under informed consent, from 150 unrelated individuals from the deep South of Thailand, in the area of Narathiwat, Pattani and Yala Provinces (Thai-Malay Muslim, n=104, of which 17 were from Narathiwat, 44 from Pattani and 43 from Yala; Thai Buddhist, n=46, of which 6 were from Narathiwat, 15 from Pattani and 25 from Yala). Ethical approval was obtained from the Ethics Board of Prince of Songkla University. Because the statistical analyses in this study relied on a Bayesian clustering algorithm, the compared populations used to determine population affinity were selected from previous studies for which raw genotypic data for the 15 STRs are available. Nine neighboring populations from Malaysia (ML1, ML2, CH-ML and IN-ML), Thailand (TH-North, TH-Isan, TH-Central and TH-South) and the Philippines (FI) were utilized (Figure 1 and Table 1).

Table 1 General information and genetic diversities using 15 short tandem repeats of populations included in the analyses

DNA extraction and STR typing

Genomic DNA was extracted from the buccal samples using the Promega IQ kit (Promega Corp., Madison, WI, USA). Extracted DNA was quantified using the Quantifiler Human DNA Quantification Kit (Applied Biosystems, Foster City, CA, USA). Multiplex PCR amplification from 0.5 ng of template DNA was performed using a commercial AmpFℓSTR Identifiler PCR Amplification kit (Applied Biosystems), which amplifies 15 autosomal STR loci and one gender determination locus: D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, vWA, TPOX, D18S51, D5S818, FGA, D19S433, D2S1338 and amelogenin. Amplicons were genotyped by multicapillary electrophoresis in an ABI Prism 3130xl Genetic Analyzer (Applied Biosystems) following the manufacturer’s instructions. The results were further analyzed using GeneMapper v.3.2.1 (Applied Biosystems).

Statistical analyses

ARLEQUIN 3.5 software19 was used to determine STR allele frequencies, Hardy–Weinberg P-values, observed heterozygosity (Ho), expected heterozygosity, number of alleles and gene diversity. Significance levels for Hardy–Weinberg P-values were adjusted using sequential Bonferroni correction (α=0.05/15≈0.0033).20 Several parameters of forensic and population genetic importance, including power of discrimination, match probability, polymorphic information content, power of exclusion and typical paternity index, were obtained with the PowerStats v.12 Excel spreadsheet (www.promega.com/geneticidtools/powerstats).
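As an illustration of the per-locus quantities named above, the following minimal Python sketch computes observed and expected heterozygosity, match probability, power of discrimination and polymorphic information content from raw genotypes, together with a sequential (Holm-type) Bonferroni test. It is not the ARLEQUIN or PowerStats implementation; the function names and the toy genotypes are hypothetical, and the formulas are the commonly used textbook definitions, assumed here rather than taken from those programs. Power of exclusion and typical paternity index are omitted.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Summarise one STR locus from a list of (allele, allele) pairs."""
    n = len(genotypes)                 # number of individuals
    total_alleles = 2 * n              # diploid: two alleles per individual
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = {a: c / total_alleles for a, c in allele_counts.items()}

    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity (gene diversity) with the small-sample correction.
    sum_p2 = sum(p * p for p in freqs.values())
    he = total_alleles / (total_alleles - 1) * (1.0 - sum_p2)

    # Match probability: sum of squared observed genotype frequencies.
    genotype_counts = Counter(tuple(sorted(pair)) for pair in genotypes)
    pm = sum((c / n) ** 2 for c in genotype_counts.values())
    pd = 1.0 - pm                      # power of discrimination

    # Polymorphic information content.
    ps = list(freqs.values())
    cross = sum(2 * ps[i] ** 2 * ps[j] ** 2
                for i in range(len(ps)) for j in range(i + 1, len(ps)))
    pic = 1.0 - sum_p2 - cross

    return {"ho": ho, "he": he, "pm": pm, "pd": pd, "pic": pic}


def holm_bonferroni(p_values, alpha=0.05):
    """Sequential Bonferroni: test the smallest P-value at alpha/m, the next at alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                      # stop at the first non-significant test
    return reject


# Hypothetical toy data: four individuals typed at one locus.
toy_genotypes = [(9, 10), (9, 10), (9, 9), (10, 10)]
toy_stats = locus_statistics(toy_genotypes)
```

With 15 loci, the first (most stringent) threshold of the sequential procedure is 0.05/15, which matches the α quoted in the text.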

To elucidate population affinity, both distance-based and model-based clustering approaches were employed. Pairwise genetic distances on the basis of allele frequency variance (Fst), as well as their statistical significance using 1000 permutations, were calculated using ARLEQUIN 3.5. The Fst distance matrix was then plotted in three dimensions by means of multidimensional scaling (3D MDS) using the STATISTICA 10.0 software (StatSoft, Tulsa, OK, USA). Principal coordinate analysis (PCoA) was constructed by a covariance matrix with data standardization using GENALEX 6.3 software.21
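To make the notion of an allele-frequency-based Fst concrete, the sketch below computes a pairwise Fst as (Ht − Hs)/Ht from per-locus allele frequencies. This is a simplified heterozygosity-based estimator chosen for illustration, not ARLEQUIN's estimator; it ignores sample-size corrections and permutation testing, and the function names and example frequencies are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(pop_a, pop_b):
    """Fst = (Ht - Hs) / Ht, summed over loci.

    pop_a, pop_b: lists with one {allele: frequency} dict per locus,
    given in the same locus order for both populations.
    """
    hs_sum = 0.0   # mean within-population heterozygosity, summed over loci
    ht_sum = 0.0   # heterozygosity of the pooled population, summed over loci
    for fa, fb in zip(pop_a, pop_b):
        hs_sum += 0.5 * (heterozygosity(fa) + heterozygosity(fb))
        alleles = set(fa) | set(fb)
        pooled = {a: 0.5 * (fa.get(a, 0.0) + fb.get(a, 0.0)) for a in alleles}
        ht_sum += heterozygosity(pooled)
    if ht_sum == 0.0:
        return 0.0  # both populations fixed for the same alleles
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical single-locus example with two alleles.
pop_x = [{"A": 0.6, "B": 0.4}]
pop_y = [{"A": 0.4, "B": 0.6}]
fst_xy = pairwise_fst(pop_x, pop_y)
```

A matrix of such pairwise values is the kind of input that the MDS step takes; small values, as in this toy case, correspond to populations that plot close together.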

Population structure was evaluated according to religion (Islam, Buddhism and Christianity) through hierarchical analysis of molecular variance,22 as implemented in ARLEQUIN. Further investigation of population structure was performed using the Bayesian clustering method implemented in STRUCTURE 2.3 (refs 23, 24, 25) under the assumptions of admixture, correlated allele frequencies and the LOCPRIOR model.25 The number of clusters (K) ranged from 1 to 8, with five replicates per K, each using an MCMC burn-in of 50 000 iterations followed by a run length of 100 000 iterations. STRUCTURE Harvester26 was used to calculate the greatest estimated mean posterior probability (Ln Prob) averaged over five replicates,23 as well as the second-order rate of change of the log probability between successive K values (delta K),27 to identify the optimal K in the data.
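The two criteria for choosing K can be sketched as follows. This is a minimal illustration under stated assumptions, not the STRUCTURE Harvester code: delta K is taken here as the absolute second difference of the mean Ln Prob across successive K, divided by the standard deviation of Ln Prob among replicates at that K. The function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return (mean LnP per K, delta K per K).

    lnp_by_k: {K: [LnP(K) of each replicate run]} for consecutive K values.
    Delta K is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnp_by_k)
    mean_lnp = {k: mean(lnp_by_k[k]) for k in ks}
    delta_k = {}
    for k in ks[1:-1]:
        # Absolute second-order rate of change of the mean log probability.
        second_diff = abs(mean_lnp[k + 1] - 2 * mean_lnp[k] + mean_lnp[k - 1])
        sd = stdev(lnp_by_k[k])        # spread among replicate runs at this K
        delta_k[k] = second_diff / sd if sd > 0 else float("inf")
    return mean_lnp, delta_k


# Hypothetical LnP values, two replicates per K (not data from this study).
toy_runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-45.0, -47.0],
    4: [-44.0, -46.0],
}
toy_mean_lnp, toy_delta_k = evanno_delta_k(toy_runs)
best_by_lnp = max(toy_mean_lnp, key=toy_mean_lnp.get)
best_by_delta_k = max(toy_delta_k, key=toy_delta_k.get)
```

In this toy example the highest mean Ln Prob and the delta K peak fall at different K values, which shows why the two criteria are reported side by side.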

The admixture coefficient was estimated using ADMIX 2.0.28 As the estimated coefficients are less affected by the stochasticity of the mutation process, and as the short time scale of the human admixture process is influenced by genetic drift rather than accumulated mutation,29 neither mutation nor molecular divergence between alleles was taken into account.28,30
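For intuition about what an admixture coefficient measures, the sketch below estimates the contribution of one parental population to an admixed population by least squares on allele frequencies. This is a deliberately simple frequency-based estimator and is not the estimator implemented in ADMIX 2.0; the function name and the example frequencies are hypothetical.

```python
def admixture_proportion(hybrid, parent1, parent2):
    """Least-squares estimate of parent1's contribution m to the hybrid.

    Model: p_hybrid = m * p_parent1 + (1 - m) * p_parent2 for every allele.
    Each argument is a list with one {allele: frequency} dict per locus.
    """
    numerator = 0.0
    denominator = 0.0
    for fh, f1, f2 in zip(hybrid, parent1, parent2):
        for allele in set(fh) | set(f1) | set(f2):
            d_parents = f1.get(allele, 0.0) - f2.get(allele, 0.0)
            d_hybrid = fh.get(allele, 0.0) - f2.get(allele, 0.0)
            numerator += d_hybrid * d_parents
            denominator += d_parents * d_parents
    if denominator == 0.0:
        raise ValueError("parental populations have identical allele frequencies")
    return numerator / denominator


# Hypothetical one-locus example constructed with m = 0.3.
parent_a = [{"A": 0.8, "B": 0.2}]
parent_b = [{"A": 0.2, "B": 0.8}]
admixed = [{"A": 0.38, "B": 0.62}]
m_hat = admixture_proportion(admixed, parent_a, parent_b)
```

The estimate is only as informative as the parental populations are distinct: when their allele frequencies are nearly identical, the denominator approaches zero and the estimate becomes unstable.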

Results

Genetic variability and genetic structure

The average Ho and gene diversity in MUS were 0.787 and 0.786, respectively, whereas 0.801 (Ho) and 0.784 (gene diversity) were observed in BUD. The Ho and gene diversity values in both populations were in the same range as the other compared populations (Table 1). Deviation from Hardy–Weinberg equilibrium was detected in only one locus (D18S51) in MUS after Bonferroni correction. In MUS, the combined match probability was 1 in 4.4917 × 10^17 and the combined power of exclusion was 0.9999986, whereas BUD had a combined match probability of 1 in 2.4579 × 10^16 and a combined power of exclusion of 0.9999996. The combined power of discrimination for these loci was robust (>0.999999999999999 in both populations). In MUS, the most polymorphic locus was FGA, which had both the highest Ho and the highest number of alleles, whereas in BUD the highest Ho was found at D21S11 and the highest number of alleles at D8S1179. As expected, the most polymorphic loci in both populations were highly discriminating, as shown by the relatively high power of discrimination (PD) values. This demonstrates that this set of loci will be useful for forensic identification. Two private alleles were found in MUS (32.3 at D21S11 and 15 at D7S820) and only one in BUD (31 at FGA). The allele frequency distributions, population genetic parameters and forensic parameters are listed in Supplementary Tables 1 and 2 for MUS and BUD, respectively.
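The combined figures above follow from per-locus values under the usual assumption that the loci are independent. The sketch below shows that arithmetic; the function names and the per-locus inputs are hypothetical and are not the values from the Supplementary Tables.

```python
from math import prod


def combined_match_probability(pm_per_locus):
    """Product of per-locus match probabilities (loci assumed independent)."""
    return prod(pm_per_locus)


def combined_power_of_discrimination(pm_per_locus):
    """Probability that two random individuals differ at one locus or more."""
    return 1.0 - prod(pm_per_locus)


def combined_power_of_exclusion(pe_per_locus):
    """Probability that at least one locus excludes a falsely alleged parent."""
    return 1.0 - prod(1.0 - pe for pe in pe_per_locus)


# Hypothetical per-locus values for a two-locus panel.
toy_pm = [0.1, 0.2]
toy_pe = [0.5, 0.5]

cmp_value = combined_match_probability(toy_pm)
one_in = 1.0 / cmp_value               # expressed as "1 in N"
cpd_value = combined_power_of_discrimination(toy_pm)
cpe_value = combined_power_of_exclusion(toy_pe)
```

With 15 loci the combined match probability becomes so small that the combined power of discrimination is numerically indistinguishable from 1 in double-precision arithmetic, which is one reason such values are reported as a lower bound.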

The proportion of genetic variation within and between the different religious groups was assessed by analysis of molecular variance (Table 2). Interpopulation differences were higher when the populations constituted a single group (0.59%) than when populations were grouped according to religion (0.41%). The nonsignificant Fct statistic (P>0.01) observed when populations were grouped according to religion indicates that genetic structure was only minimally influenced by religion.

Table 2 Analysis of molecular variance

Population affinity

We performed pairwise Fst comparisons, with a test for genetic differences, between MUS and BUD, and also compared them with previously studied populations. Forty out of fifty-five pairwise differences were statistically significant (P<0.01) (Table 3), but MUS and BUD exhibited a nonsignificant genetic difference. MUS also showed a nonsignificant genetic difference from ML1, ML2 and TH-South. BUD did not differ from the following five populations: ML1, TH-North, TH-Isan, TH-Central and TH-South. Among the 11 compared populations, IN-ML, FI and CH-ML statistically differed from the other populations, indicating high genetic divergence.

Table 3 Pairwise genetic distances among population data set based on Fst

We also conducted distance-based clustering (MDS and PCoA) to determine the genetic relationships among these populations. The 3D MDS plot shows genetic differentiation of IN-ML, FI and CH-ML in dimensions 1, 2 and 3, respectively (Figure 2). This agrees with the pairwise Fst analysis. The remaining eight populations clustered together around the center of the plot, which indicates close genetic affinity. Within the cluster itself, two population subgroups were weakly separated in dimension 3. MUS was grouped with two Malaysian populations (ML1 and ML2), whereas BUD was clustered with other Thai populations nationwide (TH-North, TH-Isan, TH-Central and TH-South). The PCoA result strongly agrees with the MDS result. IN-ML and FI were segregated from all populations on axis 1 and axis 2, which explained 49.34 and 23.54% of the distance matrix (Figure 3). CH-ML was dispersed further in an upward position on axis 3, which explained 16.11% of the variation (Figure 3). Again, a cluster of populations was seen in the middle of the plot. MUS lay near ML1 and ML2 but BUD formed a continuum running to axis 2 with TH-North, TH-Isan, TH-Central and TH-South. Interestingly, TH-South occupied the intermediate position among groups of populations from Thailand and Malaysia. This probably represents gene flow within TH-South.

Figure 2

Multidimensional scaling plot depicting population affinity based on Fst distances. The stress value is 0.021. Genetic differentiation of IN-ML, FI and CH-ML is observed in dimensions 1, 2 and 3, respectively. Other populations clustered at the center of the plot. Within the cluster itself, two population subgroups are separated in dimension 3: MUS is grouped with Malaysians (ML1 and ML2), whereas BUD is grouped with other Thais (TH-North, TH-Isan, TH-Central and TH-South). Population abbreviations are defined in Table 1.

Figure 3

PCoA was performed using a covariance matrix with data standardization. Axes 1, 2 and 3 explained 49.34%, 23.54% and 16.11% of the distance matrix, respectively. IN-ML and FI are separated from all the other populations. Similarly, CH-ML is above the other populations on axis 3 (lower panel). MUS lies near ML1 and ML2 but BUD forms a continuum running to axis 2 with TH-North, TH-Isan, TH-Central and TH-South. Population abbreviations are defined in Table 1.

Model-based clustering was performed using STRUCTURE 2.3 to clarify population structures and their relationships. Because of the low level of genetic divergence among populations (Fst of 0–0.0189), we executed the analysis using sampling information, which provides more information.31 Cluster (K) values were varied from 1 to 8. According to the highest LnP (K), the most suitable K (ref. 23) was observed at K=3. The ad hoc statistic delta K (ref. 27) showed a strong modal peak at K=2. Table 4 shows the proportion of membership of each pre-defined population in each of the two or three clusters with mean LnP (K) and δK. At K=2, most populations belonged to cluster 1, with IN-ML as the only exception (membership proportion of 0.648 in cluster 2). When K was increased to 3, a newly delineated cluster belonging to FI emerged (0.725). Clusters 1 and 2 still displayed similar proportions of membership. The consistency between the MDS, PCoA and STRUCTURE results indicates genetic distinction of IN-ML, FI and, to a lesser extent, CH-ML. The STRUCTURE result also reveals no clear population differentiation in cluster 1, which comprised the two studied populations (MUS and BUD) and other populations from all parts of Thailand (TH-North, TH-Isan, TH-Central and TH-South) and Malaysia (ML1, ML2 and CH-ML). In other words, our analysis suggests genetic homogeneity among these populations.

Table 4 Membership proportions of each population in each of clusters (K) and other parameters estimated by STRUCTURE

Admixture estimation

MDS and PCoA positioned BUD, MUS and TH-South midway between ML1 and ML2, TH-North, TH-Isan, TH-Central and FI. This suggests an admixed origin for BUD and MUS; thus we further evaluated population admixture. In the admixture analysis, two groups of populations (the admixed ones and the parental ones) were defined. BUD, MUS and TH-South were considered as admixed populations that received genetic contributions from their putative parental sources: the Malay group (ML1 and ML2) and the Thai group (TH-North, TH-Isan and TH-Central). We discounted FI as a parental population based on historical evidence, even though FI was located peripheral to the admixed groups. Figure 4 depicts the estimators of admixture coefficient. The parental Malay group contributed a lower proportion to BUD, MUS and TH-South (2.83, 33.49 and 36.11%) when compared with the parental Thai group (97.17, 66.51 and 63.89%).

Figure 4

Genetic contributions from the parental Malay (black) and Thai (white) groups to the admixed populations. The parental Malay group contributed a lower proportion to BUD, MUS and TH-South (2.83, 33.49 and 36.11%) when compared with the parental Thai group (97.17, 66.51 and 63.89%).

Discussion

Several studies on molecular genetic loci have been reported for current Thai populations. However, those studies focused on linguistic and geographic factors in determining a population's genetic structure.15,16,32 In this study, we focused on the religious background of populations to determine the genetic relationship of MUS and BUD in the deep South of Thailand using 15 autosomal STR loci. On the basis of historical and anthropological facts, the people in the Patani Sultanate originally practiced Hindu–Buddhism. Islamization took place around the 12th to 15th centuries, and the Patani Sultanate continued to prosper until it fell under Siamese control in the early 16th century.33 Local legend tells of a learned Muslim who cured Raja Indra (the ruler of the Patani Sultanate at that time) on the condition that he would convert to Islam. From then on, two distinct religious populations have inhabited this area: the Muslims and the Buddhists.

According to the historical evidence, present day MUS could be the descendants of either local Hindu–Buddhist converts or Muslims from India who arrived during the spread of Islamic faith into the Malay Peninsula. Our prominent genetic finding is the genetic homogeneity between MUS and BUD (Figures 2 and 3). This supports the theory that MUS descends from the Hindu–Buddhist converts, who underwent cultural transformation during the reign of Raja Indra. Cultural transformation has shaped several Muslim communities, for example, Indian Muslims and Vietnamese Cham.34, 35, 36

Despite the genetic resemblance between MUS and BUD, some degree of differentiation was observed. MUS seemed to be genetically closer to the Malaysian Malays (ML1 and ML2), whereas BUD were more closely related to the remaining Thai populations nationwide. This suggests demographic movement through politics and trade within this micro-geographic region after the Islamization event.37 Therefore, we performed admixture analysis to infer recent genetic history between MUS and BUD. The parental Malay population contributed approximately 33% and 3% to the gene pool of MUS and BUD, respectively (Figure 4). This finding is consistent with the MDS and PCoA results (Figures 2 and 3). Admixture analysis indicates weak genetic divergence between these two studied populations, possibly due to a recent cultural barrier. Admixture is common in human evolution and exists in various populations worldwide.15,38,39 The local history indicates that the two populations differ in culture, language and religion. Previous studies have shown that these factors can be barriers that hinder genetic admixing.7 As such, the weak divergence observed could be because MUS amalgamate with the Malays, whereas BUD tend to mix with other Thai populations. Interestingly, the TH-South population from Shotivaranon et al.13 also displayed an admixed origin. About one-third of their genetic content was contributed by the Malay populations. TH-South samples were collected randomly from 211 Southern Thai individuals regardless of their ethnic affiliation. Therefore, genetic heterogeneity among its individuals, comprising both BUD and MUS, could be seen in the current admixture analysis.

The studied populations displayed high heterozygosity for the 15 STR loci used. Only one locus deviated from HWE (D18S51 in MUS), which is consistent with non-consanguineous marriages in both populations. The forensic parameters of this STR panel support previous findings that the loci are appropriate for forensic testing of Thai samples.12,13,15 Moreover, the loci reveal genetic relations of the two deep South populations. With the recent expansion of the CODIS loci from 13 to 20 loci40 and the availability of new 24-loci multiplexes (for example, GlobalFiler PCR Amplification kit), it is possible that we will be able to gain more insight into the genetic history of BUD and MUS. These newer loci have been selected due to their high heterozygosity.41 Thus, it might be possible to better differentiate the two populations. Additional information from more advanced techniques such as high-throughput SNP genotyping platforms, as well as the uniparentally inherited loci (Y-STRs and mitochondrial DNA), should help elucidate the genetic relationships of the Thai populations.15,42,43

Our analyses using forensic loci on the genetic structure and affinity among BUD and MUS residing in the deep South of Thailand showed that (1) BUD and MUS exhibited genetic homogeneity due to common biological ancestry; (2) despite genetic similarity, some genetic divergence was noticeable. MUS are genetically closer to the Malaysian Malays, whereas BUD are more closely related to the other Buddhist populations in Thailand. Cultural barriers might have obstructed recent admixture between the studied populations. In contrast, admixture in short time-scale between MUS and Malaysian Malays could have been promoted by religious affiliation; and (3) cultural transformation and religious difference apparently contributed to the genetic structure of BUD and MUS.