Population genetic portrait of Pakistani Lahore-Christians based on 32 STR loci

The phylogenetic relationships and population structure of 500 individuals from the Christian community of Lahore, Pakistan, were examined using 15 autosomal short tandem repeats (STRs) typed with the AmpFℓSTR Identifiler Plus PCR Amplification Kit, together with our previously published Yfiler kit data (17 Y-STRs) for the same samples. A total of 147 alleles were observed across the 15 loci, and allele 11 at the TPOX locus was the most frequent (frequency 0.464). The data revealed that the Christian population has unique genetic characteristics with respect to a few unusual alleles and their frequencies relative to other Pakistani populations. Significant deviations from Hardy–Weinberg equilibrium were found at two loci (D13S317, D18S51) after Bonferroni correction (p ≤ 0.003). The combined power of discrimination, combined power of exclusion and cumulative probability of matching were 0.999999999999999978430815060354, 0.999995039393942 and 2.15692 × 10⁻¹⁷, respectively. On the basis of genetic distances, PCA, phylogenetic and structure analyses, Lahore-Christians appeared genetically more closely associated with South Asian, particularly Indian, populations such as Tamil, Karnataka, Kerala and Andhra Pradesh than with the rest of the global populations.
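The combined forensic parameters above are built from per-locus values. The Python sketch below shows one common way to obtain them, assuming that the matching probability (PM) of a locus is the sum of squared observed genotype frequencies, that the power of discrimination is PD = 1 − PM, and that the power of exclusion uses the Brenner and Morris form PE = h²(1 − 2hH²), with observed heterozygosity h and homozygosity H = 1 − h. The genotypes are hypothetical and the function names are illustrative, not those of the software used in this study. The sketch also shows why the reported values are mutually consistent: the combined PD equals 1 − CPM, which needs decimal arithmetic because 1 − 2.15692 × 10⁻¹⁷ rounds to exactly 1.0 in double precision. The Bonferroni threshold is assumed to be 0.05 divided by the 15 loci tested.

```python
from collections import Counter
from decimal import Decimal
from math import prod

Genotype = tuple[int, int]  # one diploid STR genotype, e.g. (8, 11)

def allele_frequencies(genotypes: list[Genotype]) -> dict[int, float]:
    """Allele counts divided by 2N (two alleles per individual)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in sorted(counts.items())}

def locus_parameters(genotypes: list[Genotype]) -> dict[str, float]:
    """Per-locus forensic parameters from observed genotype frequencies."""
    n = len(genotypes)
    # (8, 11) and (11, 8) are the same genotype, so sort before counting.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pm = sum((c / n) ** 2 for c in genotype_counts.values())  # matching probability
    h = sum(a != b for a, b in genotypes) / n                 # observed heterozygosity
    hom = 1 - h                                               # observed homozygosity
    pe = h ** 2 * (1 - 2 * h * hom ** 2)                      # power of exclusion (assumed formula)
    return {"PM": pm, "PD": 1 - pm, "Ho": h, "PE": pe}

def combine(loci: list[dict[str, float]]) -> dict[str, float]:
    """Combine independent loci: CPM is a product, CPD and CPE are complements."""
    cpm = prod(p["PM"] for p in loci)
    return {
        "CPM": cpm,
        "CPD": 1 - cpm,
        "CPE": 1 - prod(1 - p["PE"] for p in loci),
    }

# Bonferroni-corrected significance level for 15 HWE tests (assumed alpha = 0.05).
ALPHA_BONFERRONI = 0.05 / 15  # about 0.0033

# Hypothetical genotypes for one locus (the study's 500 profiles are not reproduced here).
example_locus = [(8, 11), (11, 11), (8, 8), (11, 12), (8, 11)]
print(allele_frequencies(example_locus))  # {8: 0.4, 11: 0.5, 12: 0.1}
print(locus_parameters(example_locus))

# With the reported CPM, the complement must be taken in decimal arithmetic.
reported_cpm = Decimal("2.15692E-17")
print(Decimal(1) - reported_cpm)  # 0.9999999999999999784308
print(1 - 2.15692e-17 == 1.0)     # True: the difference is lost in a float
```

Under these assumptions, applying `combine` to all 15 loci would reproduce the three combined values reported above, given the same genotype data.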

Pakistan is a multiethnic country of 217 million people, the majority of whom are Muslim according to the Pakistan Bureau of Statistics 1. Religious minorities residing in Pakistan include Hindus, Christians, Ahmedis, Baha'is, Sikhs, Parsis and Buddhists, among others. The Christian population comprises 2.5 million people (1.6%), making it the second-largest religious minority in Pakistan 2. Lahore, the capital of the Pakistani province of Punjab, is the second-most populous city in Pakistan (11.13 million), with a Muslim majority (97%) and a Christian minority (2%). Christianity was initially introduced by the Reverend Thomas Valpy French, who was appointed the first Bishop of Lahore in 1877 3. Christians are considered to be descendants of a caste population of India 4, and although they are thought to be a relatively closed population because of religious constraints, they maintain amiable relations with the majority population.
Short tandem repeats (STRs), also known as microsatellites, are repetitive DNA sequences with repeat motifs of two to six base pairs; forensic markers are mostly tetranucleotide repeats. STRs are almost universally employed as forensic identity markers because they are highly polymorphic and highly heterozygous, have short sequence lengths and are distributed throughout the human genome 5,6. Although their mutation rates are significantly higher than those of single nucleotide polymorphisms (SNPs) 7, they are nonetheless useful markers for population genetic studies, especially of more recent genetic history 8.
Many earlier studies have examined these 15 autosomal STRs in various Pakistani populations, but not in Christians. This population therefore needs to be studied as a whole to understand the genetic context of Christians and their connection to the wider Eurasian continent. Hence, Lahore-Christian samples were evaluated using the fifteen autosomal STRs of the Identifiler Plus Kit (Applied Biosystems) together with our previously published data set for the same male samples (YA004381) 9 on 17 Y-STRs (DYS438, DYS393, DYS385a/b, DYS389I/II, DYS458, DYS437, DYS391, DYS392, DYS635 (Y-GATA-C4), Y-GATA-H4, DYS19, DYS390, DYS439, DYS456, DYS448). To establish the phylogenetic affiliations of this population, the data sets were compared with the reference populations listed in Table 1.

Materials and methods
Sample collection. About 3 mL of blood was collected in EDTA vacutainer tubes from 500 unrelated Christian individuals residing in Lahore, the capital city of Punjab province, Pakistan. Whatman blood stain cards were prepared for each sample with a unique sample ID, which was used for all subsequent processing.
DNA extraction and quantitation. Genomic DNA was isolated by the organic extraction procedure described by Signer et al. (1988) 10.
Genotyping. To perform genotyping on an ABI 3730xl Genetic Analyzer (Applied Biosystems), 1 µL of amplified product was added to 0.35 µL of GeneScan 500 LIZ size standard (Applied Biosystems) and 13 µL of highly deionized (Hi-Di) formamide. Data were analyzed using GeneMapper ID v3.2, and alleles were designated by comparison with the kit allelic ladder.
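Allele designation against an allelic ladder amounts to assigning each sized peak to the nearest ladder allele within a tolerance window. The Python sketch below illustrates only that idea; the ladder sizes and the ±0.5 bp window are hypothetical values chosen for the example, not Identifiler Plus bin definitions, and GeneMapper ID applies its own bin sets and additional quality checks. Peaks that fall outside every bin are reported as off-ladder ("OL").

```python
# Hypothetical ladder for a single locus: allele designation -> expected size in bp.
LADDER = {8: 262.0, 9: 266.0, 10: 270.0, 11: 274.0, 12: 278.0}
HALF_WINDOW_BP = 0.5  # assumed bin half-width in bp

def call_allele(size_bp: float, ladder: dict[int, float] = LADDER,
                half_window: float = HALF_WINDOW_BP) -> str:
    """Return the ladder allele closest to the peak size, or 'OL' if none is in range."""
    allele, expected = min(ladder.items(), key=lambda item: abs(item[1] - size_bp))
    if abs(expected - size_bp) <= half_window:
        return str(allele)
    return "OL"  # off-ladder: no ladder allele within the window

def call_genotype(peak_sizes: list[float]) -> tuple[str, ...]:
    """Call the peaks at one locus; a single peak is treated as a homozygote."""
    calls = [call_allele(size) for size in peak_sizes]
    if len(calls) == 1:
        calls = calls * 2
    return tuple(calls)

print(call_allele(274.2))              # 11
print(call_allele(272.0))              # OL
print(call_genotype([262.3]))          # ('8', '8')
print(call_genotype([266.1, 277.8]))   # ('9', '12')
```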
Quality control. The efficiency of PCR amplification was monitored using Identifiler Plus Control DNA 9947A as a positive control and a reaction containing all reagents except template DNA as a negative control. STR analysis followed the nomenclature recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) 13. The dataset was also evaluated through the STRidER database 13 (QC report reference number STR000284).
Population datasets used for comparison. The STR data of Lahore-Christians were compared with available data for indigenous and global populations (Supplementary Table 1) derived from the published sources summarized in Table 1.

Results and discussion
Allelic frequencies and forensic parameters. A total of 147 alleles were observed across all loci, and allele 11 at the TPOX locus had the highest frequency (0.46). Allelic frequencies at each locus are shown in Supplementary Table 2.
Phylogenetic analysis. The neighbour-joining phylogenetic tree (Fig. 1A) 43 which also supports our phylogenetic analysis and the migration history of Lahore-Christians. Moreover, it suggests that although South Indians and Pakistani Christians are geographically separated, they share similar genetic origins.
Structure analysis. Although 15 autosomal STR markers have limited power to detect population structure, they are effective to some extent in differentiating Lahore-Christians from 9 other reference populations. Structure analysis was conducted in STRUCTURE 2.3.4 using the admixture model with correlated allele frequencies and without prior population information (USEPOPINFO = 0). The number of inferred clusters (K) was varied from 1 to 6, with three replicate runs for each K, each using a burn-in of 50,000 iterations followed by 100,000 MCMC iterations. The results are shown as a bar plot in Fig. 2A, in which each population is partitioned into K colored segments.
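The STRUCTURE run design (K = 1 to 6, three replicates per K, 50,000 burn-in and 100,000 MCMC iterations, admixture model, correlated allele frequencies, USEPOPINFO = 0) can be written out as a grid of independent runs. The Python sketch below emits `#define` lines in the style of STRUCTURE's mainparams/extraparams files. Apart from USEPOPINFO, the parameter names (MAXPOPS, BURNIN, NUMREPS, NOADMIX, FREQSCORR) are taken from the STRUCTURE 2.3 documentation as we understand it and should be checked against the manual; the random seeds are hypothetical. Note that STRUCTURE's NUMREPS is the number of MCMC iterations after burn-in, not the number of replicate runs.

```python
from itertools import product

K_VALUES = range(1, 7)        # K = 1..6
REPLICATES = 3                # independent runs per K
BURNIN = 50_000
MCMC_AFTER_BURNIN = 100_000

def parameter_lines(k: int) -> list[str]:
    """#define lines for one K (names assumed from the STRUCTURE 2.3 manual)."""
    settings = {
        "MAXPOPS": k,                    # number of clusters for this run
        "BURNIN": BURNIN,
        "NUMREPS": MCMC_AFTER_BURNIN,    # MCMC iterations kept after burn-in
        "NOADMIX": 0,                    # 0 selects the admixture model
        "FREQSCORR": 1,                  # 1 selects correlated allele frequencies
        "USEPOPINFO": 0,                 # no prior population information
    }
    return [f"#define {name} {value}" for name, value in settings.items()]

def run_grid(seed_base: int = 1000) -> list[tuple[int, int, int]]:
    """One (K, replicate, seed) tuple per independent run; seeds are hypothetical."""
    return [(k, rep, seed_base + 10 * k + rep)
            for k, rep in product(K_VALUES, range(1, REPLICATES + 1))]

for k, rep, seed in run_grid()[:2]:
    print(k, rep, seed, parameter_lines(k)[:2])
```

Keeping replicates as separate runs with distinct seeds is what allows the likelihoods to be compared across runs when K is chosen.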
K = 3 was identified as the most suitable configuration from the posterior probability output, as inferred using Structure Harvester 44 (Fig. 2B). At K = 3, African American and Mongol individuals were almost entirely assigned to the red and green components, respectively. Lahore-Christians and Tamil shared the blue component as their major component in a similar pattern, which gradually diminished in the remaining populations. Punjabi and Sindhi showed a mixture of the green and blue components, whereas Europeans (Caucasian, Romani) shared a mixture of the red and green components to a similar extent. While we might have expected Christians to show some differentiation from the other Pakistani populations, it is not surprising that Lahore-Christians and Tamil are not differentiated by the Identifiler STRs in the Structure analysis.
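Structure Harvester is commonly used to summarize replicate runs and to apply the Evanno ΔK method; the text does not state whether K = 3 was chosen from ΔK or from the mean ln P(D) curve. As a hedged illustration, the Python sketch below computes ΔK from the mean ln P(D) at each K as |L(K+1) − 2L(K) + L(K−1)| divided by the standard deviation of ln P(D) across replicates at K. The ln P(D) values are hypothetical and were chosen so that ΔK peaks at K = 3; ΔK is undefined for the smallest and largest K tested.

```python
from statistics import mean, stdev

def evanno_delta_k(ln_p: dict[int, list[float]]) -> dict[int, float]:
    """ln_p maps K -> ln P(D) values from replicate runs; returns {K: delta K}."""
    ks = sorted(ln_p)
    m = {k: mean(ln_p[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        if k - 1 not in m or k + 1 not in m:
            continue  # both neighbouring K values are needed
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)|
        sd = stdev(ln_p[k])                                 # spread across replicates
        if sd > 0:
            delta[k] = second_diff / sd
    return delta

# Hypothetical ln P(D) values: three replicates for each K = 1..6.
example = {
    1: [-10000, -10001, -10002],
    2: [-9500, -9502, -9504],
    3: [-9200, -9201, -9202],
    4: [-9180, -9190, -9200],
    5: [-9175, -9195, -9215],
    6: [-9170, -9200, -9230],
}
delta = evanno_delta_k(example)
print(delta)
print("best K:", max(delta, key=delta.get))  # best K: 3
```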
Principal components analysis. PCA plots were constructed from autosomal STR allele frequencies (Supplementary Table 1) for Lahore-Christians together with 4 indigenous reference populations (Fig. 3A) and with global populations (Fig. 3B). In Fig. 3A, Lahore-Christians appeared as a divergent population in the lower right quadrant, whereas the other Pakistani populations clustered in the upper right quadrant; the global populations were scattered across the plot. In Fig. 3B, Lahore-Christians were compared with Indian and 7 other world populations; the studied population lies relatively closer to the South Indian populations (Karnataka, Kerala and Tamil) than to the others. Components 1 and 2 together explain 55% of the total variance in Fig. 3A and 46% in Fig. 3B, and reflect the genetic distances between populations.
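A PCA on population allele frequencies treats each population as a row and each allele (at every locus) as a column. The pure-Python sketch below centres the columns, forms the covariance matrix and extracts the two leading components by power iteration with deflation; the proportion of variance explained by a component is its eigenvalue divided by the total variance, so the proportions over all components sum to 1. Covariance-based PCA and the four-population frequency table are assumptions for illustration; the text does not say which software or scaling was used.

```python
from math import sqrt

def top_eigenpairs(matrix, k, iterations=1000):
    """Leading k eigenpairs of a symmetric positive semi-definite matrix
    via power iteration with deflation."""
    p = len(matrix)
    c = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        # Non-uniform start: a constant vector lies in the covariance null space
        # when the allele frequencies at each locus sum to 1.
        v = [float(j + 1) for j in range(p)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(c[a][b] * v[b] for b in range(p)) for a in range(p))
        pairs.append((lam, v))
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs

def pca(rows, n_components=2):
    """Return (scores, explained_variance_ratio) for population rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    total = sum(cov[a][a] for a in range(p))
    pairs = top_eigenpairs(cov, n_components)
    scores = [[sum(row[j] * v[j] for j in range(p)) for _, v in pairs] for row in x]
    return scores, [lam / total for lam, _ in pairs]

# Hypothetical frequencies: two biallelic loci, four populations.
freqs = {
    "Pop A": [0.60, 0.40, 0.30, 0.70],
    "Pop B": [0.55, 0.45, 0.35, 0.65],
    "Pop C": [0.20, 0.80, 0.50, 0.50],
    "Pop D": [0.40, 0.60, 0.90, 0.10],
}
scores, explained = pca(list(freqs.values()))
for name, (pc1, pc2) in zip(freqs, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("explained:", [round(e, 3) for e in explained])
```

Because each locus here has only two alleles, two components capture all of the variance in this toy table; real 15-locus data would leave variance for later components.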
A multidimensional scaling (MDS) plot was generated from Y-STR haplotype data to explore the paternal lineages of Lahore-Christians (Fig. 3C). In this plot, Lahore-Christians clustered tightly with South Indian (Tamil, Karnataka, Andhra Pradesh), Central Indian (Madhya Pradesh) and Bangladeshi populations. Balmiki fell in the same quadrant but at a greater distance, whereas the other populations, including Pakistani, European American, Mongol, African American and Ugandan, were scattered across different quadrants. Our results are also consistent with previous reports of the most frequent haplotypes in South Asia 45. The phylogenetic analysis showed that Lahore-Christians are most closely related to Indians, particularly Tamils, and might share common ancestors with them. Moreover, there are clear genetic differences between Christians and the rest of the populations. This also supports historical records indicating that, following geographical migration from India to Pakistan, this population eventually came to be recognized as Christian 46.
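Classical (Torgerson) multidimensional scaling turns a matrix of pairwise population distances into low-dimensional coordinates by double-centring the squared distances and taking the leading eigenvectors. The text does not say which Y-STR distance (for example Rst or Fst) or which software produced Fig. 3C, so the Python sketch below is generic: it accepts any symmetric distance matrix and assumes the distances are Euclidean-embeddable. As a check, distances between four illustrative points on a 3 × 4 rectangle are recovered exactly in two dimensions.

```python
from math import sqrt

def top_eigenpairs(matrix, k, iterations=1000):
    """Leading k eigenpairs of a symmetric positive semi-definite matrix
    via power iteration with deflation."""
    p = len(matrix)
    c = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [float(j + 1) for j in range(p)]  # non-uniform start vector
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(c[a][b] * v[b] for b in range(p)) for a in range(p))
        pairs.append((lam, v))
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs

def classical_mds(dist, dims=2):
    """Coordinates whose Euclidean distances approximate the input distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]  # equal to column means (symmetric)
    grand_mean = sum(row_means) / n
    # B = -1/2 * J D^2 J, written out element-wise (double centring)
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d, (lam, v) in enumerate(top_eigenpairs(b, dims)):
        scale = sqrt(max(lam, 0.0))  # negative eigenvalues carry no Euclidean length
        for i in range(n):
            coords[i][d] = v[i] * scale
    return coords

# Illustrative distances between the corners (0,0), (3,0), (0,4), (3,4).
dist = [[0, 3, 4, 5],
        [3, 0, 5, 4],
        [4, 5, 0, 3],
        [5, 4, 3, 0]]
coords = classical_mds(dist)
for point in coords:
    print([round(c, 3) for c in point])
```

For population distance matrices that are not exactly Euclidean, some eigenvalues of B are negative, and the plotted configuration is only an approximation of the input distances.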
Compared with other Pakistani populations, Lahore-Christians are primarily nomadic and have a conservative lifestyle, distinct religious practices, a highly endogamous culture and traditional occupations. Tracing their migration route and their relatedness to world populations could provide a glimpse of their ancestral trajectory. The genetic affinity of Lahore-Christians to South Asian Indian populations, together with their shared nomadic practices, indicates historical genetic relatedness, and subsequent migratory events led to the separation of the two populations. Relatively greater genetic distances to other Pakistani populations were observed in the present study. Previous reports have also suggested genetic similarity with Tamils, reflecting a common origin, but only a minimal signature of gene exchange with other nomadic groups 47.
Some inconsistencies appeared when the autosomal and Y-STR results were compared side by side, because data for both marker sets were available for only a few population groups. Nevertheless, all of these analyses clearly indicate that the Lahore-Christian population has a close genetic affiliation with South Indian populations. Moreover, significant differences were observed between Lahore-Christians and the other Pakistani populations, except Punjabis, who appear somewhat closer. This might also indicate that Lahore-Christians and Pakistani Punjabis gradually diverged from native South Indians following their geographical migration, which is consistent with historical records 48.

Conclusion
We have provided evidence that the Christian population in Lahore, Pakistan, forms a sub-population among Asian groups and has some unique genetic characteristics 14,39,49. Inter-population differentiation, PCA, phylogenetic and structure analyses revealed that Lahore-Christians have relatively close genetic relationships with South Asians, particularly Indians, and most closely resemble the South Indian Tamil, Kerala, Andhra Pradesh and Karnataka populations. In this population, the 15 autosomal STRs and 17 Y-STRs provide ample information for lineage characterization. These data will be useful for studies of genealogy, the historical migration of Pakistani populations and database development. The autosomal and Y-STR data are consistent with the human migration history of Indo-Pakistani populations. However, a detailed mitochondrial DNA study is needed to assign mitochondrial haplogroups and identify maternal lineages.