Genetic polymorphism and evolutionary differentiation of Eastern Chinese Han: a comprehensive and comparative analysis on KIRs

Killer cell immunoglobulin-like receptor genes, namely KIRs, cluster together within the 160 kb genomic DNA region. In this study, we used PCR-SSP approach and successfully identified the genotype of 17 KIR genes in 123 independent healthy donors residing in the Jiangsu province, China. All individuals were positive at the 7 genes. The observed carrier gene frequencies (OFs) of remaining 10 KIRs ranged from 14.63% (KIR2DS3) to 95.93% (KIR3DL1). We found 27 distinct genotypes excluding KIR1D. The most frequent occurred in 63 individuals (51.22%). The linkage disequilibrium analysis signified 29 positive and 6 negative relations in 45 pairwise comparisons. To study population differentiation, we drew a Heatmap based on the data of KIRs from 59 populations and conducted Hierarchical Clustering by Euclidean distances. We next validated our results by estimating pairwise DA distances and illustrating a Neighbor-Joining tree, as well as a MDS plot covering 3 additional Chinese Han groups. The phylogenetic reconstruction and cluster analysis strongly indicated a genetically close relationship between Eastern and Jilin Hans. In conclusion, the present study provided a meritorious resource of KIR genotyping for population genetics, and could be helpful to uncover the genetic mechanism of KIRs in immune disease in the future.


Methods
Study samples. Blood samples were collected from a total of 123 randomly selected healthy donors of Han ancestry living in Nanjing, Jiangsu province of China, membership of Eastern China. All participants provided their written informed consent and completed a basic health screen. Also, each participant was interviewed to ensure that no individuals have common ancestry going back at least three generations and their three generations are native of eastern coastal area of China. The whole-blood samples anti-coagulated with ethylene diamine tetraacetic acid (EDTA) were frozen at − 80 °C until use. The study was conducted in accordance with the human and ethical research principles of Nanjing Medical University and approved by the ethics committee of Nanjing Medical University. DNA isolation. According to the manufacturer's instructions, we extracted genomic DNA from 300 μ l peripheral blood containing ethylene diamine tetraacetic acid (EDTA), using TIANamp Genomic DNA Kit (TIANGEN, Beijing, China). The quality and quantity of extracted DNA samples were determined by NanoDrop 2000 (Thermo Fisher Scientific, Waltham, USA). The optimal density values used to evaluate the concentration and purity of extracted DNA reflected by the A260/280 values (1.7 to 1.9). The concentration was adjusted to 20-40 ng/μ l.
KIR PCR-SSP genotyping. KIR genes were genotyped for the presence or absence of the 17 KIR genes, including KIR2DL1, 2DL2, 2DL3, 2DL4, 2DL5, 3DL1, 3DL2, 3DL3, 2DS1, 2DS2, 2DS3, 2DS4, 2DS5, 3DS1 (in the full-length form), 1D (in the deleted form) and two pseudogenes, 3DP1(putative protein product) and 2DP1(no protein expression) using PCR-SSP method with a commercially available KIR GENOTYPING SSP KIT (Invitrogen, California, USA). The PCR amplification was performed with the PCR system in a reaction mixture volume of 10 μ l consisting of 4 μ l pre-aliquoted PCR buffer, 0.06 μ l Taq polymerase, and 30 to 50 ng of template DNA. Temperature cycling conditions for PCR reactions were as follows: denaturation for 1 minute at 95 °C, followed by 30 cycles for 20 seconds at 94 °C, 20 seconds at 63 °C, 1.5 minutes at 72 °C, a elongation step for 10 minutes at 72 °C and finally hold in 4 °C. PCR products were visualized under ultraviolet light after electrophoresis in 1.5% agarose gel well mixed with 10% v/v ethidium bromide. Negative controls were performed for each gene while positive internal controls were performed for each lane. False reactions that yielded no internal control bands were repeated.
Scientific REPoRtS | 7:42486 | DOI: 10.1038/srep42486 Statistical analysis. The observed carrier frequencies (OFs) of the KIR genes were determined by the number of positive typing reactions. Based on the assumption of Hardy-Weinberg equilibrium, we calculated the estimated gene frequencies (GFs) using the formula, GF = 1− (1− OF) 1/2 . The GF is determined by the OF of the KIR gene in all individuals. Package "pheatmap" (https://cran.r-project.org/web/packages/pheatmap/index.html) based on statistical software R version 3.2.5 (https://www.r-project.org/) was used to draw a Heatmap containing Eastern Chinese Han and 58 other populations with complete KIR genotyping files of 16 31,32 distributed around the world. The Heatmap is constructed using Hierarchical Clustering algorithm based on Euclidean distance. The Hierarchical Clustering model generally refers to the assumption that irreducible correlation functions are described by the hierarchical relations: ξ n = Q n ξ n−1 2 , where ξ n is the nth order correlation function, and the Q n is constant 33,34 . The D statistic included in recognized "Genetics" package (https://cran.r-project.org/web/packages/genetics/index.html) was used to conduct linkage disequilibrium (LD) analysis (KIRs whose OFs = 100% were excluded). The calculated formula and according statistics principle reveal the sign of coefficient D which represents the same or opposite allelic association 35,36 . Specifically to KIR genes, the completely positive LD (D = 1) indicates both loci are present or absent simultaneously. Oppositely, the complete negativity (D = − 1) means just only one of the two loci is present. According to the observed carrier frequencies data of 13 variable KIR genes (KIR-2DL1, 2DL2, 2DL3, 2DL4, 2DL5, 3DL1, 3DL2, 3DL3, 2DS1, 2DS2, 2DS3, 2DS5 and 3DS1) from the above-mentioned 59 populations and 3 other Han populations including Shanghai Han 25 , Hong Kong Han and Singapore Han 19 , Dispan software 37,38 was utilized to generate the D A genetic distances 38,39 and relevant significances without correction. According to the estimation formula, D A is a direct calculation of genetic association between any 2 populations 38 whilst F ST is a relative measure of genetic differentiation given the total genetic variation presents in the population 40 . From an accuracy point of view, D A genetic distance was commonly employed in studying KIRs 19,41,42 because Nei D A distance is proved to possess the highest probability of obtaining the correct branching pattern of a phylogenetic tree 43 . By using the distance matrix, we drew a Neighbor-Joining tree and assessed its reliability by interior branch test using Mega version 6.0 44 . As for Neighbor-Joining algorithm, it's a simplified version of the Minimum Evolution (ME) method, which doesn't require the assumption of a constant rate of evolution mentioned in Hierarchical Clustering algorithm. The N-J tree reconstruction starts with a starlike tree with no hierarchical structure and the necessary assumption is that there is no clustering of OTUs (operational taxonomic units) 45 . To validate the genetic relationship of the studied populations, we illustrated a Multidimensional Scaling (MDS) plot using the "MASS" packages (http://www.r-tutor.com/category/r-packages/mass). Chi-square test was conducted by SPPS 22.0 to depict the distribution variances between Eastern Chinese Han and Jilin Han (Northeast China) 27

Results and Discussion
Observed KIR carrier frequencies.   Table 2. The black and white box represented the presence and absence of 17 KIR genes in Eastern Chinese Han population. The haplogroup information was obtained from the website (http:// www.allelefrequencies.net/kir6001a.asp). Next, we defined the genotypes and linkage groups. With the exclusion of KIR 1D, we detected a total of 27 distinct genotypes. After careful comparison of the identified genotypes in the database, we didn't observe any new genotypes. The most common genotype (KIR3DL1-2DL1-2DL3-2DS4-2DL4-3DL2-3DL3-2DP1-3DP1, n = 63, ratio = 51.22%) was the same to Xinjiang Han, Yunnan Han, Sichuan Han and Shenzhen Han, among which 32 individuals carried KIR1D (Table 2). In Fig. 1, the classification of haplotypes, genotypes and linkage groups were shown intuitively. As we can see in Fig. 1A, 7 BB genotypes (5.69%), 53 AB genotypes (43.09%) and 63 AA genotypes (51.22%) were determined. Previous studies have demonstrated a great diversity of KIR genes among different populations from Asia, in which AB accounted for the most prevalent genotype in Shaanxi Han (48%) 18 , Chinese Kazakh (52.8%), and Chinese Uyghur (56.1%) 46 . It revealed a great diversity of KIR gene distribution among different groups. As for linkage group (Fig. 1B) Linkage disequilibrium (LD) analysis. Since KIR genes are close to each other by approximately 3 kb genomic DNA region, linkage disequilibrium should be taken into consideration, especially for those two nearest-neighbor KIRs (present or absent simultaneously). Therefore, we conducted the LD analysis, in which D was the parameter representing the test statistics linkage disequilibrium coefficient. P-value < 0.05 signified a strong genetic relation between two KIRs. Based on the genotyping profiles of 123 Eastern Chinese Hans, Table 3 and Fig. 2 listed the D coefficients and conspicuous differences of 45 pairs among 10 KIR genes except for 7 KIR genes which appeared in all individuals. The Linkage Disequilibrium comparisons were listed in an inverted triangle in Fig. 2.
The results of LD analysis showed 35 strong relations in total, among which 6 were negative, all associated with KIR1D (1D-3DL1, 1D-2DS4, 1D-2DL5, 1D-3DS1, 1D-2DS1, and 1D-2DS3). The rest 29 relations were significantly positive on different levels. In terms of group classification, 1 relation was from A group (3DL1-2DS4) while 10 pairs linked A and B groups (3DL1-2DL5, 3DL1-3DS1, 3DL1-2DS1, 3DL1-2DS5, 3DL1-2DS3, 2DS4-2DL5, 2DS4-3DS1, 2DS4-2DS1, 2DS4-2DS5, and 2DS4-2DS3) and 18 linkage relations existed within B group. After further demarcation, 6 linkage relations (2DL5-2DL2, 2DS2-2DL2, 2DS3-2DL2, 2DS2-2DL5, 2DS3-2DL5, and 2DS2-2DS3) and 5 (3DS1-2DL5, 2DS1-2DL5, 2DS5-2DL5, 3DS1-2DS5, and 2DS1-2DS5) were found within C4 and T4 subgroups from B family respectively, consistent to the genomic location of the genes. As for 1D, a known 2DS4 variant, we illustrated a very close linkage relationship between 1D and KIR2DS4 which showed the linkage coefficient was − 0.0245 (p < 0.001). Thus, the results supported that KIR1D is the variant of KIR2DS4 47 . These findings are consistent to other published studies [48][49][50][51] .       variant emerged with ameliorative phenotypes in East Asian, especially Central Chinese by natural selection 52 , we speculated that KIR3DS1 was beginning to surface and KIR2DS3 disappeared gradually due to evolutionary pressure such as microbial infection, communicable disease or natural environment on the route of "out of Africa". However, the potential causes remained to be discovered. The Hierarchical Clustering method was employed to reconstruct the cluster trees and molecular evolutionary structures based on Euclidean distance. The populations and KIRs were listed corresponding to the population genetic and homologue analysis, respectively (Fig. 3). Firstly, we found that molecular evolutionary structure was divided into two clusters. The left was composed of KIR genes defined as B group, while the right consisted of A group and KIR2DP1. The results were consistent with the deduced classification clones from original research 14  Phylogenetic structure and MDS analysis. In the validation study, we constructed another Neighbor-Joining tree (   Table 4. We found that the Eastern Han had the closest distance with Jilin Han (D A = 0.0006), followed by Yunnan Han (D A = 0.0011), with the farthest distance with Pacific Melanesian (D A = 0.1567).
In the phylogenetic tree, we could clearly find two main clusters. The smaller cluster consisted of Palestinian, Adygei, Tuscan, Hazara, Sardinian, North Italian, Barusho, Karitiana, Pima, Pathan, Papuan and Melanesian. Eastern Chinese Han gathered with Jilin Han in another cluster, following the sub branch containing Shanghai Han, Miao, Japanese, Hezhen, Naxi, Daur and Tujia, then with Yunnan Han and Shaanxi Han. Interestingly, all 7 African populations were genetically closer to populations from East Asia than West Europe, America and Pacific Ocean indicating that Negroid race had a closer relationship to Mongoloid than Caucasian. This finding supported the theory of "Out of Africa" [53][54][55][56] .
Using KIR as the panel of genetic marker, the results showed that Eastern Han from Jiangsu province had the closest relationship with Jilin Han, followed by Shanghai Han, indicating that Eastern Han shared a common ancestry, which was consistent to the findings based on other genetic markers, such as autosomal STRs and Y-STRs by Yao et al. 57 . The notable association of Eastern Han with Jilin Han was confirmed by the famous large-scale migration literally called "Crashing into Guandong". Since A.D. 1,650, to control the rapid expansion of Chinese Han population, Han from Eastern China was promoted by the national policy to migrate to Northeastern China. As a result, Chinese Han accounted for 80% of the total northeastern Chinese during the 18th century. Besides, that Eastern Han was genetically close to Shanghai Han could be explained by the close geographic location and long-term cultural interaction 58 Table 5. The chi-square (X 2 ) results of various KIR loci in 10 Han populations. Significant differences (p < 0.05) were labeled with bold and underlined format.
supported by the MDS plot analysis. X 2 analysis was also performed and we found that KIR3DL1, 2DL1, 2DS4, 2DS1, 2DS5 and 1D are highly polymorphic KIR loci specifically to Han Chinese. In conclusion, this research on Eastern Han characterized by generating KIR genotyping files, phylogenetic construction, evolutionary molecular analysis and sub-Han group comparison provided valuable resources for both enriching information pool of forensic and population genetics and benefiting in uncovering underlying genetic mechanisms underlying immune diseases in the future. The comprehensive and comparative analysis on KIRs revealed a unique genetic background of Eastern Chinese Han through phylogenetic construction, evolutionary molecular analysis, and sub-Han group comparison which provided valuable resources for both enriching information pool of forensic and population genetics.