Introduction

α(1,3/1,4)-fucosyltransferase (FucT), encoded by the FUT3 (Lewis) gene, regulates the expression of Lewis antigens in the human Lewis blood group system. This system consists mainly of two antigens, Lea and Leb. These were initially identified as red blood cell (RBC) antigens but were later found in exocrine secretions; they are not synthesized by RBCs but are adsorbed onto erythrocyte membranes from plasma1. Phenotyping of the Lewis system by the haemagglutination test is also complicated by poor antibody specificity2. Other phenotyping errors have physiological and pathological causes: (1) some individuals whose erythrocytes are typed as Lewis positive can later show a Lewis-negative erythrocyte phenotype during disease or pregnancy3; (2) because of a gene dosage effect, FUT3 heterozygotes exhibit lower α1-4 FucT activity in secretions than wild-type homozygotes, so genuinely Lewis-positive individuals (Le (functional)/le (nonfunctional)) may be erroneously typed as Lewis negative on erythrocytes4; (3) the histo-blood group A-glycosyltransferase, B-glycosyltransferase and Lewis fucosyltransferase act on a common precursor substance, H type-1, and this competition reduces the amount of Leb antigen in group A and B individuals compared with group O individuals, so some anti-Leb reagents may falsely type Le(a-b+) as Le(a-b-)5. In addition, to obtain Lewis phenotype information from biological materials other than whole blood, such as bloodstains (used in this study), body fluids or hair, the Lewis genotype may need to be determined at the DNA level and the corresponding phenotype inferred from it. Therefore, it is important to determine the genotypes of the Lewis system.

Although it is less frequently implicated than the ABO and RH systems, the Lewis blood group has clinical significance. A few haemolytic transfusion reactions have been attributed to improper phenotyping with Lewis antibodies6,7. Lewis antibodies have also been implicated in mild haemolytic disease of the newborn (HDN), because Lewis phenotypes may be falsely typed in red cells from women and infants8. In addition, donor–recipient pairs matched for Lewis phenotype were shown to have better graft survival than Lewis-incompatible pairs9. Therefore, the Lewis blood group system should be appropriately considered in transfusion and transplantation.

The Lewis enzyme consists of an NH2-terminal cytoplasmic tail, a transmembrane region, a stem region and a COOH-terminal catalytic domain10. The effect of a point mutation in the FUT3 coding sequence on Lewis enzyme activity depends on the resulting amino acid change and its position in the protein structure. Mutations such as T202C, G508A, G667A, G808A, A1007C and T1067A abolish enzyme activity completely, whereas others such as T59G, G484, C478T and G968C cause a partial loss of activity1,11,12,13. The G47C and G1022T mutations are also predicted to inactivate the enzyme14,15. Alleles carrying inactivating mutations are referred to as Lewis-negative alleles (le). An individual homozygous for le alleles cannot express the Lea and Leb antigens and presents the Lewis-negative phenotype Le(a-b-) on the red blood cell membrane. Alleles with mutations that do not influence enzyme activity are referred to as Lewis-positive alleles (Le).

The most common and important Lewis-negative alleles are le202,314, le59,1067, le59,508 and le484,667, together with rarer Lewis-negative alleles derived from them13. These alleles show racial differences and specificity. For instance, le202,314 and le59,1067 are found mainly in European populations, whereas le59,508 is common in East Asian and African populations13,16. To date, le484,667 has been detected only in African populations1,13. Previous studies have examined Lewis-negative alleles in many Asian countries, including East Asian countries such as China2, Japan17, Korea18 and Mongolia13; Southeast Asian countries such as the Philippines11, Thailand11 and Indonesia19; and South Asian countries such as Sri Lanka14. However, no genetic data on FUT3 have been reported from Pakistan, a multi-ethnic country in South Asia. In this study, we therefore sequenced the entire coding region of the Lewis gene to investigate FUT3 genetic variation and the molecular basis of the Lewis phenotype in the Sindhi and Punjabi populations of Pakistan and, together with other reports on ancestry-informative markers, to better understand the genetic origin of the Pakistani population.

Results

In the current study, the distribution of FUT3 alleles was in Hardy-Weinberg equilibrium. Mutations were defined as unreported if no population data were available for them in published reports or in the 1000 Genomes browser (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/?assm=GCF_000001405.25), even if they had an rs number in the dbSNP database (https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=2525).

Sequence variations in the coding region of FUT3

DNA sequencing of the whole coding region of the FUT3 locus revealed 18 single nucleotide polymorphisms (SNPs) in the Sindhi population and 12 in the Punjabi population of Pakistan. Of the 18 SNPs found in the Sindhi population, 14 had been reported previously and 4 were unreported: G146A (rs1263565737) and G490A (rs767305253) were missense mutations, and G381A (rs144354196) and G561A (rs747036561) were synonymous mutations. In the Punjabi population, in addition to 9 previously reported mutations, we observed 3 unreported mutations: 2 synonymous mutations (G24C, a new mutation submitted to the dbSNP database, and C876T (rs3934140)) and 1 missense mutation (T959C (rs762649552)). All SNPs identified in this study and their frequencies are summarized in Table 1.

Table 1 All mutations identified in FUT3 coding region from Sindhi and Punjabi populations.

Novel haplotypes and inference of the influence of new missense mutations on enzyme activity

We determined the Lewis haplotypes of 11 individuals who carried unreported mutations or undefined haplotypes by PCR cloning and allele-specific PCR. In total, we identified eleven novel alleles: seven were defined by the seven unreported mutations, one resulted from a new combination involving the previously reported mutation C882T, and the remaining three were characterized by G113A, G1061A and T645C, which are listed in the 1000 Genomes database but whose haplotypes had not been described. Of these eleven alleles, seven were non-functional and four were functional (Table 2). All nonsynonymous mutations found in this study occurred on previously confirmed Lewis-negative alleles: G146A and G1061A on the le59,508 chromosome, G490A and T959C on the le59,1067 chromosome, and G113A on le202,314. Because these alleles are already non-functional, the enzyme activities of the novel non-functional alleles were not examined further. However, the likely impact of these nonsynonymous substitutions on enzyme activity can be inferred from the positions of the mutations and the associated amino acid changes using the PolyPhen and PROVEAN programs. The G113A (R38Q) and G146A (S49N) mutations lie in the stem region of the protein and are expected to have a relatively minor effect on enzyme activity; both were predicted to be benign by PolyPhen (scores: G113A, 0.002; G146A, 0) and neutral by PROVEAN. The other three missense mutations, G490A (D164N), T959C (F320S) and G1061A (R354H), are located in the catalytic domain and are more likely to influence enzyme activity.
However, G1061A was predicted to be benign by PolyPhen (score: 0) and neutral by PROVEAN, possibly because the substitution does not change the biochemical properties of the residue (arginine and histidine are both basic amino acids). The G490A (D164N) and T959C (F320S) mutations were predicted to be damaging by PolyPhen (scores: G490A, 1.0; T959C, 0.989) and deleterious by PROVEAN (scores: G490A, −4.878; T959C, −7.255). PROVEAN considers a score ≤ −2.5 deleterious and a score > −2.5 neutral.
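The positional reasoning above can be checked with a short sketch. It assumes that positions in the mutation names count from the A of the ATG start codon (position 1), so that codon number = ⌊(position − 1)/3⌋ + 1, and it applies only the PROVEAN cutoff of −2.5 quoted in the text. PolyPhen cut-offs are deliberately not encoded because the PolyPhen model used is not stated. The helper names are hypothetical and do not correspond to any PolyPhen or PROVEAN interface.

```python
# Minimal sketch: map coding-DNA positions to codon numbers and apply the
# PROVEAN cutoff quoted in the text. Assumes position 1 = A of the ATG start codon.

PROVEAN_CUTOFF = -2.5  # scores <= cutoff are called deleterious


def codon_number(cdna_pos: int) -> int:
    """Return the 1-based codon (amino acid) number containing a cDNA position."""
    return (cdna_pos - 1) // 3 + 1


def position_of(name: str) -> int:
    """Extract the numeric position from a name such as 'G490A' or 'D164N'."""
    return int(name[1:-1])


def provean_call(score: float) -> str:
    """Classify a PROVEAN score with the default -2.5 threshold."""
    return "deleterious" if score <= PROVEAN_CUTOFF else "neutral"


# Missense mutations and amino acid changes as reported in the text.
MISSENSE = {
    "G113A": "R38Q",
    "G146A": "S49N",
    "G490A": "D164N",
    "T959C": "F320S",
    "G1061A": "R354H",
}

# PROVEAN scores given numerically in the text (only these two).
PROVEAN_SCORES = {"G490A": -4.878, "T959C": -7.255}

# Each nucleotide change should fall in the codon named by its protein change.
for nt, aa in MISSENSE.items():
    assert codon_number(position_of(nt)) == position_of(aa), (nt, aa)

for nt, score in PROVEAN_SCORES.items():
    print(nt, score, provean_call(score))
```

The consistency check confirms that the nucleotide and protein numbering in the text agree; it says nothing about pathogenicity beyond the published scores.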

Table 2 New haplotypes and genotypes identified in FUT3 gene in Sindhi and Punjabi populations.

Phenotype, genotype, and allele frequencies

As shown in Table 3, we identified 18 alleles and 29 genotypes in the Sindhi population and 14 alleles and 24 genotypes in the Punjabi population. Eight alleles, Le, Le59, le59,1067, le202,314, le47,202,314, le59,202,1067, le59,508 and Le645, were found in both populations, whereas le1007, le1067, le13,484,667 and Le612 were present only in the Sindhi population, and le202 and le59,445 were found only in the Punjabi population (novel alleles are shown in Table 2). The three most common FUT3 alleles in both populations were Le, le59,1067 and le202,314, with a combined frequency of 87% in the Sindhi population and 87.57% in the Punjabi population. Non-functional Lewis alleles accounted for 38.5% of the alleles in the Sindhi population, among which le202,314 was the most common (17.75%). In the Punjabi population, 43.63% of alleles were non-functional, and le59,1067 was the most frequent (17.52%); the frequency of le59,1067 differed significantly between the two populations (P < 0.05). The alleles le47,202,314, le59,202,1067 and le202 were relatively rare, although le47,202,314 (2.25% in the Sindhi and 4.78% in the Punjabi population) and le59,202,1067 (1.25% and 1.27%, respectively) still reached appreciable frequencies in the studied populations. Based on the genotypic data, the frequency of the Lewis-negative phenotype was 11.5% in the Sindhi population and 22.93% in the Punjabi population, i.e., higher in the Punjabi population.
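The phenotype frequencies above are inferred from genotypes rather than measured by haemagglutination. A minimal sketch of that inference could look as follows. It assumes the document's naming convention, in which a leading "Le" marks a functional allele and a leading "le" a non-functional one, and it treats an individual as Lewis negative, Le(a-b-), only when both alleles are non-functional. Secretor (FUT2) status, which separates Le(a+b-) from Le(a-b+), is not modelled. The sample genotypes are hypothetical and are not data from this study.

```python
# Minimal sketch: infer Lewis-positive/negative status from FUT3 genotypes.
# Assumption: "Le..." = functional allele, "le..." = non-functional allele.


def is_functional(allele: str) -> bool:
    return allele.startswith("Le")


def lewis_negative(genotype: tuple[str, str]) -> bool:
    """Lewis negative, Le(a-b-), only when both alleles are non-functional."""
    return not any(is_functional(a) for a in genotype)


def lewis_negative_frequency(genotypes: list[tuple[str, str]]) -> float:
    """Fraction of individuals predicted to be Lewis negative."""
    return sum(lewis_negative(g) for g in genotypes) / len(genotypes)


# Hypothetical sample of four individuals (not data from this study).
sample = [
    ("Le", "le59,1067"),         # heterozygous: Lewis positive
    ("le202,314", "le59,1067"),  # two non-functional alleles: Lewis negative
    ("Le59", "le59,508"),        # Le59 is functional: Lewis positive
    ("Le", "Le645"),             # Lewis positive
]
print(lewis_negative_frequency(sample))  # 0.25
```

Because heterozygotes are always called Lewis positive here, this genotype-based rule sidesteps the gene dosage problem that can make them type as Lewis negative on erythrocytes.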

Table 3 Frequencies of allele, genotype, and phenotype in FUT3 locus from Sindhi and Punjabi populations.

Discussion

Pakistan lies in a region that has been invaded by several groups in the past, including Greeks, Aryans, Macedonians, Arabs and Mongols20. These invaders contributed to the ethnic diversity of the Pakistani population, and many ethnic groups inhabit different parts of the country. In this study, the coding region of the FUT3 gene was systematically sequenced in two ethnic groups: Punjabis, who represent 62% of the Pakistani population, and Sindhis, who represent 18%. These two groups were appropriate for the current study because together they represent >78% of the total Pakistani population. The Lewis blood group system is not only highly polymorphic but also ethnically and geographically specific, and many different sequence variations have been observed in populations around the world. Our results show that the studied populations exhibit considerable sequence variation and a wide variety of alleles at the FUT3 locus.

First, the most frequent mutations identified in the studied populations were T59G, T202C, C314T, G508A, T1067A and G47C (Table 1). T59G was present either alone or linked with other mutations such as G508A and T1067A. G508A is most commonly found in Asian, African and Amazonian populations1,13,21,22, but its frequency in the studied populations was only 3.75% (Sindhi) and 3.82% (Punjabi). In contrast, T1067A, which is frequent in Japanese17, Sinhalese14, Southeast Asian11 and Caucasian populations13, reached 14.25% in Sindhis and 18.79% in Punjabis. T202C and C314T, which are most commonly found in Caucasian populations13, were also frequent in both the Sindhi (21.75% and 20.25%) and Punjabi (21.66% and 19.75%) populations. In most cases, T202C and C314T are in complete linkage, but, interestingly, the T202C singleton, previously identified in Xhosa and Caucasian populations1, was also found in a heterozygous Punjabi individual. Notably, G47C, which had previously been identified only in the Sinhalese population of Sri Lanka14 and in the Caucasian panel of the Coriell Cell Repositories, was found at appreciable frequencies in the Sindhi (2.25%) and Punjabi (4.78%) populations, suggesting that G47C of FUT3 may be characteristic of South Asian populations. In addition, some rare mutations were identified sporadically in the investigated populations. For example, G13A, originally found in African Americans and common in native Africans1,15, was also present in the Sindhi population. C445A, originally observed in Denmark4, and A1007C, previously reported only in Japanese populations12, are both enzyme-inactivating mutations; each was found in one heterozygous individual, in the Punjabi and Sindhi populations, respectively.
A612G, previously found only in Mongolians13, was identified in four heterozygous individuals from the Sindhi population, while T645C was shared by both the Sindhi and Punjabi populations. The distribution of common and rare mutations at the Lewis locus suggests extensive sequence diversity in the Lewis coding region in the Sindhi and Punjabi populations, and the FUT3 SNPs and alleles shared with other populations suggest admixed ancestry in the two investigated populations of Pakistan.

Second, additional alleles of the human Lewis blood group, including seven novel non-functional alleles and four novel functional alleles, were found in the current study. Interestingly, all novel non-functional alleles were derived from known Lewis-negative alleles carrying additional mutations. Previously, screening four SNPs, T59G, T1067A, T202C and C314T, identified 90–95% of Lewis-negative individuals in Caucasians23,24. Our results suggest that adding G508A to these four SNPs is sufficient to define the major Lewis-negative alleles in the Pakistani populations studied.
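The five-SNP screening strategy above can be sketched as a simple rule applied to a resolved haplotype. This is an illustration, not the authors' pipeline: it assumes that T202C, G508A and T1067A are the inactivating tags (as stated for the Lewis enzyme earlier in the text), while T59G and C314T only help name the allele lineage. Rare inactivating mutations outside the panel, such as A1007C, are not detected, which is why the rule is limited to the major alleles. The function names are hypothetical.

```python
# Minimal sketch: screen a resolved haplotype with the five tag SNPs.
# Assumption: T202C, G508A and T1067A are the inactivating tags;
# T59G and C314T only help name the lineage.

TAG_SNPS = frozenset({"T59G", "T202C", "C314T", "G508A", "T1067A"})
INACTIVATING_TAGS = frozenset({"T202C", "G508A", "T1067A"})

# Tag-SNP combinations of the common alleles named in the text.
COMMON_ALLELES = {
    frozenset(): "Le",
    frozenset({"T59G"}): "Le59",
    frozenset({"T59G", "T1067A"}): "le59,1067",
    frozenset({"T202C", "C314T"}): "le202,314",
    frozenset({"T59G", "G508A"}): "le59,508",
}


def tag_pattern(variants) -> frozenset:
    """Keep only the tag SNPs carried by a haplotype."""
    return frozenset(variants) & TAG_SNPS


def is_lewis_negative_allele(variants) -> bool:
    """True if the haplotype carries at least one inactivating tag SNP."""
    return bool(tag_pattern(variants) & INACTIVATING_TAGS)


def allele_lineage(variants) -> str:
    """Name the common allele lineage, or 'other' for unlisted combinations."""
    return COMMON_ALLELES.get(tag_pattern(variants), "other")


print(allele_lineage({"G47C", "T202C", "C314T"}))  # le202,314
print(is_lewis_negative_allele({"T59G"}))          # False
```

Note that a novel allele such as le47,202,314 falls into the le202,314 lineage here, consistent with the observation that all novel non-functional alleles derive from known Lewis-negative alleles.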

Regarding the frequency distribution of Lewis alleles and of the Lewis-negative phenotype, le59,1067 was significantly more frequent in the Punjabi population than in the Sindhi population, and the Lewis-negative phenotype frequency in the Punjabi population (22.93%) was about twice that in the Sindhi population (12%). Nevertheless, statistical analysis showed no significant difference in the overall allele frequency distribution between the two populations (P > 0.05). Therefore, the genetic profiles of the Lewis blood group system in these two groups are broadly similar.
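A single-allele comparison such as the one for le59,1067 is commonly made with a 2×2 Pearson chi-square test on allele counts. The sketch below is a stand-alone illustration of that test, not a reproduction of the SPSS analysis: the SPSS settings (continuity correction, exact test) are not stated in the text, and the counts used are hypothetical. For 1 degree of freedom, the p-value follows from P(χ² > x) = erfc(√(x/2)).

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows are populations; columns are (copies of the allele of interest,
    copies of all other alleles). Returns (statistic, p-value) with 1 df.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts, for illustration only.
chi2, p = pearson_chi2_2x2(10, 90, 30, 70)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")  # chi2 = 12.50, P = 0.0004
```

When any expected count is small (a common rule of thumb is below 5), Fisher's exact test is usually preferred over this approximation.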

The types and frequency distributions of Lewis alleles, especially non-functional alleles, differ among populations1,13,14,17,18,21,22 (Table 4). le59,508 is common in East Asian and African populations, but its frequency is relatively low in the populations investigated here. le202,314 is common in Caucasians and in the Sinhalese of South Asia. The highest frequency of le59,1067 was observed in the Sinhalese, followed by Caucasian and Japanese populations. le202,314 and le59,1067 are the main non-functional alleles in the Punjabi and Sindhi populations. Moreover, le47,202,314 and le59,202,1067, which are rarely observed in other populations, were frequently found in the populations studied here. A study of mitochondrial control region diversity in the Sindhi population showed that the haplogroups in the mtDNA library were derived mainly from South Asia (47.6%) and West Eurasia (35.7%)20. Likewise, the Punjabi mtDNA gene pool is primarily a composite of South Asian (65%) and West Eurasian (29%) haplogroups25. Therefore, the distribution of alleles at the FUT3 locus suggests that the Punjabi and Sindhi populations of Pakistan are more closely related to the Sinhalese and Caucasian populations, in agreement with many other studies of the Sindhi and Punjabi populations.

Table 4 Comparison of allele frequencies of FUT3 gene among different populations.

In recent years, genome-wide association studies (GWASs) have renewed interest in FUT3 polymorphisms by associating T59G and G508A, which reduce or abolish enzyme activity, with the prevalence of ulcerative colitis (UC)26, Crohn’s disease (CD)27 and coronary artery disease28,29. These two SNPs also influence lesion location in UC and CD. Our results suggest that, in the Sindhi and Punjabi populations, T59G, G508A, T1067A, T202C and C314T could serve as tag SNPs for Lewis-negative alleles in future large-scale studies of associations between the Lewis-negative phenotype and disease.

Recent studies have also associated the Lewis phenotype with susceptibility to infection by norovirus30, rotavirus31 and Helicobacter pylori32. Individuals with the Lewis-negative phenotype have been reported to be resistant to norovirus (GI) and rotavirus (P8), which are leading causes of acute gastroenteritis in children worldwide. According to a previous report, the frequency of the Lewis-negative phenotype ranges from 7% (Asians) and 8% (Europeans) to 19% (Africans)21. Our results showed Lewis-negative phenotype frequencies of 12% (Sindhi) and 22.93% (Punjabi). A stable proportion of Lewis-negative individuals may therefore be maintained by long-term natural selection, possibly as a protective strategy against widespread microbial infections.

In medico-legal investigations, it has been difficult to classify Lewis phenotypes accurately with the haemagglutination test, which works only for whole blood samples and not for materials commonly found at crime scenes, such as body fluids, hair and bloodstains. In this study, by contrast, we determined FUT3 genotypes, and thereby inferred Lewis phenotypes, from bloodstain samples.

In conclusion, systematic sequencing identified multiple sequence variations and a wide variety of FUT3 alleles, including eleven novel alleles, in the Punjabi and Sindhi populations, which had not previously been studied with respect to FUT3. Our results suggest that the Sindhi and Punjabi populations are a mixture of South Asian and Caucasian ancestry and provide a better understanding of the genetic origins of these two ethnic groups.

Materials and Methods

Sample collection

In the current study, we collected bloodstains on FTA cards from 357 unrelated individuals (200 Sindhi and 157 Punjabi) residing in the Sindh and Punjab provinces of Pakistan. All participants gave informed consent, either in writing or orally with a thumbprint (for those who could not write), after the study aims and procedures had been carefully explained to them. The study was approved by the ethical review board of China Medical University, Shenyang, Liaoning Province, People’s Republic of China (approval no. 2019/060), and was performed in accordance with the standards of the Declaration of Helsinki.

DNA isolation

Genomic DNA was isolated from FTA bloodstain cards using a modified phenol-chloroform method developed by our group (Supplemental File 1). The extracted DNA samples were quantified using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA).

PCR amplification of the FUT3 genes

A DNA fragment (1226 bp) containing the open reading frame (1086 bp) was first amplified by PCR in a 20 μl reaction containing 10 μl of 2× Power Taq PCR MasterMix (Bioteke, Beijing, China), 40 ng of genomic DNA and 5 μmol of each primer. The primer sequences are shown in Table 5. PCR was carried out under the following conditions: initial denaturation at 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 65 °C for 30 s, and extension at 72 °C for 1 min.
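The cycling program above can be written down as data, which makes it easy to compare runs or estimate instrument time. This is a hypothetical representation, not the format of any particular thermocycler. It counts only the time held at each programmed temperature; ramping between temperatures is instrument-specific and is ignored, and no final extension is included because none is stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# First-round FUT3 amplification program as described in the text.
INITIAL_STEPS = [Step("initial denaturation", 94, 5 * 60)]
CYCLE_STEPS = [
    Step("denaturation", 94, 30),
    Step("annealing", 65, 30),
    Step("extension", 72, 60),
]
N_CYCLES = 35


def hold_time_seconds(initial, cycle, n_cycles):
    """Total time held at programmed temperatures (ramp times ignored)."""
    return sum(s.seconds for s in initial) + n_cycles * sum(s.seconds for s in cycle)


total = hold_time_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES)
print(total / 60, "min")  # 75.0 min
```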

Table 5 Sequences and positions of PCR primers and annealing condition used for analysis of the Lewis gene.

Direct DNA sequencing

The PCR products were sequenced directly by Sanger sequencing with the sequencing primers listed in Table 5. The sequencing conditions have been described previously33.

Haplotype identification

For the 11 individuals carrying unreported or rare point mutations, nested PCR was carried out in a 20 μl reaction containing 10 μl of 2× Power Taq PCR MasterMix (Bioteke, Beijing, China), 2 μl of the 1000-fold-diluted first-round PCR product and 5 μmol of each primer. The primer sequences are shown in Table 5. The PCR conditions were the same as for the first round. The PCR products were digested with the restriction enzymes HindIII and XbaI and subcloned into pcDNA3.1. To determine individual haplotypes, at least four clones of each plasmid were sequenced.
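The clone-based phasing above can be sketched as follows. Each sequenced clone is represented by the set of variants it carries; the phase is treated as resolved when exactly two distinct haplotypes are seen, each heterozygous variant lies on exactly one of them, and homozygous variants lie on both. The sketch also shows why several clones are needed: assuming each clone independently derives from either allele with probability 1/2, the chance that all n clones come from the same allele is 2 × 0.5ⁿ, i.e. 0.125 for four clones. PCR chimeras and polymerase errors are not modelled; a result other than two clean haplotypes is the kind of case that allele-specific PCR can resolve. The individual and the helper names are hypothetical.

```python
from collections import Counter


def distinct_haplotypes(clones):
    """Count how often each distinct variant set occurs among sequenced clones."""
    return Counter(frozenset(c) for c in clones)


def phase_is_resolved(haplotypes, het_variants, hom_variants=frozenset()):
    """True when exactly two haplotypes are observed, every heterozygous variant
    lies on exactly one of them, and every homozygous variant lies on both."""
    if len(haplotypes) != 2:
        return False
    h1, h2 = haplotypes
    het_ok = all((v in h1) != (v in h2) for v in het_variants)
    hom_ok = all(v in h1 and v in h2 for v in hom_variants)
    return het_ok and hom_ok


def prob_single_allele(n_clones):
    """Chance that all clones derive from the same allele, assuming each clone
    independently carries either allele with probability 1/2."""
    return 2 * 0.5 ** n_clones


# Hypothetical individual: heterozygous for G146A and G508A, homozygous for T59G.
clones = [{"T59G", "G146A", "G508A"}, {"T59G"}, {"T59G", "G146A", "G508A"}, {"T59G"}]
haps = list(distinct_haplotypes(clones))
print(phase_is_resolved(haps, {"G146A", "G508A"}, {"T59G"}))  # True
print(prob_single_allele(4))  # 0.125
```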

To verify the individual haplotypes, we also performed allele-specific PCR in a 20 μl reaction containing 2 μl of 10× PCR buffer, 1 μl of dNTP mix, 5 μmol of each primer, 2.5 units of rTaq DNA polymerase and the 1000-fold-diluted first-round PCR products as templates. In total, four upstream primers and a common downstream primer (Nest-FUT3) were designed for amplification in all individuals. The primer sequences are shown in Table 5. The PCR conditions were as follows: initial denaturation at 94 °C for 5 min, followed by 25 cycles of 30 s at 94 °C, 30 s at the annealing temperature (Table 5) and 1 min at 72 °C. The allele-specific PCR products were sequenced by Sanger sequencing.
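Allele-specific PCR generally relies on a primer whose 3'-terminal base matches only one allele, so that extension is efficient from the matching template. The sketch below illustrates that general principle with a hypothetical template; it does not reproduce the primers in Table 5, and it ignores melting temperature, secondary structure and the deliberate extra mismatches that some designs add near the 3' end to improve specificity.

```python
def allele_specific_forward_primer(template: str, variant_pos: int,
                                   allele_base: str, length: int = 20) -> str:
    """Return a forward primer whose 3'-terminal base is the queried allele.

    template:    sense-strand sequence written 5'->3'
    variant_pos: 1-based position of the SNP within `template`
    allele_base: base placed at the primer's 3' end (reference or variant)
    """
    if length < 2:
        raise ValueError("primer length must be at least 2")
    if not 1 <= variant_pos <= len(template):
        raise ValueError("variant position outside template")
    if variant_pos < length:
        raise ValueError("not enough upstream sequence for this primer length")
    # Bases immediately upstream of the SNP, then the allele-defining 3' base.
    upstream = template[variant_pos - length:variant_pos - 1]
    return upstream + allele_base.upper()


# Hypothetical template and SNP position, for illustration only.
template = "ACGTACGTAC"
print(allele_specific_forward_primer(template, 6, "T", length=4))  # GTAT
print(allele_specific_forward_primer(template, 6, "C", length=4))  # GTAC
```

In practice, one such primer per allele would be paired with a shared downstream primer, analogous to the common Nest-FUT3 primer described above.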

Statistical analysis

DNA sequences were analysed using DNAMAN 8 software with the NCBI sequence NG_007482 as the reference. Allele and genotype frequencies were calculated by direct counting, and Hardy-Weinberg equilibrium (HWE) was assessed with the chi-square test. Differences in allele frequency distributions between the studied populations and reference populations were tested using SPSS version 21.0. The effects of point mutations on enzyme activity were inferred with the PolyPhen and PROVEAN programs.
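Direct counting and the HWE chi-square test can be sketched for a multi-allelic locus such as FUT3. This is a minimal stand-alone illustration, not the authors' workflow. It assumes that genotypes are given as pairs of allele names, estimates allele frequencies by counting, and compares observed genotype counts with the HWE expectations p_i²N (homozygotes) and 2p_ip_jN (heterozygotes). With k alleles, the degrees of freedom are k(k−1)/2. With many rare alleles, expected counts become small and exact HWE tests are usually preferred. The example genotype counts are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Allele frequencies by direct counting (two alleles per individual)."""
    counts = Counter()
    for a, b in genotypes:
        counts[a] += 1
        counts[b] += 1
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}


def hwe_chi_square(genotypes):
    """Chi-square statistic and degrees of freedom for HWE at one locus."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    # Every possible genotype, with alleles in sorted order to match `observed`.
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            expected = n * freqs[a] ** 2
        else:
            expected = n * 2 * freqs[a] * freqs[b]
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


# Hypothetical biallelic sample: 30 Le/Le, 40 Le/le, 30 le/le.
sample = [("Le", "Le")] * 30 + [("Le", "le")] * 40 + [("le", "le")] * 30
print(allele_frequencies(sample))  # {'Le': 0.5, 'le': 0.5}
print(hwe_chi_square(sample))      # (4.0, 1)
```

Here the deficit of heterozygotes (40 observed versus 50 expected) produces the nonzero statistic; a p-value would then be read from the chi-square distribution with the returned degrees of freedom.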