Introduction

The N-acetyltransferases (NAT; EC 2.3.1.5) are involved in the initial biotransformation of aromatic amines and hydrazines and catalyze the transfer of the acetyl group from acetyl CoA to the nitrogen of the substrate.1 NATs are polymorphic in the population and metabolize different types of carcinogens that have been directly implicated in tumor progression. The NAT2 gene (MIM no. 243400), which codes for the NAT2 protein, is located on chromosome 8p22 and spans 9.9 kb. The human NAT2 gene has two exons: a noncoding exon at the 5′ end and an uninterrupted coding region (exon 2) of 870 nucleotides that encodes a 290 amino-acid protein. The acetylation polymorphism (rapid or slow acetylator phenotype) was discovered more than 60 years ago, following the observation that tuberculosis patients differed in their susceptibility to isoniazid toxicity.2 Genetic variations in the NAT2 gene are known to underlie these individual differences in acetylation. To date, 66 alleles have been described; however, many of these NAT2* alleles share sequence variations, and these sequence variations do not always lead to changes in the enzyme activity of the encoded protein (http://nat.mbg.duth.gr/Human%20NAT2%20alleles_2013.htm).

The NAT2 c.282C>T and c.481C>T polymorphisms are silent mutations that do not change the tyrosine at amino acid 94 (p.Tyr94Tyr) or the leucine at amino acid 161 (p.Leu161Leu), respectively, in the NAT2 protein. Conversely, the NAT2 c.191G>A and c.590G>A polymorphisms change the charged arginines to polar glutamines at residues 64 (p.Arg64Gln) and 197 (p.Arg197Gln), respectively. The NAT2 c.341T>C polymorphism alters the hydropathy profile of the protein by changing isoleucine to the more polar threonine at amino acid 114 (p.Ile114Thr).3 The NAT2 c.803A>G polymorphism is considered to be a conservative amino-acid change because it changes lysine to arginine at position 268 (p.Lys268Arg). The NAT2 c.857G>A polymorphism changes glycine 286 to the more polar glutamic acid (p.Gly286Glu) in the C-terminal tail. Early genotyping studies screened the c.481C>T, c.590G>A, c.857G>A and sometimes the c.191G>A polymorphisms and suggested a potential role for them in slowing acetylation.4 The c.481C>T single-nucleotide polymorphism (SNP; rs1799929: p.Leu161Leu) is a silent mutation designated as NAT2*11A, and it is not involved in altering the acetylator phenotype. The nucleotide substitutions c.590G>A (rs1799930) and c.857G>A (rs1799931) are designated as NAT2*6 and NAT2*7, respectively, and the combination of these alleles causes the slow acetylator phenotype. Furthermore, clearance was reported to be threefold lower in slow acetylators than in rapid acetylators.5 Phenotype studies over the last five decades have revealed distinct ethnic differences in slow acetylator frequencies.6 By region, the frequency of the slow acetylation phenotype varies between 40 and 70% in Caucasian and African populations, whereas in Asian populations, in particular East and Southeast Asian populations such as Japanese, Chinese, Korean and Thai, it ranges from 10 to 30%.7

Acetylation is a major biotransformation route for many arylamine and hydrazine drugs, as well as for a number of toxins and known carcinogens present in some diets, cigarette smoke and in the environment.8 Therefore, it is important to study the NAT2 gene to understand the molecular basis of biotransformation. Attempts have been made to study the NAT2 polymorphisms at the global level,9 and because of their unique population structuring,10 Indian populations may provide an immense opportunity to test various assumptions that were previously made.9 Therefore, in order to study the genetic variations of the NAT2 gene in six major Indian populations, three SNPs (rs1799929, rs1799930 and rs1799931) were selected for genotyping analysis. There are some potential limitations in assigning NAT2 alleles/haplotypes and deducing phenotypes solely from these three SNPs; however, in our preliminary analysis using HapMap GIH (Gujarati—a West Indian population) samples, we found these SNPs to be highly polymorphic and present in separate linkage disequilibrium (LD) blocks with many other potential SNPs. As GIH is the closest population group to the six ethnic groups analyzed in this study, it is likely that these three SNPs are the best fit for this study.

Materials and Methods

Subjects

The study populations included a total of 212 unrelated males belonging to six ethnic groups. Of these groups, the Reddy belong to the upper caste and the Balija to a backward caste; the Mala, Madiga and Sugali are from a scheduled tribe, whereas the Muslims are a religious group. All of the populations, with the exception of the Muslims, speak languages belonging to the Dravidian linguistic group, which is found in the southern peninsula of India. The names of the populations, sample sizes, linguistic affiliations and places of inhabitation are presented in Table 1. All of the subjects were apparently healthy volunteers, and no clinical diagnosis was performed on the individuals. All of the participants provided written informed consent. The procedures for the protection of the human subjects in this study were approved by the Institutional Ethical Review Committee of Narayana Medical College, Nellore, India. Venous blood samples (~3 ml each) were collected, and the DNA was isolated using a standard protocol.11

Table 1 Name of the population, their linguistic affiliation, geographic location and number of chromosomes analyzed

Genotyping

Three SNPs in the NAT2 gene, c.481C>T (p.L161L, dbSNP rs1799929), c.590G>A (p.R197Q, dbSNP rs1799930) and c.857G>A (p.G286E, dbSNP rs1799931), were genotyped in the 212 individuals. The primers and probes for all of the SNPs used in this study were purchased from Applied Biosystems (Foster City, CA, USA). Each reaction contained 2.5 μl of TaqMan Universal PCR Master Mix, 0.125 μl of TaqMan SNP Genotyping Assay, 1.375 μl of distilled water and 1 μl of DNA (10 ng/μl), giving a final reaction volume of 5 μl. For each SNP assay, positive controls for the wild-type, heterozygous and variant genotypes were included, along with at least two negative controls. Before analyzing the DNA, a pilot test was conducted to confirm the accuracy of the assay. After a successful pilot test, the sample analysis was carried out in 384-well optical reaction microplates (Applied Biosystems). The fluorescence was measured using an Applied Biosystems 7900HT Fast Real-Time PCR System and analyzed with its System SDS software, version 2.3.
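The reaction composition above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function names and the optional pipetting overage are hypothetical and are not part of the published protocol, whereas the per-reaction volumes are those stated in the text.

```python
# Per-reaction volumes in microlitres, as stated in the Genotyping section.
PER_REACTION_UL = {
    "TaqMan Universal PCR Master Mix": 2.5,
    "TaqMan SNP Genotyping Assay": 0.125,
    "distilled water": 1.375,
    "DNA": 1.0,  # template at 10 ng/ul, i.e. 10 ng per reaction
}


def reaction_volume(components=PER_REACTION_UL):
    """Total volume of one reaction in microlitres."""
    return sum(components.values())


def bulk_mix(n_reactions, overage=0.0, components=PER_REACTION_UL):
    """Volumes of the shared reagents for n reactions.

    DNA is excluded because it is added to each well separately.
    `overage` is a hypothetical pipetting allowance (0.1 means 10% extra).
    """
    scale = n_reactions * (1.0 + overage)
    return {name: vol * scale
            for name, vol in components.items() if name != "DNA"}
```

Under these assumptions, `reaction_volume()` returns 5.0, matching the stated final volume, and `bulk_mix(384)` gives the reagent volumes needed to fill one 384-well plate without any allowance for pipetting loss.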

Statistical analysis

The allele frequencies in each population were determined via direct counting. Hardy–Weinberg equilibrium was assessed using the software HWSIM, a DOS-based program.12 As phase-unknown genotypes were collected, the haplotypes and their frequencies were estimated by maximum likelihood with an expectation-maximization method in Arlequin.13 The LD (D′ and r²) was estimated using the program HaploView 3.12.14 For a worldwide comparison, we also extracted the regions 20 kb up- and downstream of the SNP rs1799930 from a large pool of populations distributed worldwide.10,15 The physical map for the 20-kb regions up- and downstream of rs1799930 is shown in Supplementary Figure S1. The data were phased using BEAGLE.16 A homozygosity analysis was conducted using PLINK 1.07, with a sliding window of 25 SNPs surveyed for homozygosity.17 The threshold of homozygosity match was 0.99.
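Allele counting and the single-site Hardy–Weinberg χ² test can be sketched as follows. This is a minimal illustration and not the HWSIM procedure used in the study: the function names are hypothetical, the genotype counts in the usage note are invented for demonstration, and the cutoff of 3.841 is the standard χ² critical value for one degree of freedom at the 5% level.

```python
CHI2_CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def allele_freqs(n_wt_wt, n_wt_mut, n_mut_mut):
    """Allele frequencies by direct counting from the three genotype counts.

    Returns (wild-type allele frequency, mutant allele frequency).
    """
    n = n_wt_wt + n_wt_mut + n_mut_mut
    p = (2 * n_wt_wt + n_wt_mut) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_wt_wt, n_wt_mut, n_mut_mut):
    """Pearson chi-square for departure from Hardy-Weinberg proportions."""
    n = n_wt_wt + n_wt_mut + n_mut_mut
    p, q = allele_freqs(n_wt_wt, n_wt_mut, n_mut_mut)
    observed = (n_wt_wt, n_wt_mut, n_mut_mut)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expectation (monomorphic site) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

For the invented counts 30, 40 and 30, `allele_freqs` returns (0.5, 0.5) and `hwe_chi_square` returns 4.0, which exceeds 3.841 and would indicate a departure from Hardy–Weinberg expectations; counts of 25, 50 and 25 give 0.0.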

Results

The location, sequence and wild-type allele for the polymorphic sites, in addition to their NCBI reference IDs, are shown in Table 2. The wild-type and mutant alleles of all of the SNPs used in the present study (rs1799929, rs1799930 and rs1799931) were designated according to their respective nucleotides. All of the individual site allele frequencies for all of the populations are given in Table 3.

Table 2 SNP location, relative position on gene, amino-acid change and wild-type allele for all SNPs
Table 3 Genotype, allele frequencies and Hardy–Weinberg χ² in the NAT2 gene among six Indian populations

Consistent with our preliminary observations in the HapMap GIH population, all three of the markers were highly polymorphic in the studied populations. The minor allele frequency (MAF) of rs1799929 was greater than 23% in every population, with a minimum of 23.5% (Madiga) and a maximum of 33.8% (Balija). The rs1799930 site showed a minimum MAF of 30.9% (Balija) and a maximum of 44.4% (Sugali). A subsequent analysis of the MAF data for rs1799930 in other world populations showed wide variations, with the highest frequencies in the GIH population (35.2%) followed by the MKK population (30.4%); the lowest frequencies were in the CHB (19%) and MEX (18%) HapMap populations. Similarly, the Indian populations also exhibited wide variations, ranging from 6.2% in the Tibeto-Burman to 42.3% in the Austroasiatic populations (Supplementary Table S1). It is interesting to note that, although the migrations of the Austroasiatic and Tibeto-Burman populations have been suggested to come from the East,18,19 the present study confirms that the East/Southeast Asian ancestry of both of the groups is derived from at least two distinct sources. Moreover, the rs1799931 SNP exhibited a lower MAF than the other two SNPs, with a minimum of 3% in the Balija and a maximum of 11.8% in the Madiga. The χ² test for departure from the single-site Hardy–Weinberg expectations was applied to each NAT2 polymorphic site in each population, and the results are shown in Table 3. The genotype frequencies for all of the polymorphisms met the Hardy–Weinberg expectations in all of the populations.

The haplotypes are coded according to the site order in Table 2. The haplotype-based analysis showed striking variations among the populations (Figure 1). Of the eight possible three-site haplotypes, only three were shared by all of the populations, with a cumulative frequency ranging from 88.2% (Madiga) to 97.0% (Balija). The wild-type haplotype (CGG) was shared by all of the populations, and its frequency ranged from 18.8% in the Muslims to 35.3% in the Balija, where it is the major haplotype. In all of the populations except the Balija, CAG is the major haplotype, with a frequency ranging from 35.5% (Mala) to 44.4% (Sugali). TGG is the second most prominent haplotype in the Reddy (32.6%), Sugali (29.6%), Muslim (32.5%) and Mala (30.0%) subjects. The CGA haplotype exceeded 5% only in the Mala (10%) and Madiga (11.8%) populations, which belong to the Scheduled tribe group, suggesting that CGA is a tribal-specific haplotype.
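The idea behind estimating three-site haplotype frequencies from phase-unknown genotypes can be illustrated with a small expectation-maximization routine. This is a sketch under stated assumptions and not the Arlequin implementation: the function names are hypothetical, each genotype is assumed to be complete and coded as the number of variant alleles (0, 1 or 2) at rs1799929, rs1799930 and rs1799931 in that order, and the starting frequencies are uniform.

```python
from itertools import product

# Site order as in the text: rs1799929 (C>T), rs1799930 (G>A), rs1799931 (G>A).
ALLELES = (("C", "T"), ("G", "A"), ("G", "A"))


def label(haplotype):
    """Turn a 0/1 tuple such as (0, 1, 0) into a string such as 'CAG'."""
    return "".join(ALLELES[i][a] for i, a in enumerate(haplotype))


def compatible_pairs(genotype):
    """Ordered haplotype pairs whose variant counts add up to the genotype."""
    haps = list(product((0, 1), repeat=len(genotype)))
    return [(h1, h2) for h1 in haps for h2 in haps
            if all(a + b == g for a, b, g in zip(h1, h2, genotype))]


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes (0/1/2 per site)."""
    haps = list(product((0, 1), repeat=len(genotypes[0])))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    candidates = [compatible_pairs(g) for g in genotypes]
    for _ in range(n_iter):
        counts = dict.fromkeys(haps, 0.0)
        # E-step: share each individual among its possible phasings.
        for pairs in candidates:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts over the number of chromosomes.
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs
```

With this coding, (0, 0, 0) corresponds to the wild-type haplotype CGG, (0, 1, 0) to CAG, (1, 0, 0) to TGG and (0, 0, 1) to CGA. Only double or triple heterozygotes are ambiguous; the routine resolves them using the haplotype frequencies supported by the remaining individuals.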

Figure 1 The inferred NAT2 haplotype frequencies in the different Indian populations. The haplotypes are coded according to the site order in Table 2.

To calculate the pairwise LD among the three NAT2 SNPs in the studied populations, we used the two most common LD measures, D′ and r². A strong LD between rs1799929 and rs1799930 was observed in all of the populations except the Madiga (D′=1.0; r²=0.19; Figure 2). None of the populations had significant LD between rs1799929 and rs1799931. Similarly, rs1799930 and rs1799931 did not show strong LD, owing to the low heterozygosity at the rs1799931 locus. Comparison of the 20-kb up- and downstream regions surrounding rs1799930 in a large number of worldwide samples revealed strong LD of this SNP with another NAT2 SNP (rs1112005) in the majority of the populations (Supplementary Figure S2). Interestingly, the homozygosity analysis of the Indian populations showed that the caste populations, who mostly depend upon agriculture and a vegetarian diet, did not have any homozygous blocks in this region (Table 4).
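The two LD measures can be computed from phased haplotype frequencies as sketched below. This is a minimal illustration using the standard definitions of D, D′ and r², not the HaploView implementation; the function name and the input frequencies in the usage note are hypothetical and are not taken from Table 3.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying the variant allele at both sites
    p_a:  variant allele frequency at the first site
    p_b:  variant allele frequency at the second site
    """
    d = p_ab - p_a * p_b
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    variance_term = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or variance_term == 0:
        return 0.0, 0.0  # at least one site is monomorphic
    return abs(d) / d_max, d * d / variance_term
```

When the two variant alleles never occur on the same haplotype, for example `ld_measures(0.0, 0.235, 0.4)`, D′ equals 1.0 while r² is only about 0.20. This shows how a complete D′ can coexist with a low r² when the allele frequencies at the two sites differ, as in the pattern reported for the Madiga.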

Figure 2 The pairwise linkage disequilibrium between the SNP markers of the NAT2 gene in the different Indian populations. The color coding represents the D′/LOD values, and the values in the cells are r² multiplied by 100. (a) Mala; (b) Madiga; (c) Muslim; (d) Balija; (e) Sugali; (f) Reddy.

Table 4 The number of ROH (runs of homozygosity) segments among Indian populations

NAT2*4 refers to the NAT2 reference sequence (GenBank accession X14672). The NAT2*4 allele acts dominantly to confer rapid acetylation, and the presence of mutant alleles at the genotyped sites (rs1799929, rs1799930 and rs1799931) leads to slow acetylation.20 On the basis of this assumption, the acetylator status of all of the samples in each population under analysis was determined. Samples possessing at least two mutant alleles were considered to be slow acetylators (Table 5). In four of the populations (Reddy, Mala, Sugali and Muslim), slow acetylators were more prevalent than rapid acetylators, whereas the Balija and Madiga populations showed more rapid acetylators (52.9%) than slow acetylators (47.1%; Figure 3).
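The deduction rule stated above can be expressed as a short function. This is a sketch of one reading of the rule, not the authors' own script: the function names and the dictionary layout are hypothetical, and counting mutant alleles across all three genotyped sites is an assumption, because the text does not state explicitly which sites contribute to the count. The `sites` parameter allows the count to be restricted, for example to rs1799930 and rs1799931.

```python
SITES = ("rs1799929", "rs1799930", "rs1799931")


def deduce_phenotype(genotype, sites=SITES, threshold=2):
    """Classify one sample as 'slow' or 'rapid'.

    genotype maps each SNP ID to the number of mutant alleles (0, 1 or 2).
    A sample with at least `threshold` mutant alleles over `sites` is
    called slow; which sites to count is an assumption of this sketch.
    """
    n_mutant = sum(genotype[s] for s in sites)
    return "slow" if n_mutant >= threshold else "rapid"


def phenotype_percentages(samples, sites=SITES, threshold=2):
    """Percentage of rapid and slow acetylators in a list of samples."""
    calls = [deduce_phenotype(g, sites, threshold) for g in samples]
    return {p: 100.0 * calls.count(p) / len(calls) for p in ("rapid", "slow")}
```

Under this reading, a sample heterozygous at rs1799930 only is called rapid, whereas a sample heterozygous at both rs1799929 and rs1799930 is called slow. Restricting `sites` to the two non-synonymous SNPs would instead call the second sample rapid, so the choice of sites directly affects the deduced proportions.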

Table 5 NAT2 deduced phenotype in population groups
Figure 3 The NAT2 rapid and slow acetylator phenotypes in the different Indian populations.

Discussion

The NAT2 enzyme is capable of N-acetylation, O-acetylation and N,O-acetylation, and it is implicated in the detoxification of a wide spectrum of naturally occurring xenobiotics, including carcinogens and drugs.21 The acetylator phenotype is determined by studying the acetylation of certain drugs, such as sulfadimidine, isoniazid (INH), dapsone or caffeine;22 NAT2 gene polymorphisms have been linked to human acetylation capacity, which alters susceptibility to cancer and adverse drug reactions.23 In the present study, we analyzed three SNPs: rs1799929 (p.Leu161Leu), a silent mutation that does not alter the acetylator phenotype, and two SNPs that result in amino-acid substitutions (rs1799930: p.Arg197Gln; rs1799931: p.Gly286Glu) and lead to a significant decrease in acetylation capacity. Genotypic screening of these primary NAT2 polymorphisms in South Indian populations revealed high variations in their allele frequency spectra but no deviation from Hardy–Weinberg equilibrium in any of the populations. Only three major haplotypes account for 88–97% of the chromosomes in all of the studied populations. A strong LD between rs1799929 and rs1799930 was found in five out of the six studied populations. Among the other Indian populations, the LD varied considerably. The SNP rs1799930 was not in LD with the majority of the SNPs in the Austroasiatic and Indo-European linguistic groups, whereas in the Dravidians, the same marker was in LD with many other markers, forming the largest LD block (Supplementary Figure S3). Notably, the Onge, Siddi and Tibeto-Burman groups did not have any other markers in LD with this SNP. The great variation in LD among the Indian populations is likely due to the heterogeneity caused by the combination of drift and selection under various environmental and cultural pressures (Supplementary Figure S3).

The frequency of c.481C>T was found to be 7–9% in African24 and 3.8% in Thai populations.25 A higher frequency of c.481C>T (30%) was reported in Brazilians, Spanish, Iranians, Emiratis and African Americans.26–30 A strong LD between NAT2*11A (c.481C>T) and NAT2*6 (c.590G>A) has been reported in Indian31 and Brazilian admixed populations;32 therefore, the c.481T variant is sufficient for detecting NAT2*11A normal acetylators.33 The low frequency of the c.857G>A polymorphism in many populations reduces its power to predict slow acetylators. A recent study using the SNPs rs1801280 and rs1799930 showed five different phenotype categories (corresponding to the genotypes NAT2*4/*4, NAT2*4/*5 or *4/*6, NAT2*5/*5, NAT2*5/*6 and NAT2*6/*6) that corresponded to a higher or lower acetylation capacity in vivo.34 Using accurate and reliable estimates for the NAT2 haplotype frequencies and the individual haplotype phases, no LD was observed between NAT2*5 and NAT2*6 in the Spanish, Korean and Black South African samples.35 Analyses of seven NAT2 SNPs revealed interethnic variability for some of the SNPs but intraethnic variability for others; c.282C>T, c.806A>G and c.857G>A have not shown any major intraethnic variability.36

The adoption of agricultural practices in place of a hunter–gatherer lifestyle has allowed humans to shift their diet towards less protein and more starchy foods. This shift in diet has created various selective pressures on the human genome, which act on the genetic variation in different human populations. The relationship between human dietary habits and NAT2 has been hypothesized in several studies.9,37 It was suggested that hunter–gatherers carried rapid acetylator phenotypes, whereas agriculturists possess the slow acetylator phenotype.38,39 In our analysis, all of the populations predominantly carried the slow acetylator phenotype, with the exception of the Balija and Madiga (Figure 3), suggesting that the dietary shift of these two populations from the hunter–gatherer lifestyle likely happened recently. The analysis of the 20-kb up- and downstream genomic regions also showed a similar pattern, in which the tribal and lower caste populations showed a variable number of homozygous blocks in the analyzed region (Table 4), suggesting their association with the rapid acetylator phenotype.

Therefore, our analyses of the NAT2 gene polymorphisms in a large body of data provide strong support for a high diversity of slow acetylator phenotypes in the majority of the world’s populations, which has been shaped by natural selection40 and population-specific selective pressures acting on this locus.38 Moreover, we also contribute to the knowledge regarding the distribution of NAT2 gene polymorphisms among the various ethnic groups of India. We observed that, because of the vast amount of genetic, cultural and phenotypic variation found in Indian populations, all of the studied SNPs show great variation in LD among the Indian populations. A complete study of the NAT2 gene in these populations will help in drawing more inferences about the processes that have led to the current gene frequency patterns.