Introduction

Neuropeptide Y (NPY) is the most abundant neuropeptide in the brain and is widely distributed throughout the nervous system. It is a pleiotropic factor that participates in the control of multiple physiological processes, such as cognitive functions, eating behavior, circadian rhythms, neuroendocrine mechanisms, and reproductive and cardiovascular functions (McDonald 1990; Gabriel et al. 1996; Gribkoff et al. 1998; Bray et al. 2000; Magni 2003). NPY is a 36-amino-acid peptide consisting of an alpha-helix folded against a polyproline helix, with a tyrosine residue at the carboxy terminus. The peptide is highly conserved, with 92% amino acid sequence identity between the cartilaginous fish Torpedo marmorata and mammals, which diverged more than 400 million years ago (Larhammar 1996). This high level of conservation suggests that NPY has a critical physiological function and that interindividual variation at this locus is likely to be minimal.

A thymine (T) to cytosine (C) polymorphism at nucleotide 1,128 of the human NPY gene results in a leucine-to-proline substitution at position 7 (Leu7Pro) in the signal peptide of preproNPY (Karvonen et al. 1998). Individuals with the Leu7/Pro7 genotype show, on average, a 42% greater maximal increase in plasma NPY in response to physiological stress than Leu7/Leu7 individuals (Kallio et al. 2001). The Leu7Pro polymorphism has been implicated in the regulation of lipid metabolism, and the Pro7 variant is a suspected cardiovascular risk factor in Caucasians (Karvonen et al. 1998, 2001; Niskanen et al. 2000a). The polymorphism has been associated with accelerated progression of carotid atherosclerosis both in healthy obese subjects (Karvonen et al. 2001) and in patients with type 2 diabetes (Niskanen et al. 2000a), and type 2 diabetic patients carrying the Pro7 substitution had a higher prevalence of diabetic retinopathy (Niskanen et al. 2000b; Pettersson-Fernholm et al. 2004).

The Pro7 allele has been reported at a frequency of 8% among Finns and 3% among the Dutch (Karvonen et al. 1998). It has not been reported in Japanese (Drube et al. 2001; Makino et al. 2001; Itokawa et al. 2003) or Korean populations (Ding et al. 2002), and Jia et al. (2005) reported an extremely low frequency of the Pro7 allele in China. Although the NPY gene has been studied in some European and Asian populations, no published study has examined it in India, whose population comprises nearly 5,000 endogamous groups.

India has 4,635 well-defined endogamous populations, which are culturally stratified as tribes and non-tribes. The 532 tribal communities, assumed to be the aboriginal inhabitants of the subcontinent, constitute 7.76% of the total population (http://www.censusindia.net). Previous work has reported a closer affinity of Indian castes with either European or Asian populations (Bamshad et al. 2001). Our previous study suggested that the vast majority (>98%) of the Indian maternal gene pool, comprising Indo-European and Dravidian speakers, is genetically largely uniform (Thanseem et al. 2006). The populations selected for the present study span the entire range of Indian social structure. Here we evaluate allele frequencies, haplotype frequencies and intermarker linkage disequilibrium (LD) for three single-nucleotide polymorphisms (SNPs) of the NPY gene in 14 Indian populations. We find marked differences in allele and haplotype frequency distributions among populations. These differences should be useful in designing association or linkage studies of phenotypes such as obesity, hypertension, and alcoholism.

Materials and methods

Populations

The study included 654 unrelated males belonging to 14 ethnic groups that inhabit geographically diverse regions of India (Fig. 1, Table 1); only males were sampled to allow future studies of both mitochondrial and Y-chromosomal variation. All subjects were healthy volunteers. The study protocol was approved by the ethics committee of the Centre for Cellular and Molecular Biology, Hyderabad, India, and a blood sample was collected from each individual after informed written consent had been obtained.

Fig. 1

Map of India with the locations of different populations. The lines within the map indicate the borders of different states

Table 1 Population names, geographic locations, and number of samples analyzed for each

Genotyping

Genomic DNA was extracted from all participants using a published protocol (Thangaraj et al. 2002). Forward (NPE2F 5′-CCTGGGTTCTCTCTGCGGGACTG-3′) and reverse (NPE2R 5′-CCCATTTTGTGTAGAGTGTGCCCTGT-3′) primers were used to amplify a 516-bp fragment spanning exon 2 of NPY and its flanking region. Amplifications were performed in a 10-μl volume containing 1 unit of AmpliTaq Gold (PerkinElmer, Wellesley, MA, USA), 1× PCR buffer, 1.5 mM MgCl₂, 200 μM deoxynucleotide triphosphates, 1 pmol of each primer, and 40 ng of genomic DNA. PCR products were checked visually on 2% agarose gels and sequenced directly using BigDye Terminator cycle sequencing on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA).

Statistical analysis

Genotype frequencies at each NPY locus were tested for Hardy–Weinberg equilibrium using χ² tests with one degree of freedom and by Monte Carlo simulation with the HWSIM program (Cubells et al. 1997). The relationship between the geographic latitude of each population and its Pro7 allele frequency was assessed by simple linear regression using SPSS software. F statistics were calculated according to the methods of Weir and Cockerham (1984) using FSTAT (Goudet 1995). Haplotypes were constructed by maximum likelihood using the expectation–maximization (EM) algorithm, and haplotype frequencies were estimated with Arlequin (Schneider et al. 2000). Within each population, two-site haplotype frequencies were obtained from the three-SNP haplotypes by summing the frequencies of all haplotypes carrying each specific combination of alleles at the two sites. Linkage disequilibrium (D′ and r²) was estimated using HaploView 3.12 (Barrett et al. 2005).
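The Hardy–Weinberg testing described above can be illustrated with a short Python sketch. It computes the 1-df χ² statistic from genotype counts at a biallelic locus and a Monte Carlo p-value obtained by randomly re-pairing the observed alleles into genotypes while holding allele counts fixed. This is written in the spirit of, not as a reimplementation of, HWSIM; the function names `hwe_chi2` and `hwe_monte_carlo` are hypothetical.

```python
import math
import random

def hwe_chi2(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for HWE at a biallelic locus (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip empty classes so a monomorphic locus gives chi2 = 0 rather than 0/0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

def hwe_monte_carlo(n_AA, n_Aa, n_aa, n_sim=10000, seed=1):
    """Monte Carlo p-value: shuffle the observed alleles and re-pair them into genotypes."""
    rng = random.Random(seed)
    obs_chi2, _ = hwe_chi2(n_AA, n_Aa, n_aa)
    alleles = ["A"] * (2 * n_AA + n_Aa) + ["a"] * (2 * n_aa + n_Aa)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        counts = {"AA": 0, "Aa": 0, "aa": 0}
        for x, y in zip(alleles[0::2], alleles[1::2]):
            counts["Aa" if x != y else x + y] += 1
        sim_chi2, _ = hwe_chi2(counts["AA"], counts["Aa"], counts["aa"])
        if sim_chi2 >= obs_chi2 - 1e-12:  # simulated table at least as extreme
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

For example, `hwe_chi2(25, 50, 25)` returns a statistic of 0 (perfect Hardy–Weinberg proportions), whereas `hwe_chi2(50, 0, 50)` flags a complete absence of heterozygotes. The Monte Carlo version is useful when expected genotype counts are small, as they are for rare alleles such as Pro7.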

Results

Three polymorphic sites were observed in exon 2 of the NPY gene (T1128C/Leu7Pro; A1198G/Ala28Ala; G1258A/Ser50Ser), only one of which results in an amino acid change. We designated the wild-type and mutant alleles by their respective nucleotides for all three coding SNPs, and haplotypes are written with these designations in 5′-to-3′ order. For example, haplotype TGG carries the ancestral allele at all three coding sites. Allele and genotype frequencies differed significantly among populations (Table 2). The Pro7 allele was observed in nine of the 14 populations, at frequencies ranging from 0.014 (Kurumba) to 0.233 (Kota); Pro7 homozygotes were found only in the Kota (Table 2). To investigate geographical trends in Pro7 variation, we plotted Pro7 allele frequency against latitude. A weak tendency for Pro7 frequency to decrease from north to south was observed, although the Kota were an outlier (Fig. 2), and the correlation was not significant (r = 0.250; p > 0.05). Across the 15,937 individuals screened worldwide to date, including the present study, the average Pro7 frequency is 0.048.
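The latitude analysis and the worldwide average in this paragraph rest on two simple calculations, sketched below in Python (the study itself used SPSS). The sketch computes a Pearson correlation, a least-squares slope of allele frequency on latitude, and the t statistic for testing r = 0, plus an average frequency in which each study is weighted by its number of chromosomes. Whether the published 0.048 was weighted this way is an assumption, no data from this study are included, and all function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression_slope(x, y):
    """Least-squares slope of y on x (e.g. Pro7 frequency on degrees of latitude)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def t_statistic(r, n):
    """t statistic for H0: rho = 0; compare with a t distribution on n - 2 df (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def pooled_frequency(samples):
    """Allele frequency pooled over studies, weighting each by its number of chromosomes.

    samples: iterable of (number_of_individuals, allele_frequency) pairs.
    """
    samples = list(samples)
    total_chromosomes = sum(2 * n for n, _ in samples)
    total_alleles = sum(2 * n * f for n, f in samples)
    return total_alleles / total_chromosomes
```

Weighting by sample size keeps a small study with an extreme frequency, such as an isolated population, from dominating the worldwide average.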

Table 2 Genotype frequencies, allele frequencies, Hardy–Weinberg proportions and Nei’s gene diversity (Nei 1987) of NPY SNPs for all populations
Fig. 2

Pro7 allele frequency in various world populations plotted against latitude (degrees north)

The Ala28Ala marker was polymorphic only in the Siddi, with a minor allele frequency of 0.026. The Ser50Ser marker was polymorphic in all populations studied, with the frequency of the derived A allele ranging from 0.186 (Kota) to 0.682 (Onge). No population deviated significantly from Hardy–Weinberg equilibrium at any of the polymorphic loci.
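The allele-frequency summaries in this paragraph and the gene diversity values reported in Table 2 can be computed from genotype counts as in the Python sketch below. Note that a frequency above 0.5, such as the 0.682 for the A allele in the Onge, belongs to the major allele, so a minor allele frequency never exceeds 0.5. The gene diversity function uses the commonly cited unbiased form from Nei (1987), 2n(1 − Σxᵢ²)/(2n − 1); the function names are hypothetical.

```python
def derived_allele_frequency(n_anc_hom, n_het, n_der_hom):
    """Derived-allele frequency from diploid genotype counts."""
    n = n_anc_hom + n_het + n_der_hom
    return (n_het + 2 * n_der_hom) / (2 * n)

def minor_allele_frequency(p):
    """The minor allele is whichever allele is rarer, so the MAF is at most 0.5."""
    return min(p, 1 - p)

def nei_gene_diversity(allele_freqs, n_individuals):
    """Unbiased gene diversity (Nei 1987) for a sample of n diploid individuals."""
    two_n = 2 * n_individuals  # number of sampled chromosomes
    return two_n / (two_n - 1) * (1 - sum(f * f for f in allele_freqs))
```

For instance, `minor_allele_frequency(0.682)` gives 0.318, and a biallelic locus with equal allele frequencies in 50 individuals has a gene diversity of 50/99, slightly above the naive 0.5 because of the small-sample correction.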

Five haplotypes were observed, of which only two (TGG and TGA) were found at high frequency in all populations (combined frequency 0.94–1.0) (Fig. 3). Haplotype TGG carries all ancestral alleles, and haplotype TGA carries the derived allele at the third marker. Haplotype CGG was found at very low frequencies in several populations but reached 0.233 in the Kota, reflecting the high frequency of the Pro7 allele in this population. Among Pro7-bearing haplotypes, the C (Pro7) allele occurred with the two wild-type alleles (CGG) at frequencies ranging from 1.7 to 23.3%, except in the Kurumba, in whom the Pro7 allele was associated with the mutant A allele at the Ser50Ser locus (CGA). Haplotype CGA was found only in the Kurumba (0.014), and haplotype TAG only in the Siddi (0.026). Linkage disequilibrium was detected only between the Leu7Pro and Ser50Ser loci in the Badaga (D′ = 1; r² = 0.069; χ² = 13.969; p = 0.001).
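The haplotypes summarized here were inferred with the EM algorithm named under Statistical analysis. The Python sketch below is a minimal illustration of that technique, not the Arlequin implementation. It assumes biallelic SNPs, unphased genotypes supplied as allele pairs in 5′-to-3′ order, and Hardy–Weinberg proportions of haplotype pairs; it also shows how two-site haplotype frequencies are obtained by summing over the third site. The names `compatible_pairs`, `em_haplotypes` and `two_site` are hypothetical.

```python
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased multi-SNP genotype.

    genotype: one (allele1, allele2) tuple per site, e.g. (("T", "C"), ("G", "G"), ("G", "A")).
    """
    per_site = [[(a, b)] if a == b else [(a, b), (b, a)] for a, b in genotype]
    pairs = set()
    for combo in product(*per_site):
        h1 = "".join(site[0] for site in combo)
        h2 = "".join(site[1] for site in combo)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotypes(genotypes, max_iter=500, tol=1e-10):
    """Maximum-likelihood haplotype frequencies via expectation-maximization."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in candidates:
            # E-step: probability of each phase resolution, assuming HWE of haplotypes
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq

def two_site(freq, i, j):
    """Collapse multi-site haplotype frequencies onto sites i and j (0-based)."""
    out = {}
    for hap, f in freq.items():
        key = hap[i] + hap[j]
        out[key] = out.get(key, 0.0) + f
    return out
```

Only individuals heterozygous at two or more sites are phase-ambiguous (a T/C, G/G, G/A individual could carry CGG/TGA or CGA/TGG), and EM resolves them in proportion to the current haplotype frequency estimates.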

Fig. 3

Relative frequencies of the five NPY haplotypes, shown as “stacked” areas, for 14 populations. The key for the shading is given at the bottom of the figure

Discussion

We report, for the first time, appreciable frequencies of the Pro7 allele of NPY in the majority of the Indian populations studied (0.014 in the Kurumba to 0.233 in the Kota). The Pro7 allele has not been detected in several other populations worldwide, including Asians [Japanese (Drube et al. 2001; Makino et al. 2001; Okubo et al. 2001; Lappalainen et al. 2002; Itokawa et al. 2003); Koreans (Ding et al. 2002); Japanese in Tokyo and Han Chinese in Beijing (http://www.hapmap.org)] and Africans [African Americans (Lappalainen et al. 2002); Yoruba in Ibadan, Nigeria (http://www.hapmap.org)]. Jia et al. (2005) reported an extremely low frequency of Pro7 in a Chinese population, although Duan et al. (2005) failed to detect this allele in the same region. In our study, the highest frequency of this allele (0.23) was observed in the Kota, in whom haplotype CGG also has a frequency of 0.23. Because the Kota have been a small, isolated population for at least 2,000 years (Breeks 1873; Saha 1976; Roychoudhury et al. 2001), their high Pro7 frequency is likely due to a founder effect or genetic drift. The Pro7 allele is also found in the Siddi, a recent migrant group from Africa (CGG haplotype frequency 0.017). Ding et al. (2003) compiled data on 6,626 individuals from several studies, in which the Pro7/Pro7 genotype occurred in only seven individuals (six Finns and one European American). They proposed that the Pro7 allele originated in northern Europe and spread to neighboring groups, reaching varying frequencies through genetic drift and isolation by distance. Lappalainen et al. (2002), however, found the Pro7 allele at appreciable frequency in Bedouin populations from the Middle East (0.078) and at lower frequencies in north and northeast African populations (0.022 in Moroccans and 0.011 in Ethiopian Jews), suggesting a possible African or Middle Eastern origin for the allele. The presence of the Pro7 allele in the Siddi, an African immigrant population, would support this interpretation. However, Pro7 was not found in populations thought to descend from the first migrants out of Africa, such as the Onge (Thangaraj et al. 2005), suggesting that the allele may have arisen after this first migration.
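The drift argument for the Kota can be illustrated with a neutral Wright–Fisher simulation, sketched below in Python. It shows only that allele frequencies wander much further from their starting value in small populations than in large ones; the population sizes, generation counts and starting frequency in the usage example are arbitrary illustrative values, not estimates for the Kota, and the model (constant size, no selection, mutation or migration) is an assumption. A founder effect corresponds to an extreme reduction in size at the start, which this constant-size sketch does not model separately. The function names are hypothetical.

```python
import random

def wright_fisher(p0, n_diploid, generations, rng):
    """Final allele frequency after neutral Wright-Fisher drift (constant size, no mutation)."""
    two_n = 2 * n_diploid
    p = p0
    for _ in range(generations):
        # Each of the 2N chromosomes in the next generation copies a randomly chosen parental one
        p = sum(1 for _ in range(two_n) if rng.random() < p) / two_n
        if p in (0.0, 1.0):
            break  # allele lost or fixed; frequency no longer changes
    return p

def expected_drift_variance(p0, n_diploid, generations):
    """Theoretical variance of allele frequency after t generations of drift."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * n_diploid)) ** generations)

def simulated_spread(p0, n_diploid, generations, replicates=100, seed=7):
    """Mean and variance of final frequencies over independent replicate populations."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_diploid, generations, rng) for _ in range(replicates)]
    mean = sum(finals) / replicates
    var = sum((f - mean) ** 2 for f in finals) / replicates
    return mean, var
```

For example, starting from p0 = 0.05 for 40 generations, `simulated_spread(0.05, 25, 40)` yields a far larger variance than `simulated_spread(0.05, 500, 40)`, in line with `expected_drift_variance`: in a small isolate a rare allele can drift to unusually high frequency (or be lost) purely by chance.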

Previous studies suggested that the A allele at the Ala28Ala locus is an African-specific polymorphism (http://www.hapmap.org). Its frequency in African populations is 0.05 (Harare, Zimbabwe), 0.025 (Nigeria) and 0.045 (African Americans) (http://www.hapmap.org). That this allele was found only in the Siddi is in accord with their African ancestry. In addition to the two common haplotypes (TGG and TGA), three others (CGG, CGA and TAG) were observed, each reaching a frequency above 1% in at least one of the populations studied. The general lack of significant LD between pairs of loci in the NPY region may reflect the low frequencies of the minor alleles, since all measures of LD show some dependence on allele frequency in finite samples (Zapata 2000; Abecasis et al. 2001; Mueller 2004). The observation that the TGG haplotype is both common and highly conserved suggests that other factors may be acting to maintain it at high frequency. One possibility is positive selection so recent that there has been little opportunity for recombination to break up the haplotype; this would produce LD around the selected allele that extends disproportionately far given its population frequency. Alternatively, epistatic selection for combinations of alleles at two or more loci on a chromosome may influence LD patterns across haplotypes.
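The dependence of LD measures on allele frequency can be made concrete with the standard definitions of D, D′ and r², sketched below in Python. The sketch also computes the largest r² attainable for given allele frequencies. When a rare allele always occurs on the same background (D′ = 1), r² can still be small; the Badaga result (D′ = 1, r² = 0.069) is consistent with this pattern. The allele frequencies in the usage example are hypothetical, and the function names are not HaploView's.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the two-site haplotype carrying A and B
    p_a, p_b: frequencies of A and B; both loci must be polymorphic (0 < p < 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # signed D'; its absolute value is usually reported
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

def max_r2(p_a, p_b):
    """Largest r^2 attainable at two biallelic loci with allele frequencies p_a and p_b."""
    def coupling(x, y):
        lo, hi = sorted((x, y))
        return lo * (1 - hi) / ((1 - lo) * hi)
    # Consider both phases: B in coupling with A, or B's alternative allele in coupling with A
    return max(coupling(p_a, p_b), coupling(p_a, 1 - p_b))
```

With a hypothetical rare allele at frequency 0.05 that always sits on a background allele at frequency 0.3, `ld_measures(0.05, 0.05, 0.3)` returns D′ = 1 but r² of only about 0.12, which is also the maximum r² for those frequencies; r² can reach 1 only when the two allele frequencies match.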

Understanding the evolutionary history of the NPY locus in these populations may help to identify the haplotype blocks (The International HapMap Consortium 2003) that are most informative for detecting potential functional variation at this locus, which is associated with many physiological mechanisms.