Introduction

Detection of signatures of natural selection in the human genome is a useful tool to identify genes that might underlie variation in disease resistance or drug metabolism. Any form of selection affects some regions of the genome more than others, whereas population history, demography, migration and the mating system will affect the whole genome in the same way.1 Consequently, genomic regions that are different from the rest of the genome are likely to contain genes that are involved in local adaptation. If selected loci are present in the sample, the variance in single locus FST estimates should be higher than if only neutral loci were sampled.

Neutral polymorphic loci are used in making inferences about patterns of differentiation within or among populations of the same or closely related species, because under particular models of population structure, they are related to demographic or historical parameters. Selection may shape the distribution of genetic variation at the so-called ‘neutral markers.’2, 3, 4, 5 A neutral locus will respond to selection whenever it is in linkage disequilibrium with other loci that are subject to selection, depending on the local rate of recombination.6 Balancing selection at a locus tends to maintain an elevated level of variation at closely linked neutral loci.7, 8 Other modes of selection bring about a reduction in variability at linked sites. Two mechanisms are responsible: background selection and genetic hitchhiking.9, 10 Background selection removes deleterious mutations and eliminates variation at linked sites, to an extent that varies with the recombination rate, the magnitude of selection and the mutation rate.11, 12 Genetic hitchhiking is an outcome of positive selection: if a mutation increases in frequency in a population as a result of selection, it will be accompanied by linked neutral variation.13

The first multilocus test to detect selection was developed by Lewontin and Krakauer,14 on the basis of the sampling distribution of a statistic F (an estimator of FST). In brief, the variance in FST values among loci was used to identify those loci that deviated more than expected. This test was soon shown to be invalid: the underlying neutral model was far too restrictive, and various models of population history have been shown to inflate the expected variance.15, 16, 17 Lewontin and Krakauer's approach was refined by Bowcock et al.18 and by Beaumont and Nichols.19 The conditional distribution of FST estimates has been found to be robust to the details of the neutral model.19 Moreover, the problem of an unknown population history can be reduced by restricting the analysis to simple scenarios that consider a pair of populations rather than several populations simultaneously.20

In recent years, several methods have been developed to detect patterns of natural selection.21, 22, 23, 24, 25, 26 In general, neutrality tests are based on different assumptions and parameters; hence, detecting outlier loci with more than one statistical approach simultaneously strengthens the candidate status of a particular locus.27, 28, 29

As knowledge of markers responding to selection is a useful tool for clinical and molecular anthropology studies, in this study, we evaluate the presence of signatures of natural selection in 17 short tandem repeats genotyped in six human populations from the Mediterranean area, applying three different approaches based on different models. We applied so-called ‘multiple marker based neutrality tests,’ which use information from several loci or genomic regions to construct the neutral null distribution on the basis of the variability characteristics of the markers in the analyzed samples.30 Following a candidate gene approach, we chose short tandem repeat loci localized in coding and noncoding regions of genes for enzymes of oxidative metabolism, the immune system and erythrocyte membrane components,31 which are frequent targets of natural selection.32, 33 Erythrocyte membrane components such as spectrin and ankyrin have been associated with spherocytosis, ovalocytosis34, 35, 36, 37 and interactions with different species of Plasmodium.38, 39, 40 TNF genes encode proteins belonging to the group of inflammatory cytokines. In particular, they exert an effect in immunostimulation and modulation of the host response to infectious agents and cancer cells.41, 42

Nitric oxide (NO), synthesized by enzymes encoded by NOS genes, has an important role in the innate immune response, such as control of viral, bacterial and parasitic infections.43 It regulates the functioning of the immune system through different mechanisms, dependent and not dependent on cGMP, and modulates cytokine production.43, 44 Genes of oxidative metabolism (SOD3, HO1, PON1) are involved in different processes, including cytoprotective action, removal of radical species and protection against bacterial endotoxins.45, 46, 47

Microsatellites offer the advantage of a multiallelic marker, which is highly informative. They are unlikely to be targets of natural selection, but linkage to a genomic region that has been the target of selection is expected to cause a deviation from neutral expectations.

Materials and methods

Samples were taken from 429 individuals from Spain (N=126), the Balearic Islands (N=62), Sardinia (N=90), Tuscany (Italy) (N=51), Sicily (Italy) (N=47) and Morocco (N=53). The samples were from unrelated individuals of both sexes, born and resident in their countries of origin, as their relatives had been for at least three generations. Informed consent was obtained from all participants of the study. The protocols and procedures used in this research were in compliance with the declaration of Helsinki.

DNA was extracted from whole blood using the standard phenol–chloroform technique. DNA samples were amplified by PCR with fluorescent primers as previously described.48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58 The polymorphisms analyzed and their chromosome locations are listed in Table 1. PCR products were analyzed on an ABI 3730 DNA Analyzer (Applied Biosystems, SA, France). Genotypes were identified using the GENESCAN and GENOTYPER software (Applied Biosystems).

Table 1 Chromosome location, percentage of significant P-values of Hardy–Weinberg equilibrium and mean number of alleles of markers analyzed

The Hardy–Weinberg equilibrium was tested using the Markov–Chain Method with 1 000 000 steps. Standard diversity indexes were calculated according to Nei.59 To analyze the genetic differentiation between populations, we applied the RST pairwise test with 10 000 permutations.60 These analyses were performed using Arlequin 3.1 software.61

To detect divergent selection, we applied three methods that identify outlier loci on the basis of various estimators of population divergence,19, 21, 22, 23 performing pairwise comparisons between populations to solve the problem of an unknown population history.20 As all applied neutrality tests are based on different assumptions and parameters, the detection of outlier loci simultaneously with more than one statistical approach will strengthen the candidate status of a particular locus.27, 28

The first method, developed by Beaumont and Nichols19 and implemented in the software FDIST2 (available at http://www.rubic.rdg.ac.uk/~mab/software.html), identifies outlying values of FST in a plot of FST vs heterozygosity using a null distribution based on a symmetrical island model of population structure. This method rests on the premise that loci showing unusually low or high levels of genetic differentiation are often subject to natural selection. FST is estimated by the statistic β,62 whereas the expected FST is calculated from the data as the average among loci weighted by their heterozygosity.63 Coalescent simulations are performed using the island model with 100 islands, and the distribution of FST as a function of heterozygosity is characterized by estimating the quantiles of the distribution. We simulated 100 000 independent loci, setting sample size to 50 individuals per population in all simulations.
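The heterozygosity-weighted average used as the expected FST can be illustrated with a minimal Python sketch. This is not the FDIST2 implementation: the function names and the example values are hypothetical, and the quantile bounds passed to the second function are assumed to come from simulations carried out elsewhere.

```python
def weighted_mean_fst(fst_values, heterozygosities):
    """Average of per-locus FST estimates, weighted by locus heterozygosity.

    Hypothetical helper for illustration only; FDIST2 computes this internally.
    """
    if len(fst_values) != len(heterozygosities):
        raise ValueError("one heterozygosity value is needed per FST value")
    total_h = sum(heterozygosities)
    if total_h <= 0:
        raise ValueError("total heterozygosity must be positive")
    weighted_sum = sum(f * h for f, h in zip(fst_values, heterozygosities))
    return weighted_sum / total_h


def is_outlier(fst, lower_quantile, upper_quantile):
    """Flag a locus whose FST lies outside the simulated quantile envelope.

    The two bounds are assumed to be the simulated quantiles at the
    heterozygosity of the locus being tested.
    """
    return fst < lower_quantile or fst > upper_quantile


# Made-up values for three loci (not data from this study).
example_fst = [0.02, 0.10, 0.04]
example_h = [0.50, 0.80, 0.70]
expected_fst = weighted_mean_fst(example_fst, example_h)
```

Weighting by heterozygosity gives more influence to the more variable loci, which is why the expected value can differ from the simple arithmetic mean of the single-locus estimates.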

The second method identifies loci that differ in variability from the rest of the genome by calculating the ratio of gene diversity59 in two populations (RH). LnRH is approximately normally distributed under neutrality.23 After standardization, 95% of neutral loci are expected to have values between −1.96 and 1.96; 99% between −2.58 and 2.58; and 99.9% between −3.29 and 3.29.
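The standardization step and the three cutoffs quoted above can be sketched in Python as follows. This is a minimal illustration, assuming that the per-locus LnRH values have already been computed; the function names are hypothetical and do not correspond to any published software.

```python
from statistics import mean, stdev


# Cutoffs from the standard normal distribution, as quoted in the text,
# ordered from the most to the least stringent.
THRESHOLDS = [(3.29, "P<0.001"), (2.58, "P<0.01"), (1.96, "P<0.05")]


def standardize(lnrh_values):
    """Convert per-locus LnRH values to z-scores (mean 0, SD 1)."""
    m = mean(lnrh_values)
    s = stdev(lnrh_values)  # sample standard deviation
    return [(v - m) / s for v in lnrh_values]


def significance_label(z):
    """Return the most stringent two-sided level that a z-score exceeds."""
    for cutoff, label in THRESHOLDS:
        if abs(z) > cutoff:
            return label
    return "not significant"
```

Because the test is two-sided, both unusually low and unusually high standardized values are flagged; the sign of the value indicates which of the two populations shows the reduced variability.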

The last method is based on a model proposed by Vitalis et al.,21, 64 implemented in the software DETSEL 1.0 (http://www.univ-montp2.fr/~genetix/detsel/detsel.html). The method is based on a model of population divergence by pure random drift. We performed 100 000 coalescent simulations for each pairwise comparison. The nuisance parameters were used in different combinations to generate null distributions with a similar number of allelic states as in the observed data sets. Loci that fall outside the specified ‘probability region,’ compared with the simulated data points, are reported to be potentially affected by selection. This test assumed that no mutations occurred after the divergence of two populations from the common ancestor; hence, we determined whether stepwise-like mutation, in addition to genetic drift, has contributed to genetic differentiation among the studied populations. We calculated whether the observed RST value is significantly larger than randomized RST; in this case, the stepwise-like mutations have contributed to the observed differentiation pattern.65 Analysis was performed with SPAGeDI 1.1,66 using 20 000 permutations.

Results

Results for the Hardy–Weinberg equilibrium test and the mean number of alleles of markers analyzed are indicated in Table 1. Markers showed departures from Hardy–Weinberg equilibrium in 25.5% of tests. Markers with the highest number of departures were NOS1 (CA)n 5′, with 83.3% of significant P-values, followed by TNFe (GA)n and NOS1 (CA)n exon 29 (NOS1 (CA)n e29), with 66.7% of significant departures. On the contrary, HO-1 (GT)n, PON1 (GT)n, NOS2 (CCTTT)n and SPTA (GT)n are in Hardy–Weinberg equilibrium in all populations analyzed (Table 1). The population with the highest number of departures is that from the Balearic Islands (41.2%), whereas Tuscany shows the lowest number (5.8%) (data not shown).

The number of alleles does not show significant heterogeneity across populations (the Kruskal–Wallis test: P=0.701). The population with the highest mean number of alleles is Spain (8.41±5.54), whereas the population with the lowest mean number of alleles is Morocco (6.29±4.57).

In the same way, the observed heterozygosity does not show significant heterogeneity across populations (the Kruskal–Wallis test: P=0.806).

To evaluate the differences between populations analyzed, we performed an RST analysis. We observed a high differentiation across populations. A total of 86.7% of pairwise comparisons are significant, with the exclusion of Sardinia vs Spain and Balearic Islands vs Morocco.

The observed multilocus RST values were significantly higher than permuted RST values (P<0.05) in two pairs of populations: Sardinia–Tuscany and Sicily–Tuscany. For these populations, stepwise-like mutations increased the differentiation between populations.

To apply neutrality tests, we performed pairwise comparisons between the six population samples analyzed, for a total of 15 comparisons per marker and 255 comparisons for the whole set of polymorphisms for each neutrality test (765 comparisons in total). Results for the three analyses are summarized in Table 2. Overall, 15.6% of comparisons showed departures from neutrality; of these, 32.8% were significant at P<0.05, 37.0% at P<0.01 and 30.3% at P<0.001. Among the different tests, the highest proportion of significant comparisons is shown by the F-test (34.5%), whereas the FST test and the LnRH test showed 6.7% and 5.9% significant comparisons, respectively. Comparisons between the three analyses are shown in Table 3. Markers that show departures from neutrality in two tests are SOD3 (GT)n (53.3%); TNFe (GA)n (40.0%); NOS1 (CA)n e29 (20.0%); NOS2 (AAAT)n and NOS2 (CCTTT)n (13.3%); and TNFb (GA)n (6.6%). Markers that show departures from neutrality in all three tests are TNFe (GA)n (Sicily vs Tuscany) and NOS1 (CA)n e29 (Sicily vs Morocco and Balearic Islands vs Morocco). Results of the analysis of marker NOS1 (CA)n e29 are shown in Figures 1, 2 and 3. In addition, a population can be regarded as being under selection if it is significant in all of its pairwise comparisons. For marker TNFe (GA)n, the population from Tuscany is significant in three tests in one comparison (with the Sicily population), and significant in two tests in three comparisons (with Sardinia, Morocco and Balearic Islands). For NOS1 (CA)n e29, the Moroccan population is significant in the three tests in two pairwise comparisons, and in two tests in the other comparisons.
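The comparison counts quoted above follow directly from the study design (six populations, 17 markers, three tests). A short Python check, with a hypothetical function name, makes the arithmetic explicit:

```python
from math import comb


def comparison_counts(n_populations, n_markers, n_tests):
    """Return (pairs per marker, comparisons per test, comparisons overall)."""
    pairs_per_marker = comb(n_populations, 2)  # unordered population pairs
    per_test = pairs_per_marker * n_markers
    overall = per_test * n_tests
    return pairs_per_marker, per_test, overall


# Design of this study: 6 populations, 17 markers, 3 neutrality tests.
pairs, per_test, overall = comparison_counts(6, 17, 3)
print(pairs, per_test, overall)  # prints: 15 255 765
```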

Table 2 Number of significant comparisons for each neutrality test applied
Table 3 Results of each pairwise comparison of neutrality tests applied
Figure 1

Results of the FST test for Sicily and Morocco for all markers. Each dot corresponds to a marker. The two outer lines delimit the 95% confidence interval, and the inner line corresponds to the median values. Markers falling outside this region (in this case, NOS1 (CA)n e29) are significant at P<0.05.

Figure 2

Results of the LnRH test between Sicilian and Moroccan populations. Dotted lines indicate 95.0%, 99.0% and 99.9% confidence intervals.

Figure 3

Results of the F-test between Sicilian and Moroccan populations with regard to ANK1 (CA)n and NOS1 (CA)n e29. Markers falling outside the plotted region are significant at P<0.01. F1 and F2 values are plotted on the X and Y axes, respectively.

Discussion

In this study, we evaluated the presence of patterns of divergent selection using a set of 17 polymorphic microsatellite markers located in different chromosomes. We genotyped a total of 429 individuals from six different human Mediterranean populations. We used three methods based on different models, considering markers that showed significant results in all neutrality tests used, according to an approach adopted by other authors.27, 28

When we compared the results obtained using the three methods, the markers responding to selection were TNFe (GA)n in the comparison between Sicily and Tuscany, and NOS1 (CA)n e29 in the comparisons between Sicily and Morocco and between the Balearic Islands and Morocco. If we consider the number of significant comparisons for these populations, we can note that the Moroccan population, for NOS1 (CA)n e29, is significant in all comparisons with other populations: in two cases with all three tests, and in the other cases with two tests. The same consideration can be made for Tuscany for the marker TNFe.

It is necessary to reject results for Sicily–Tuscany, because the observed RST value is significantly higher than randomized RST (P<0.05), indicating that stepwise-like mutation created a difference between these two populations. Consequently, the results obtained with the F-test, which assumes divergence by pure genetic drift, could be false positives.

We conclude that the only marker that responds to natural selection is NOS1 (CA)n e29, in population comparisons between Morocco and Balearic Islands and Sicily. In particular, the population that responds to selection seems to be the Moroccan population because its comparisons are always significant in two or three tests with other populations.

A limitation of this study is the number of markers used with respect to analysis performed with microarray data. In fact, for this study, we used a candidate gene approach, and hence the study suffers from two limitations of such a type of analysis: a priori hypothesis on the candidate gene used and the confounding effect of the genetic drift.67

Despite these limitations, the results we obtained are validated by the method used, because the simultaneous detection of outlier loci in different tests based on different assumptions and parameters reduces the possibility of false positives.27, 28, 29

Moreover, we highlight that results obtained with regard to signals of natural selection in the genome are similar to those of some previously published papers in which genomic scans with microsatellite markers have been performed.27, 68, 69 In these papers, the percentage of markers responding to selection ranges from 2.1% to 10.6%, whereas we detected a value of 5.9%.

The NOS1 gene encodes neuronal nitric oxide synthase (NOS), which synthesizes the NO responsible for neurotransmission at NANC synapses; moreover, it is implicated in the pathophysiology of several forms of cerebral damage. In addition to the neurons of the central and peripheral nervous system, NOS1 is expressed in skeletal muscles and in the cells of the epithelium of the respiratory system.70, 71 A recent study has reported the first example of NOS1 being involved in the elimination of an infection in mice.72 The NOS1 (CA)n repeat is localized in the 3′ untranslated region of exon 29, and the existence of allelic mRNA sequence variation has been reported.73 An association has also been demonstrated between asthma and this dinucleotide repeat polymorphism,74, 75 supporting a functional role of this short tandem repeat. NO has been shown to have a crucial role in immunoregulation and it is implicated in host nonspecific defense in a variety of infections such as malaria, toxoplasmosis, leishmaniosis, trypanosomosis and schistosomosis.76

Recently, we suggested that endemic malaria exerted a selective pressure in Corsica on NOS1 (CA)n e29 alleles that correlate with NO overproduction.77 In that paper, we showed that alleles previously correlated with NO overproduction (alleles 16 and 17) are more represented in a sample of β⁰39 carriers than in control samples, suggesting a selective effect of malaria on alleles that correlate with NO overproduction. If we consider allele frequencies in the populations analyzed in the present study, we observe that Morocco has the highest frequencies of alleles 16 and 17 with respect to other populations, suggesting positive selection for these alleles. This result is consistent with the hypothesis of malaria infection,77 because of the high impact of endemic malaria in the North African region compared with the South European region.78 In Sardinia, where malaria had a high prevalence until its eradication immediately after World War II, signals of natural selection in the NOS1 gene were absent, probably due to genetic drift and isolation, which could have modified allele frequencies, and also due to the phenomena of local adaptation.

In recent years, the development of genome-wide technologies has permitted the analysis of natural selection patterns with high resolution. To date, 21 genome-wide scans for positive selection have been performed in human populations, producing maps of positive selection. Several analyses have been performed in worldwide populations, and several regions have been detected as targets of positive selection.67 Positive selection at the NOS1 gene, located on chromosome 12 at positions 117650979–117799582, has not been detected in these studies. This could be because of local adaptation phenomena, because, as observed in previous studies, signatures of natural selection are not uniformly distributed across populations, but rather show clear spatial heterogeneity. We used different and more restricted populations than other studies, which have used HapMap samples or samples from large geographical areas. Many papers describe spatially varying patterns of selection.79 Local adaptation is due to the environmental heterogeneity that human populations have been confronted with throughout the world after the ‘out of Africa’ migration. Moreover, previous analyses used different statistical methods, which is an important factor in interpreting results across studies,67 and it is important to emphasize that a lack of overlap between regions under selection detected in different studies is quite common. In the 21 studies analyzed by Akey et al.,67 only 722 of the 5110 regions identified in total were detected in two or more different studies; 271 were identified in three or more studies and 129 regions were identified in four or more studies.

Considering the performance of the three neutrality tests, the F-test yields the highest proportion of significant comparisons (34.5% vs 6.7% and 5.9% for the FST test and the LnRH test, respectively). Similar results were obtained by Vasemagi et al.,28 who screened 17 genomic and 78 expressed sequence tag-associated mini- and microsatellites. In their work, the discrepancy between tests is lower at a small spatial scale than at a large spatial scale, which they explain by the effect of mutations and by the moderate effect of genetic drift on the genetic parameters used to infer the outlier loci.19, 21, 22 In our study, inconsistency of results can be explained in this way for only two pairs of populations (Sardinia vs Tuscany; Sicily vs Tuscany), as indicated by the RST permutation test of Hardy et al.65

An assumption that has possibly been violated is the absence of migration. The F-test assumes that no migrants have been exchanged after the divergence of the two populations, although moderate levels of migration do not increase the rate of false positive results.21

In conclusion, the marker that we could consider under divergent selection is NOS1 (CA)n e29. We should underline that significant deviation from neutral expectations using one or multiple tests does not necessarily mean that a particular locus has been affected by selection. These significant results only raise the candidate status of a particular locus and do not demonstrate selection per se,22, 80, 81 because the violation of test assumptions is another factor potentially producing false positive results. This candidate locus will serve as a basis for further sequence analysis to validate the role of divergent selection in this gene.

Several authors have argued that positive selection might be frequent in the genomes of humans and other organisms.82 If this is true, we have the necessary statistical methods for identifying loci that have undergone selection on the basis of comparative data. It will be possible to make systematic searches for genes that have undergone positive selection in the lineage leading to humans and identify the adaptive changes at the molecular level that were important in the evolution of modern humans.

Identifying selection in the genome might very well become one of our most powerful tools for identifying causes for species-specific differences and for identifying genomic regions of functional, and perhaps, medical importance.