Introduction

A major goal in evolutionary biology is to understand how phenotypes and genotypes change in response to selection. Adaptation is characterized by a change in the phenotype of individuals in a population towards a phenotype that best fits the present environment (Orr, 2005; Tiffin and Ross-Ibarra, 2014). One important link that remains to be made is how selection on phenotypes relates to genetic and genomic changes (Ehrenreich and Purugganan, 2006). Considerable attention is currently being paid to understanding the genetic basis of adaptation of living organisms to their environment (Pritchard and Di Rienzo, 2010). The study of the phenotypic and genetic consequences of adaptation calls for different sets of approaches (Franks et al., 2014). The study of selection has generally been based on either the genetic and genomic signature of selection or the phenotypic signature, and more rarely both.

The study of phenotype variation along environmental gradients is certainly one of the oldest approaches used to decipher the action of natural selection and its consequences in terms of phenotypic adaptation (Endler, 1986). However, because in situ observation cannot distinguish phenotypic variation potentially associated with selection from phenotypic plasticity, common garden or reciprocal transplants are generally used to confirm adaptation (West-Eberhard, 2003; Merilä and Hendry, 2014). With the increasing availability of genomic sequencing, it became possible to directly trace the signature of selection at the molecular level (Nielsen, 2001). These approaches use statistical methods that allow identification of an outlier locus (Lewontin and Krakauer, 1973; Watterson, 1979; Hudson et al., 1987; Tajima, 1989; McDonald and Kreitman, 1991). More recent approaches have been developed based on the allele frequency spectrum (Nielsen et al., 2005), or haplotype homozygosity (Sabeti et al., 2002) but also methods that use environmental data to correlate genetic variation and environmental variables (Sgrò and Hoffmann, 2004; Coop et al., 2010; De Mita et al., 2013; Günther and Coop, 2013; McGaughran et al., 2014). These methods led to the identification of several candidate markers and genes, but one always has to keep in mind that selection signatures could be false positives (Sabeti et al., 2006; Hancock and Di Rienzo, 2008; Pérez O’Brien et al., 2014).

Once selection signatures at the molecular level identify candidate genes, their link to phenotypic variation remains to be demonstrated. Different methods have been developed to identify this link. Two of the available methods to validate genotype/phenotype links are linkage mapping approaches using crosses, that is, QTL mapping (Ehrenreich and Purugganan, 2006) and population association mapping. In association mapping, the genomic region controlling variation is identified using existing populations (Bergelson and Roux, 2010). This approach exploits both historical recombination and the natural diversity built up within populations during the evolutionary history of each species (Yu and Buckler, 2006; Beló et al., 2007). Because it is used on actual populations, it is easier to link to the result of detection of selection signature approaches also based on population diversity. This population association method has been successfully applied to search for genes underlying variations in traits in several plant species, including Arabidopsis thaliana (Atwell et al., 2010; Brachi et al., 2010; Li et al., 2010), pearl millet (Saïdou et al., 2009; Mariac et al., 2011), and maize (Remington et al., 2001; Yan et al., 2011; Wallace et al., 2014).

Identifying markers under potential selection, linking their genotype to a phenotype and studying the evolution of this phenotypic trait along an environmental gradient provides stronger support for ongoing environmental selection (Hoffmann and Willi, 2008), and is a first step toward in situ validation.

In this study, we assessed phenotypic variability in a common garden experiment using wild pearl millet (Pennisetum glaucum) populations sampled along environmental gradients. Wild pearl millet grows up to the limit of the Sahara, in extreme rainfall and temperature environments. This species is the closest wild relative of a cereal that plays an important role in food security in sub-Saharan Africa. Using an association mapping framework, we assessed the link between phenotypic variability and SNP variation in 181 previously identified selected candidate genes. Among the most interesting genes, a Myosin XI was associated with the number of flowers, and consequently could be related to the adaptation of pearl millet to aridity.

Materials and methods

Samples, field experiments and genetic data

We studied 11 populations of wild pearl millet sampled along a North-South aridity gradient, six populations from Niger and five from Mali (Figure 1, Supplementary Table S1). Phenotypic variations in the 11 populations were evaluated in three different trials in Niger and Senegal in 2013 and 2014. The first two field trials were performed in Senegal and Niger during the rainy season in 2013. The first trial was conducted at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) field station in Sadore (13°14′N, 2°17′E), Niger. The second was conducted at the Institut Sénégalais de Recherche Agricole (ISRA) field station in Bambey (14°70′N, -16°47′W) in Senegal. A total of 550 individuals, with 50 per population, were sown in each field trial. Plants were randomized; spacing was 1 × 0.8 m in the trial in Sadore and 1 × 1 m in the trial in Bambey. To avoid edge effects, cultivated pearl millet lines were used to border the plots in both trials. Sowing dates were 15 July 2013 in Sadore and 20 July 2013 in Bambey. The trials were conducted under rainfed conditions with supplementary sprinkler irrigation if necessary. The fungicide Eperon (3.88% Metalaxyl-M+64% Mancozeb) was used at the seedling stage. In 2013, for each plant, five inflorescences from the Sadore experiment were selfed and bulked. These progenies were used for the 2014 field trial, also in Sadore. Ten progenies per single selfed plant were sown on 3 July 2014 using the same protocol and experimental conditions as in 2013.
In all three experiments, we phenotyped 11 traits associated with plant morphology and fitness (Supplementary Table S2, Supplementary Figure S1): time from sowing to heading (HT), average spike length (SLA) estimated on five spikes, average spike diameter (SDA) estimated on five spikes, the number of spikes per individual (SNI) at maturity, average number of involucres per spike (ANIS) estimated on five spikes, main stem length (MSL), main stem diameter (MSD), the number of basal branches, the number of aerial branches (NAB) and dry matter weight (DMW) in grams. The average number of involucres per individual (ANII) was obtained by multiplying the ANIS by the SNI. In the 2014 experiment, the phenotypic values were averaged across progeny.

Figure 1

Population sampling sites. Each circle corresponds to the sampling site of one population. The sampling covered a gradient of aridity from latitude 15 to 19 in Niger and Mali. Population latitude and longitude are in decimal degrees. The map of Africa at the top left shows the sampling area in Niger and Mali.

Genotyping

A previous study identified 540 contigs showing evidence of a signature of selection out of 11 155 contigs (Berthouly-Salazar et al., 2016). These contigs were consequently good candidates for genes associated with adaptation. This initial study was done on two extreme populations along the aridity gradient in Niger, and in Mali. The 11 populations we studied here were sampled along the same two gradients but do not include these extreme populations. Genotyping was performed on all the individuals from the 2013 field experiment from the 11 populations, so the individuals genotyped were exactly those phenotyped in 2013. We used data for 181 SNPs (Supplementary Table S3) derived from the 113 contigs showing the strongest evidence of a signature of selection (Berthouly-Salazar et al., 2016). We also used 35 SNPs (Supplementary Table S3) randomly drawn from 35 non-selected contigs (Berthouly-Salazar et al., 2016).

Inference of population structure and genetic relatedness

To infer population structure, we used the genotypic data from 35 random ‘neutral’ SNP markers from all individuals phenotyped in the 2013 experiments in Niger and Senegal. We used the Bayesian method STRUCTURE with the admixture model, which considers that the genes / alleles of one individual can have different origins (Pritchard et al., 2000; Falush et al., 2003). Each run consisted of a burn-in of 100 000 iterations followed by 100 000 further iterations. We evaluated from 2 to 20 possible clusters (K) with five runs for each K value. We used STRUCTURE HARVESTER (http://taylor0.biology.ucla.edu/structureHarvester/) to handle files and multiple runs (Earl and vonHoldt, 2012), and to apply the ad hoc method based on the second-order rate of change of the likelihood (Evanno et al., 2005). We used CLUMPP (Jakobsson and Rosenberg, 2007) and DISTRUCT (Rosenberg, 2004) to obtain and display coefficients of ancestry for each individual. Here, the coefficient of ancestry is a statistical measure of the proportion of alleles of each individual that can be traced to each of a set of K populations. Genetic relatedness between individuals was calculated using the efficient mixed-model association (EMMA) package (Kang et al., 2008) in R (R Core Team, 2015). We used the identity-by-state (IBS) kinship method to calculate this matrix from the 35 random ‘neutral’ SNP markers (Kang et al., 2008). The matrix was first calculated between all individuals without considering the population level (named KINSHIP). We also calculated a second matrix considering relatedness inside populations only (named KINSHIPpop).
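As an illustration of the identity-by-state idea behind the KINSHIP matrix, the following minimal Python sketch computes a pairwise IBS similarity from SNP dosages. It is not the EMMA implementation used in the study: the 0/1/2 dosage coding, the handling of missing data and the function name ibs_kinship are assumptions made for this example.

```python
from typing import List, Optional

# Assumed coding: 0, 1 or 2 copies of an arbitrary reference allele; None = missing.
Genotype = Optional[int]


def ibs_kinship(genotypes: List[List[Genotype]]) -> List[List[float]]:
    """Return an n x n identity-by-state similarity matrix.

    genotypes[i][m] is the allele dosage of individual i at marker m.
    Each marker typed in both individuals contributes 1 - |gi - gj| / 2,
    and contributions are averaged over those markers.
    """
    n = len(genotypes)
    kinship = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = [
                1.0 - abs(a - b) / 2.0
                for a, b in zip(genotypes[i], genotypes[j])
                if a is not None and b is not None
            ]
            # Pairs with no marker typed in common get NaN rather than a guess.
            value = sum(shared) / len(shared) if shared else float("nan")
            kinship[i][j] = kinship[j][i] = value
    return kinship
```

With only 35 markers, each pairwise value rests on few loci, so such estimates are expected to be noisy.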

Analysis of the phenotypic data and association mapping

We first investigated variations in the 11 traits across the 11 populations studied.

We used an analysis of variance (AOV) in R (R Core Team, 2015) to test the effects of population and experimental environment on phenotypic variability. We used the model $y_{ijk} = \mu + \alpha_i + \beta_j + \lambda_{ij} + \varepsilon_{ijk}$, where $y_{ijk}$ is the phenotype of individual $k$ from population $i$ in experimental environment $j$, $\mu$ is the grand mean, $\alpha_i$ is the effect of population, $\beta_j$ is the effect of experimental environment, $\lambda_{ij}$ is the effect of the interaction between population and environment, and $\varepsilon_{ijk}$ is the residual error. We visualized the error distribution of all traits studied using R. We also used a Box-Cox transformation in R to normalize our phenotypic data for a second AOV. We compared the AOV results obtained for the original phenotypic data and the transformed data for this experiment.
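The decomposition behind this model can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes a balanced design (the same number of individuals in every population × environment cell, at least two per cell), which was not the case in the actual trials, and the function name two_way_anova is hypothetical. The study itself used AOV in R.

```python
from collections import defaultdict
from statistics import mean


def two_way_anova(records):
    """Two-way ANOVA with interaction for a balanced design.

    records: iterable of (population, environment, phenotype) tuples.
    Returns {term: (sum_of_squares, degrees_of_freedom, F)} for the
    population, environment and interaction terms, plus the error term
    (whose F entry is None).
    """
    cells = defaultdict(list)
    for pop, env, y in records:
        cells[(pop, env)].append(y)
    pops = sorted({p for p, _ in cells})
    envs = sorted({e for _, e in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if len(cells) != len(pops) * len(envs) or any(len(v) != n for v in cells.values()) or n < 2:
        raise ValueError("this sketch requires a balanced design with >= 2 replicates")

    grand = mean(y for v in cells.values() for y in v)
    pop_mean = {p: mean(y for e in envs for y in cells[(p, e)]) for p in pops}
    env_mean = {e: mean(y for p in pops for y in cells[(p, e)]) for e in envs}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss_pop = n * len(envs) * sum((pop_mean[p] - grand) ** 2 for p in pops)
    ss_env = n * len(pops) * sum((env_mean[e] - grand) ** 2 for e in envs)
    ss_int = n * sum(
        (cell_mean[(p, e)] - pop_mean[p] - env_mean[e] + grand) ** 2
        for p in pops for e in envs
    )
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df_pop, df_env = len(pops) - 1, len(envs) - 1
    df_int, df_err = df_pop * df_env, len(pops) * len(envs) * (n - 1)
    ms_err = ss_err / df_err  # assumes some within-cell variation (ss_err > 0)
    return {
        "population": (ss_pop, df_pop, ss_pop / df_pop / ms_err),
        "environment": (ss_env, df_env, ss_env / df_env / ms_err),
        "interaction": (ss_int, df_int, ss_int / df_int / ms_err),
        "error": (ss_err, df_err, None),
    }
```

For unbalanced data such as those analysed here, the sums of squares depend on the order or type of decomposition, so a dedicated statistical package remains the appropriate tool.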

Next, we investigated how the phenotypic variation was distributed along the two gradients. We extracted 19 climatic variables (Supplementary Table S4) from WorldClim (http://www.worldclim.org; Hijmans et al., 2005). We performed a principal component analysis (PCA) on the climatic variables and geographic coordinates to synthesize climatic information (Supplementary Table S4). We then estimated correlations between phenotypic variation and latitude, longitude and the climate data, including axis 1 and axis 2 of the PCA, using the Kendall non-parametric method of correlation available in R.
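For readers unfamiliar with Kendall's rank correlation, the following Python sketch shows how the tau-b coefficient can be computed from pairwise comparisons. It is a didactic version under the assumption that tau-b (the tie-corrected variant) is the statistic of interest; the function name kendall_tau_b is chosen for this example, and no P-value is computed.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long numeric sequences.

    Pairs tied in both variables are ignored; pairs tied in only one
    variable enter the denominator, which is the tau-b tie correction.
    """
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

Because it relies only on ranks, this coefficient is robust to the skewed distributions typical of count traits such as the number of spikes.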

The STRUCTURE software analysis identified 11 clusters, but they did not perfectly fit the 11 populations. AOV showed that correcting for the population of origin (Model ‘POPULATION’) reduced the number of false positives compared with correcting for Bayesian clustering (Model ‘STRUCTURE’). For that reason, to identify SNPs significantly associated with variations in morphological traits, we first extracted the residuals of the AOV that takes only the original population into account. We used these corrected phenotypes (residuals) and the matrix of kinship coefficients (K) calculated between all individuals to apply a linear mixed model association (Model ‘POPULATION+KINSHIP’) via a t-test with REML estimates (emma.REML.t) from the package EMMA (Kang et al., 2008) in R (R Core Team, 2015). To assess the robustness of our results to this model choice, we compared the result of our analysis first with a linear mixed model with the matrix of kinship coefficients only (Model ‘KINSHIP’) and then with a model with a correction for population only (Model ‘POPULATION’). We used the Quantile-Quantile (Q-Q) plot method to compare the P-values obtained with the four models previously cited and the naïve or ‘NULL’ model without any corrections. We also assessed the significance of our SNPs with another linear mixed model association (Model ‘POPULATION+KINSHIPpop’), which used the corrected phenotypes (residuals) and the matrix of kinship coefficients considering relatedness inside populations only.
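The two-step logic of the ‘POPULATION’ correction can be sketched as follows in Python. This is a simplified illustration, not the analysis performed: residuals are taken here as deviations from the population mean, and the SNP is tested by ordinary least squares, whereas the study fitted a mixed model with a kinship random effect using emma.REML.t. The function names and the 0/1/2 dosage coding are assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def population_residuals(phenotypes, populations):
    """Residuals of a one-way model: phenotype minus its population mean."""
    groups = defaultdict(list)
    for y, pop in zip(phenotypes, populations):
        groups[pop].append(y)
    pop_mean = {pop: mean(vals) for pop, vals in groups.items()}
    return [y - pop_mean[pop] for y, pop in zip(phenotypes, populations)]


def snp_t_statistic(residuals, dosages):
    """Slope and t statistic of the regression residual ~ SNP dosage.

    No kinship random effect is included (unlike the mixed model of the
    study); the t statistic has n - 2 degrees of freedom.
    """
    n = len(residuals)
    mx, my = mean(dosages), mean(residuals)
    sxx = sum((x - mx) ** 2 for x in dosages)  # zero if the SNP is monomorphic
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, residuals))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, residuals))
    standard_error = sqrt(rss / (n - 2) / sxx)
    return slope, slope / standard_error
```

Removing population means before testing prevents a SNP from appearing associated with a trait merely because both differ between populations.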

We used a false discovery rate (FDR) of 5% to control the error rate under multiple testing (Benjamini and Hochberg, 1995). We then retained only SNPs that had a significant effect (5% false positive rate, FPR, threshold) on phenotypes in both the 2013 trials (Sadore+Bambey) and the 2014 trial (Sadore).
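The Benjamini-Hochberg step-up procedure referred to above can be written compactly. The Python sketch below is a generic implementation for illustration; the function name benjamini_hochberg is ours, and the input is simply a list of P-values, one per SNP-trait test.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is declared
    significant at false discovery rate `alpha` (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    # All tests ranked at or below the cutoff are rejected.
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

For example, with P-values 0.001, 0.04, 0.03 and 0.5, only the first test passes at a 5% FDR, although three of them are below the nominal 0.05 threshold.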

For these significant SNPs, we calculated and statistically assessed the correlation of the SNP frequency with latitude separately for the populations from Niger and Mali, and compared the two correlations. We also calculated the correlation of all the SNP frequencies (‘neutral’ and ‘selected candidate’) with latitude and with the first axis of the PCA across all 11 populations using Pearson’s method in R (R Core Team, 2015). To assess the significance of the correlation coefficients, we compared them to the distribution of R² obtained for all SNP frequencies with latitude and with the first axis of the PCA, respectively. To this end, we constructed a histogram of the frequency distribution of the correlation coefficient (R²) of the 35 ‘neutral’ and 181 ‘selected candidate’ SNPs with latitude and with the first axis of the PCA. We also assessed the significance of each correlation with a t-test in R (R Core Team, 2015).
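The quantities involved here are simple to compute, as the following Python sketch suggests. It assumes diploid genotypes coded as 0/1/2 copies of the allele of interest; the function names, including empirical_tail used to place a candidate R² within the ‘neutral’ distribution, are hypothetical helpers for this example.

```python
from math import sqrt


def allele_frequency(dosages):
    """Frequency of the counted allele in one population of diploids."""
    return sum(dosages) / (2 * len(dosages))


def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def empirical_tail(candidate_r2, neutral_r2):
    """Fraction of neutral SNPs whose R² is at least the candidate's."""
    return sum(r2 >= candidate_r2 for r2 in neutral_r2) / len(neutral_r2)
```

A candidate whose R² exceeds every neutral value has an empirical tail of 0, although with 35 neutral SNPs the resolution of such a comparison is limited to roughly 1/35.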

Contigs of interest, that is, those containing SNP(s) with a significant association, were compared using BLAST (BLASTN, http://blast.ncbi.nlm.nih.gov) against the National Center for Biotechnology Information (NCBI) database. We used the nucleotide collection (nr/nt) database and an algorithm for highly similar sequences.

Contig nucleotide sequences (Supplementary Table S5) were translated using the ExPASy server (http://web.expasy.org/translate/; Gasteiger et al., 2003). Each polymorphism was then classified as a synonymous or non-synonymous substitution.
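The classification of a coding SNP as synonymous or non-synonymous can be illustrated with the standard genetic code. The Python sketch below assumes that the sequence supplied is an in-frame coding sequence on the forward strand and that the SNP lies within it; the function name classify_substitution is hypothetical, and this is not the ExPASy tool used in the study.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(coding_sequence, position, alternative_base):
    """Classify a single-base change in an in-frame coding sequence.

    position is 0-based. Returns 'synonymous' if the reference and
    alternative codons encode the same amino acid (or both are stops),
    otherwise 'non-synonymous'.
    """
    sequence = coding_sequence.upper()
    start = position - position % 3  # first base of the affected codon
    reference_codon = sequence[start:start + 3]
    if len(reference_codon) < 3:
        raise ValueError("position falls in an incomplete final codon")
    offset = position - start
    alternative_codon = (
        reference_codon[:offset] + alternative_base.upper() + reference_codon[offset + 1:]
    )
    same = CODON_TABLE[reference_codon] == CODON_TABLE[alternative_codon]
    return "synonymous" if same else "non-synonymous"
```

A SNP located in an untranslated region has no codon context and falls outside this classification.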

Results

Phenotypic analysis

With lower rainfall and higher temperature, we observed key phenotypic changes in wild pearl millet populations. In the Niger and Senegal trials in 2013, there were 430 and 204 survivors, respectively. In the 2013 experiments, significant effects of population (P<0.001) and strong environment effects (P<0.001) were detected for all traits except the number of aerial branches (Supplementary Table S6-2013). Less variance was explained by the interaction between environment and population than by direct population and environment effects (F1, 718=2.27 to 9.58, P<0.001). No significant population × environment interaction was found for heading date, main stem diameter and spike diameter (Supplementary Table S6). With this AOV, we also found that the distribution of the errors for all traits studied was close to a normal distribution (Supplementary Figure S2). Moreover, using a Box-Cox transformation gave the same results (Supplementary Table S7). A total of 260 progenies had enough seed to be included in the 2014 experiment. In the 2014 experiment, we still detected a significant population effect on heading date, average number of involucres per spike, spike length and diameter, and main stem length (P<0.01). The lack of significance for the other traits may be due to the smaller number of individuals available in the 2014 experiment (260 compared to 634 in 2013), which reduced statistical power.

We found that the plants tended to flower earlier and to reduce seed production and overall dry mass at higher latitudes, with lower rainfall and higher temperatures. Most of the traits showed a significant negative correlation with latitude in both 2013 and 2014 (Supplementary Table S8). Similar results were obtained in the Bambey experiment. In all three experiments, the strongest negative correlations were found for average spike length.

Correlations between phenotypes and longitude were less significant (Supplementary Table S9). Almost no significant correlations were found in the 2014 experiment. Only heading date, number of spikes per individual and dry matter weight were found to be significantly correlated to longitude in both Sadore and Bambey in 2013; but heading date was positively correlated in Sadore and negatively correlated in Bambey.

These correlation studies showed that the observed phenotypic variation is mainly explained by the effect of latitude (Supplementary Table S8). The effect of longitude was much less pronounced (Supplementary Table S9).

The first axis of the PCA (Supplementary Table S10) mainly reflected rainfall variables and, to a lesser extent, temperature seasonality. The rainfall variables that contributed to the formation of this axis were annual precipitation, precipitation in the wettest month and precipitation in the wettest quarter (Supplementary Table S10). The first axis explained 59% of the variance and the second axis 25%. In the 2013 experiment, the first axis of the PCA was significantly and positively correlated with all the traits studied with the exception of heading date, which was correlated with the second axis of the PCA (Supplementary Tables S11 and S12). However, in the 2014 experiment, a positive correlation was detected between heading date and the first axis of the PCA (Supplementary Table S10). In all three experiments, the strongest correlation was found for average spike length. Overall, the second axis of the PCA explained smaller effects (Supplementary Tables S11 and S12).

Analysis of the genetic structure of wild pearl millet populations

A total of 35 random ‘neutral’ SNP markers on 634 individuals were used for population structure studies. The second-order rate of change of the likelihood showed a maximum for K=11 (Supplementary Figure S3), supporting 11 clusters as one of the possible scenarios. At K=2, the two major clusters mainly separated the two gradients (one from Mali and the other from Niger; Supplementary Figure S4). Individuals from Niger tended to be in the first cluster (Proportion correctly assigned) and individuals from Mali in the second cluster (Proportion correctly assigned). From K=3 to K=11, additional clusters appeared that mainly corresponded to the populations as sampling units. At K=11, the clusters tended to correspond to the sampled populations, but the individual coefficients of ancestry were noisy. Indeed, within a population, individuals were not all perfectly assigned to their respective population (Supplementary Figure S4).
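The second-order rate of change statistic (ΔK) can be sketched as follows in Python. This version uses the mean log-likelihood over runs at each K and divides the absolute second difference by the standard deviation across runs at that K; implementations may differ in detail, and the function name and input layout (a dictionary mapping K to the list of per-run log-likelihoods) are assumptions for this example.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    log_likelihoods: dict mapping K to a list of L(K) values, one per run
    (at least two runs per K, with some variation between runs).
    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    average = {k: mean(runs) for k, runs in log_likelihoods.items()}
    delta = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in average or k + 1 not in average:
            continue  # Delta K is undefined at the smallest and largest K
        second_difference = abs(average[k + 1] - 2 * average[k] + average[k - 1])
        delta[k] = second_difference / stdev(log_likelihoods[k])
    return delta
```

The K with the largest ΔK is the one usually retained, for instance with max(delta, key=delta.get); note that the statistic cannot be evaluated at the extremes of the tested range (here K=2 and K=20).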

Association study

Population structure was better controlled by the population of origin than by the inferred coefficient of ancestry. Both the individual coefficient of ancestry (Model ‘STRUCTURE’) and the individual population of origin (Model ‘POPULATION’) corrected genotype/phenotype associations better than a naive or ‘NULL’ model (Figure 2, Supplementary Figure S5), but we found that for most traits, control was better using the population of origin (Supplementary Figure S5). Very few traits, including the number of basal branches, did not covary with population structure, and consequently all models gave very similar results for them (Supplementary Figure S5). In our final association study model, we consequently used the population of origin rather than the inferred coefficient of ancestry. We also added a kinship matrix between individuals to the statistical analysis (Model ‘POPULATION+KINSHIP’). The Q-Q plot distribution for all five models also showed that the model ‘POPULATION+KINSHIP’ reduced the number of false positives better than the ‘NULL’, ‘STRUCTURE’, ‘POPULATION’ and ‘KINSHIP’ models (Supplementary Figure S6). This last model corrected only for kinship between individuals.
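The coordinates of such a Q-Q plot can be obtained with a few lines of Python. The sketch below is illustrative only: it assumes that expected P-values are taken as rank/(m+1) under the uniform null distribution (other plotting positions exist) and that no P-value is exactly zero; the function name qq_points is ours.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 P-values.

    Under the null hypothesis P-values are uniform on (0, 1), so the
    rank-th smallest of m values is expected near rank / (m + 1).
    """
    m = len(p_values)
    observed = sorted(p_values)
    return [
        (-log10(rank / (m + 1)), -log10(p))
        for rank, p in enumerate(observed, start=1)
    ]
```

Points lying above the diagonal indicate an excess of small P-values; a model that controls structure well keeps most points on the diagonal and leaves only a few true associations above it.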

Figure 2

Q-Q plot distributions. The quantile-quantile (Q-Q) plots are shown for a null model, a model correcting for structure and a model correcting for the original source population of the individual. The phenotype is the average number of involucres per spike (ANIS) estimated on five spikes in the 2013 experiments. Axes represent the observed P-values versus the expected P-values. NULL corresponds to the model with no correction for structure or for the original source population of the individual, STRUCTURE to the model with correction for structure and POPULATION to the model with correction for the original source population. The gray line corresponds to observed P-values equal to expected P-values. The three models were fitted using analysis of variance (AOV). The P-values were calculated using R.

We tested 216 SNP associations using model ‘POPULATION+KINSHIP’ with the 11 phenotypic traits in 2013 and 2014 (Supplementary Table S13a). Using a 5% FPR threshold, we detected a total of 86 and 81 SNPs associated with at least one trait in 2013 and 2014, respectively (Supplementary Table S13a). The traits with the highest number of associations were the number of spikes per individual (15 SNPs) in 2013 and the average spike diameter (17 SNPs) in 2014 (Supplementary Table S13a). Considering the two experiments, the number of spikes per individual had the largest number of associations (23 SNPs). Results were similar if we used the KINSHIPpop matrix (Supplementary Table S13d).

When we used an FDR of 5%, SNP22 was associated with the length of the main stem, the diameter of the main stem and dry matter weight in 2013, and SNP210 was associated with dry matter weight in the 2014 experiments (P<0.05). However, the allele frequency of these SNPs was low, so these associations might be spurious (Supplementary Table S13a). We also found that SNP20 and SNP21 were significantly associated with the average number of involucres per spike in both 2013 and 2014 (P<0.05). These two SNPs are located on the same contig and show high linkage disequilibrium (r²=0.82). The same two SNPs also showed significant P-values in the analyses using the ‘POPULATION’ or ‘KINSHIP’ models (Supplementary Table S13b, c).

The frequency of allele C at SNP20 and SNP21 (Supplementary Table S14) increased (R²>0.79, P<0.007) with latitude (Figure 3, Supplementary Table S15) and decreased (R²>0.7, P<0.001) with the first axis of the PCA (Figure 4, Supplementary Table S15). The R² values obtained for SNP20 and SNP21 in the correlations with latitude and with the first axis of the PCA were extreme compared to the overall distributions, and fell outside the neutral SNP distribution (Supplementary Figure S7, Supplementary Table S15).

Figure 3

Correlation of SNP frequency with latitude. The correlation of SNP20 (a) and SNP21 (b) frequency in the populations from Niger and Mali with the latitude. R²=0.79 for SNP20 (Niger, P<0.05), 0.79 for SNP20 (Mali, P<0.05), 0.86 for SNP21 (Niger, P<0.05) and 0.79 for SNP21 (Mali, P<0.01).

Figure 4

Correlation of the frequency of associated SNPs in the populations with axis 1 of the PCA. The correlation of SNP20 (a) and SNP21 (b) allele frequency in the populations with the first axis of the PCA, which mainly reflects rainfall variables. R²=0.74 for SNP20 (P<0.01) and 0.71 for SNP21 (P<0.01).

The same two alleles (C and T) were identified for SNP20 and SNP21. For the two SNPs, genotype T/T had a greater average number of involucres per spike than genotypes C/C and T/C (P<0.05). For SNP20, the average number of involucres per spike was 164.9 (SE±4.33) for C/C, 200.8 (SE±9.73) for T/C and 273.6 (SE±34.40) for T/T in 2013. Similar trends were observed in the two experiments (2013 and 2014) and for the two SNPs (Figure 5, Supplementary Figure S8).
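Genotype means and standard errors of this kind can be obtained as in the following Python sketch. The genotype labels and the function name genotype_summary are illustrative, and the standard error is taken as the sample standard deviation divided by the square root of the group size.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def genotype_summary(genotypes, values):
    """Return {genotype: (mean, standard_error, n)} for a trait.

    genotypes and values are parallel sequences, one entry per plant.
    Groups with a single plant get NaN as standard error.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        groups[genotype].append(value)
    summary = {}
    for genotype, group in groups.items():
        n = len(group)
        standard_error = stdev(group) / sqrt(n) if n > 1 else float("nan")
        summary[genotype] = (mean(group), standard_error, n)
    return summary
```

The much larger standard error reported for T/T than for C/C is what one would expect if T/T plants were comparatively rare in the sample.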

Figure 5

The average number of involucres per spike (ANIS) by genotype. The average number of involucres per spike (ANIS) is shown for each genotype (C/C, T/C and T/T) of SNP20, which had a significant effect on ANIS. The y axis gives the mean values and standard errors of ANIS for each SNP20 genotype.

The average number of involucres per spike (P<0.001) and the frequency of allele T (P<0.05) decreased with latitude. Southern individuals therefore had a high probability of having a T/T genotype and consequently a high average number of involucres per spike, whereas northern individuals had a high probability of having a C/C or T/C genotype with a low average number of involucres per spike. When performing a BLASTN search against the GenBank database, we found that the sequence of the contig carrying both SNPs (Supplementary Table S3) shared 83% identity with the sequence of the myosin XI gene identified in Oryza brachyantha. The two mutations are synonymous or located in the 3′ untranslated region (3′UTR).

Discussion

More compact phenotype with drier climate

In pearl millet, more compact phenotypes are associated with lower rainfall and higher temperatures during the growing season. Wild pearl millet originating from the drier area flowers earlier and has shorter spikes with a smaller diameter, a smaller total number of spikes and spikelets, and a shorter stem with a smaller diameter. In the present study, we found that dry mass decreased with increasing latitude and a drier environment. This change was observed in the two field studies in 2013, but not all the traits showed a statistically significant pattern in 2014. This difference is probably explained by the fact that fewer plants were studied in 2014, and we consequently had less statistical power.

Plants typically express phenotypic variation along environmental gradients (Teklehaimanot et al., 1998; Ivancich et al., 2012). With the lower rainfall and higher temperatures, as expressed by the first PCA axis, the phenotypic traits of these pearl millet populations allow seeds to be produced with less overall investment in aboveground biomass. This is a well-known trade-off in evolutionary biology for an adaptation to a more stressful environment (Chapin et al., 1993): rapid initial growth but relatively less production of aboveground biomass. In the present case, if the total number of flowers (and ultimately seeds) is changed, it is because the number of flowers per inflorescence is changed, not the overall number of inflorescences. Producing an inflorescence more rapidly probably implies producing fewer flowers per inflorescence. Rapid initial growth (of an inflorescence) is therefore likely to be a key feature of this adaptation. Similar observations have been made in temperate climates, where the shorter growing season is associated with winter and frost (Jonas et al., 2008; McKown et al., 2014). In Populus trichocarpa, biomass accumulation, growth rates and ecophysiological traits correlated strongly with latitude, maximum day length and temperature in the area of origin of the tree (McKown et al., 2014). In dryland areas, the height of Eucalyptus trees for a given trunk diameter declines with decreasing rainfall from 2000 to 300 mm and increasing dry season length (Cook et al., 2015). In Chaetanthera moenchioides, populations from the drier end of the gradient showed significantly shorter flowering and fruiting phenology and smaller capitula than the other populations (Bull-Hereñu and Arroyo, 2009). Taken together, these studies suggest that in harsher environments, reduced development and less investment in aboveground biomass may be a widespread strategy (Chapin et al., 1993).
In conclusion, we highlighted a decrease in the total number of flowers due to the decrease in the size of the inflorescence. We showed non-significant changes in the number of inflorescences. Among the different phenotypic trade-offs, only a decrease in the size of the inflorescence appears to have been selected.

Gradients, pseudo-replication and genotype/phenotype association

The STRUCTURE analysis identified genetic differentiation between the two gradients. We can therefore consider that the experiment with individual plants from Niger was a pseudo-replication of the experiment with individual plants from Mali. Consequently, the SNP correlations observed in both gradients represent a somewhat natural repetition of evolution under similar climatic conditions.

We also showed that information concerning the original sampling of individuals was better than coefficient of ancestry at controlling genetic structure (Astle and Balding, 2009; Hubisz et al., 2009). This better correction is likely due to the small number of ‘neutral’ SNPs used here and the consequently poorer estimation of plant individual coefficient of ancestry. Accordingly, the direct use of the population of origin to control for false positive associations is recommended in this particular case (Zhao et al., 2007; Kang et al., 2008; Simko and Hu, 2008). We refined the model to include the population of origin and a kinship matrix as this approach is widely recommended (Kang et al., 2008; Yu et al., 2006; Saïdou et al., 2009; MacKenzie and Hackett, 2011).

Two SNPs (SNP20 and SNP21) on the same contig were shown to be significantly associated with the average number of involucres per spike (ANIS) in both the 2013 and 2014 experiments. Several results suggest that these SNPs are of particular interest. Their frequencies were significantly associated with latitude in both Niger and Mali gradients. Their association was stronger than any other SNP in both the ‘neutral’ and ‘selected candidate’ histogram of the distribution of correlation coefficients. We highlighted the fact that the distribution of ‘neutral’ and ‘selected’ candidates was similar with the exception of a particular bump in the distribution in the ‘selected’ candidates with a higher correlation with latitude and with the first axis of the PCA. Those SNPs might be in genes and alleles of interest for the study of adaptation to these gradients. These adaptations may also correspond to phenotypic variations that were not studied here, since they did not show up in our association analyses.

To sum up, we found an association with phenotypic traits, a correlation with the latitude/environment data and we also observed that the average number of involucres per spike (ANIS) in populations decreased significantly with latitude. The sequence surrounding these two SNPs shared 83% identity with a myosin XI gene. The myosin XI is responsible for cytoplasmic streaming and transport of intracellular organelles (Shimmen and Yokota, 2004). When RNA interference was used to specifically silence the myosin XI gene, an effect on tip growth was demonstrated in Physcomitrella patens (Vidali et al., 2010). Moreover, the use of transgenic Arabidopsis thaliana plants expressing different amounts of myosin XI showed that this gene plays an important role in variations in plant size (Tominaga et al., 2013). The transgenic plants that expressed high- and low-speed moving myosin XI along the actin bundle produced respectively large and small plants compared to the wild control (Tominaga et al., 2013). Although the study of this gene requires further validation, the literature suggests that changes in this gene may actually affect growth.

One SNP is a synonymous mutation and the other a non-coding mutation. We cannot rule out that one of the two SNPs is the real ‘causal’ polymorphism, affecting, for example, expression regulation, nor that both are simply in linkage disequilibrium with the real causal SNP(s). A recent study of pearl millet showed that the association signal rarely extends beyond the genes studied (Saïdou et al., 2014). Consequently, there is a high probability that the causal SNP lies within the myosin XI gene itself. Final validation of the gene might be achieved through a finer study of the region, and/or by functional validation.

A very recent study of pearl millet domestication highlighted a signature of selection associated with domestication around the myosin XI gene (Varshney et al., pers. comm.). In the cultivated millet, diversity was depleted and strong differentiation was observed between a representative sample of wild and cultivated plants. This result strongly suggests that the polymorphism found in the wild relative was targeted during the domestication of pearl millet. We still do not know for what specific characteristic myosin XI was selected during pearl millet domestication, but the present study suggests that myosin XI is associated with the increase in the number of flowers. The number of flowers is one of the most important traits selected during crop domestication, including that of pearl millet.

Conclusion

In this study, we have demonstrated that wild pearl millet shows significant phenotypic variation along environmental gradients and that the control of population structure is mostly captured by the population of origin rather than by inferring coefficient of ancestry. We identified two SNPs on the same contig associated with the average number of involucres per spike (ANIS). The sequence of this contig shares 83% identity with the myosin XI gene. The involvement of myosin XI in variations in the average number of involucres per spike could now be validated by functional studies such as the study of variation in the expression of this gene along the environmental gradient.

Data archiving

The data are available at the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.mn3g7