Introduction

Color vision allows organisms to discriminate differences in spectral composition, irrespective of the light intensity (Jacobs and Rowe, 2004), that helps them to find food, avoid predators and select their mates. Visual photoreception is mediated by visual pigments that comprise a protein moiety, opsin and a chromophore, either 11-cis-retinal or 11-cis-3,4-dehydroretinal (vitamin A1 or A2 aldehyde, respectively). Vertebrate visual opsin genes are classified into five types: one type expressed in the rod photoreceptor cells, that is, RH1 (‘rhodopsin’), responsible for dim light perception, and four types expressed in the cone photoreceptor cells, that is, middle to long wavelength-sensitive (M/LWS, ‘red–green’), RH2 (‘green’), SWS2 (‘blue’) and SWS1 (‘violet-ultraviolet’) that are responsible for bright light and color perception (Yokoyama, 2000). Mutations in the coding regions of opsin genes can affect the structure of opsin proteins and cause changes in the peak absorption spectra, λmax, of the chromophore (Yokoyama and Radlwimmer, 1998).

Opsin genes are known to have an important role in ecological adaptation and even in reproductive isolation. In many cichlid fishes, for example, allelic variations of the opsin genes have been reported among populations and closely related species (Terai et al., 2002, 2006). In Lake Victoria, the allelic variation of the M/LWS opsin genes among populations is explained by their adaptation to different light environments (Terai et al., 2006). Two sympatric cichlids (Pundamilia pundamilia and Pundamilia nyererei) that inhabit different water depths differ in the allelic compositions of their M/LWS opsin genes with different λmax, and they are adapted to the spectral composition of their aquatic environment (Maan et al., 2006). This adaptation is also hypothesized to result in a female preference for different male nuptial coloration, matching the wavelengths to which the M/LWS allele is most sensitive, thereby leading to sympatric speciation (a typical example of sensory-driven speciation) (Seehausen et al., 2008).

The guppy (Poecilia reticulata) is an important evolutionary and ecological model organism because of its high phenotypic variation. The male body color patterns are highly polymorphic and are considered to be related to predation pressure and female preference (Houde and Hankes, 1997; Breden, 2006). Females prefer some components of these male color patterns. This female preference is heritable (Houde and Endler, 1990) and varies within and between populations (Endler and Houde, 1995; Houde and Hankes, 1997; Breden, 2006; Schwartz and Hendry, 2007). This suggests that variation in female preference could drive variation in male color patterns (Brooks, 2002), although other selective agents possibly have additional roles (Schwartz and Hendry, 2007). Spectral measurements of the cone photoreceptor cells of Trinidadian guppies using microspectrophotometry (MSP) detected considerable interindividual variation in the spectral sensitivity of cells to the middle to long wavelength ranges (Archer et al., 1987; Archer and Lythgoe, 1990). These cells were classified into three classes, where λmax was centered at 533, 548 and 572 nm, and individuals exhibit either one, two or three of these classes (Archer and Lythgoe, 1990). The MSP absorption curves of the guppy cones were reported to be fitted better by an A1 chromophore ‘rhodopsin’ template than by an A2 chromophore ‘porphyropsin’ template, which suggests that the spectral variation of the cones is not because of possible differences in A1 and A2 chromophore usage among cells (Archer et al., 1987; Archer and Lythgoe, 1990). The 533 and 572 nm cells exhibited a narrow λmax distribution range, whereas the 548 nm cells had a broad range, implying that the 548 nm cells may represent a group of cells that express both the hypothetical 533 and 572 nm opsin genes simultaneously in the same cell with varying ratios (Archer and Lythgoe, 1990). Endler (1992) suggested that this sensory variation could contribute to differences in female preference.

Hoffmann et al. (2007) first reported several different sequences belonging to the M/LWS and other types of opsin genes, and this genetic variation was present both within and between individuals of guppies. Multiple M/LWS sequences were also reported independently by Weadick and Chang (2007). These studies did not demonstrate whether this variation represented alleles, paralogs or both. Subsequently, the paralogous composition of the guppy M/LWS opsins was found to comprise four genes (Ward et al., 2008), that is, LWS-1, LWS-2, LWS-3 and LWS-4, where the nomenclature of the first three genes follows that in the study by Sandkam et al. (2013) and that of the last was renamed in this study, for simplicity. Downstream of two SWS2 genes (SWS2-A and SWS2-B), the LWS-1, LWS-2 and LWS-3 genes are physically linked in this numerical order, with LWS-3 in the opposite orientation (Watson et al., 2011). LWS-4 is located in a different linkage group and lacks introns (Ward et al., 2008), except intron 1 (Watson et al., 2010, 2011). Based on the amino acid that corresponds to residue 180 of the human M/LWS opsins that is known to affect λmax, the LWS-1, LWS-2, LWS-3 and LWS-4 genes were originally designated as A180, P180, S180 and S180r, respectively. More recently, studies have demonstrated intraretinal variations and age/sex differences in opsin gene expression in guppies (Laver and Taylor, 2011; Rennison et al., 2011). Despite these advances, the genetic basis of interindividual sensory variation and the selection forces that maintain this variation remain unexplored.

In the present study, we investigated nucleotide polymorphisms in four M/LWS opsin genes of guppies in 10 populations from Trinidad and Tobago that inhabited various light environments. In addition, we analyzed the blue-sensitive SWS2-B and ultraviolet-sensitive SWS1 opsin genes for comparison as well as seven non-opsin nuclear loci as (potentially neutral) reference genes. First, we quantified the allelic variation of amino acid sites that are known to affect spectral sensitivity in the M/LWS-type opsin genes, that is, residues 180, 197, 277, 285 and 308 (the numbering follows that of the human M/LWS opsins throughout this paper) (Yokoyama and Radlwimmer, 1998, 2001). Next, we examined the overall pattern of nucleotide variation in the M/LWS opsin genes within and between populations, and determined whether it was consistent with a model of neutral evolution and demographic effects based on comparisons with the reference genes. We detected an allelic variant at residue 180 in LWS-1. We also found population genetic evidence that nucleotide variation in the SWS2-B–LWS-3 linkage group has been subject to divergent natural selection among populations. Finally, we investigated the relationships between several environmental variables and opsin divergence.

Materials and methods

Sample populations

In Trinidad, guppies were sampled from two rivers (Aripo and Guanapo) within the Caroni Drainage, one river (Yarra) within the Northern Drainage as well as one lake (Pitch Lake) and three rivers (Visigney Beach, Coffee and Vance River) in southern Trinidad (Figure 1). Guppies were sampled from both the upper and lower reaches of Aripo and Guanapo and the lower reaches in Yarra. Samples were also collected from the Lambeau River in Tobago. Adult male and female guppies were collected using seine nets from the shallow parts (<1 m water depth) of rivers and ponds. We randomly sampled individuals from multiple shoals at each sampling site. The fish were killed with an overdose of tricaine methanesulfonate (MS222) and preserved in 99.5% ethanol. A total of 429 individuals were collected from 10 wild populations in these areas of Trinidad and Tobago. The abbreviations of the populations are as follows: Upper Aripo (UA), Lower Aripo (LA), Upper Guanapo (UG), Lower Guanapo (LG), Lower Yarra (LY), Pitch Lake (PL), Visigney Beach (VB), Coffee (CO), Vance River (VR) and Lambeau River (LR) (Figures 1 and 2). The Aripo and Guanapo rivers have a series of waterfalls and weirs. Therefore, migration appears to be biased downstream because of floods, particularly during the wet season (van Oosterhout et al., 2007). Upstream migration over the waterfalls is severely limited, although the rate of gene flow is considerably higher than the mutation rate (Carvalho et al., 1991; van Oosterhout et al., 2006; Barson et al., 2009; Willing et al., 2010). The Caroni Drainage is considered to be a source–sink metapopulation, where the downstream populations represent a ‘super sink’ that receives immigrants from the upstream populations (Barson et al., 2009). The Northern Drainage is isolated from the Caroni Drainage, but there is disrupted natural migration among drainages (Magurran, 2005). Southern Trinidad is remote from both the drainages in the north (Figure 1), and it appears that there is little or no migration between the south and north areas based on the marked divergence of 720 single-nucleotide polymorphisms (Willing et al., 2010).

Figure 1

Locations of the sampling sites in Trinidad and Tobago. Upper Aripo (UA), Lower Aripo (LA), Upper Guanapo (UG) and Lower Guanapo (LG) are tributaries of the Caroni Drainage. One population, Lower Yarra (LY), came from a tributary of the Northern Drainage. Pitch Lake (PL), Visigney Beach (VB), Coffee (CO) and Vance River (VR) are located in southern Trinidad. Lambeau River (LR) is in Tobago.

Figure 2

Allele frequencies of 6 opsin genes, that is, SWS2-B, LWS-1, LWS-2, LWS-3, LWS-4 and SWS1, in the 10 guppy populations from Trinidad and Tobago. The different colors shown in the pie charts indicate different alleles that could be distinguished by non-Syn nucleotide differences. The tables on the right indicate the sites with amino acid variations. The number indicates the amino acid site. In LWS-1, amino acid 180 is one of the five sites known to affect the absorption spectra of opsins.

Aquatic light environments

Aquatic environmental data were collected at the sampling sites in Trinidad during the dry season in 2003. These measurements included the percentage canopy coverage that was determined by capturing images using a 35 mm wide angle lens with the camera pointed straight up while standing in the center of the sampling location. The percentage shade covering the water around noon was estimated by two independent observers. The dissolved oxygen (DO) and conductivity were measured using a YSI 55 multimeter probe (YSI Inc., Yellow Springs, OH, USA).

The canopy cover, shade and DO are appropriate indices for assessing aquatic light environments. Direct light spectrum data were not available in the present study. It should also be noted that light measurements such as the light spectrum and intensity per site vary greatly depending on the cloud cover and time of day, whereas the canopy cover does not vary temporally and has a significant effect on the light strength and spectrum in aquatic environments. The DO level is also closely related to the level of photosynthesis, the type of phytoplankton present and light penetration that affect the light intensity and color of aquatic environments (Twomey et al., 2009). Thus, canopy cover and DO are appropriate and stable proxies for the light regime in aquatic environments.

Determination of the opsin nucleotide sequences

We isolated genomic DNA based on the cetyltrimethylammonium bromide protocol or using a Qiagen DNeasy96 kit (Qiagen, Crawley, UK). We designed PCR primers (Supplementary Table S1) based on the untranslated regions of six opsin genes (SWS2-B, LWS-1, LWS-2, LWS-3, LWS-4 and SWS1), according to published sequences of Cumana guppy opsin genes (Watson et al., 2011) (GenBank accession numbers HM540108 and HM540107) and our unpublished data. The complete coding sequences of these opsin genes were amplified using these primers. The amplification reactions followed a PCR protocol with a volume of 50 μl that contained 2.5 μl of the genomic DNA template, 5.0 μl 10 × ExTaq Buffer, 1 μM dNTP mixture, 0.5 μM MgCl2, 1 unit ExTaq (Takara Biotechnology Co., Tokyo, Japan) and the forward and reverse primers. The amplification conditions were as follows: 3 min at 95 °C; 35 cycles of 30 s at 94 °C, 30 s at 60 °C and 4 min (LWS-1 and LWS-3), 6 min (LWS-2), 3 min (LWS-4) or 2 min (SWS1 and SWS2-B) at 72 °C; followed by 7 min at 72 °C. All the PCR products were purified by polyethylene glycol precipitation. In our analyses, we excluded LWS-2 from the CO, VR and LR populations because the PCR amplifications of these samples were not successful.

The sequencing reactions were performed using a BigDye Terminator v3.1 Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Tokyo, Japan) with the PCR primers and additional internal primers. The sequences were determined in both strands using ABI3130 and ABI3170 Genetic Analyzer systems (Applied Biosystems). The sequences obtained were edited and aligned using CLUSTALX (Larkin et al., 2007), MAFFT (Katoh and Toh, 2010) and Se-Al (Rambaut, 1996). The haplotype frequencies were estimated using PHASE 2.1 that uses Bayesian algorithms (Stephens et al., 2001; Stephens and Donnelly, 2003) in DnaSP ver. 5 (Librado and Rozas, 2009). Deviation from a Hardy–Weinberg equilibrium was tested using Arlequin ver. 3.1 (Excoffier et al., 2005). The alignments of all types of coding sequences are shown for each gene in Supplementary Figure S1. GenBank accession numbers were assigned to the most frequent sequence type for each gene (for LWS-1, one each of A-type and S-type are given that were distinguished at residue 180): A-type LWS-1 (AB748984), S-type LWS-1 (AB748985), LWS-2 (AB748986), LWS-3 (AB748987), LWS-4 (AB748988), SWS2-B (AB748990) and SWS1 (AB748989).

Determination of reference sequences

We used non-opsin reference genes (ACTB_ORYLA, ADK, KCND2, MAB21L1, NABI, RBM4 and SIX3) to detect natural selection acting on the opsin genes (Tezuka et al., 2012). We selected 8 or 9 individuals randomly from each of the 10 populations and analyzed their reference sequences. We used the method developed by Tezuka et al. (2012) to obtain the reference sequences. Based on genomic information from the fully sequenced organisms, we designed primer pairs that we expected to amplify DNA fragments (300–700 bp) of the seven genes. Among these fragments, we selected primer pairs for DNA amplification of P. reticulata sequences where the sequence identities between the corresponding regions of the fully sequenced organisms were 75–90%, which corresponded to the average identities of the exonic and/or intronic parts of all the genes in the genome (Tezuka et al., 2012). Thus, the selected DNA fragments could be regarded as random samples from the whole genome with respect to the degree of sequence conservation; therefore, we could use these fragments as references for comparison with the focal opsin genes.

The haplotype frequencies were estimated using PHASE 2.1 (Stephens et al., 2001; Stephens and Donnelly, 2003) in DnaSP ver. 5 (Librado and Rozas, 2009).

Summary nucleotide variation statistics

We calculated summary statistics for the reference genes and opsin genes for each population (Supplementary Tables S2 and S3) using DnaSP version 5 (Librado and Rozas, 2009). The ratio of nonsynonymous (non-Syn) to synonymous (Syn) substitutions in the six opsin genes was examined using Fisher’s exact test (http://www.physics.csbsju.edu/stats/exact.html). The nucleotide diversity (π) is the average number of pairwise nucleotide differences per site, which can be a direct estimator of the population mutation rate, θ, in the infinite site model under the standard Wright–Fisher model (Tajima, 1983). The population mutation rate is defined as θ=4Nμ, where N is the effective population size and μ is the mutation rate per nucleotide site per generation. The number of polymorphic (segregating) sites among sequences (S) provides another estimator of θ when adjusted using the sample size (n) and the length of the sequenced region (L). This estimate is referred to as θW (Watterson’s θ; Watterson, 1975). Tajima’s D evaluates the difference between π and θW, which are expected to be identical under the standard Wright–Fisher model (Tajima, 1989).
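The three statistics described above can be illustrated with a short sketch. It is not the DnaSP implementation: it assumes phased, gap-free, equal-length sequences, and the function name `summary_stats` is hypothetical. The variance constants follow the standard formulation of Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def summary_stats(seqs):
    """Return (pi, theta_w, tajima_d) for aligned sequences.

    pi and theta_w are per-site values. Gaps and ambiguity codes are
    not handled (an assumption of this sketch).
    """
    n = len(seqs)
    length = len(seqs[0])
    if n < 4 or any(len(s) != length for s in seqs):
        raise ValueError("need at least 4 aligned sequences of equal length")

    # k: mean number of pairwise differences over all n(n-1)/2 pairs
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # S: number of segregating (polymorphic) sites
    s = sum(len(set(column)) > 1 for column in zip(*seqs))

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    pi = k / length
    theta_w = s / a1 / length
    if s == 0:
        return pi, theta_w, float("nan")  # D is undefined without variation

    # Constants for the variance of (k - S/a1)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d
```

A positive D indicates an excess of intermediate-frequency variants (π larger than θW), which is the direction expected under balancing selection or population structure.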

Deviations from Hardy–Weinberg heterozygosity within populations were expressed as FIS=1 − Ho/He, where Ho is the observed heterozygosity and He is the expected heterozygosity (Supplementary Tables S4 and S5). We calculated Ho and He using Arlequin ver. 3.1 (Excoffier et al., 2005). Differences in FIS among populations and genes were tested using a General Linear Model, where FIS for each gene and population was the response variable, and population and gene were random factors. Random effects models are used when the treatments are not fixed, that is, when the various factor levels are sampled from a larger population. Genetic differentiation, FST, was calculated as (πTotal − πWithin)/πTotal, where πTotal and πWithin are the π values for all populations and each population, respectively.
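The two fixation indices defined above reduce to simple ratios. The sketch below is illustrative only; the function names are hypothetical, and averaging the within-population π values without weighting by sample size is an assumption, because the text does not specify how the per-population values were combined.

```python
def f_is(h_obs, h_exp):
    """Inbreeding coefficient: 1 - Ho/He (positive = heterozygote deficit)."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


def f_st(pi_total, pi_within_by_pop):
    """Differentiation: (pi_total - pi_within) / pi_total.

    pi_within is taken as the unweighted mean over populations
    (an assumption of this sketch).
    """
    if pi_total <= 0:
        raise ValueError("total nucleotide diversity must be positive")
    pi_within = sum(pi_within_by_pop) / len(pi_within_by_pop)
    return (pi_total - pi_within) / pi_total
```

For example, with hypothetical values Ho=0.4 and He=0.5, `f_is(0.4, 0.5)` gives 0.2, that is, a 20% deficit of heterozygotes relative to Hardy–Weinberg expectations.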

Coalescent simulations of Tajima’s D test for balancing selection

We used Tajima’s D statistic as a test of neutrality to detect balancing selection in the opsin genes. Given that Tajima’s D is sensitive to demographic effects, we first estimated the parameters of a rough demographic model based on the non-opsin reference regions and we then obtained the null distribution of Tajima’s D statistic. The procedure is similar to that reported by Hiwatashi et al. (2010) and Innan (2006).

We performed a coalescent simulation to infer demography. For the coalescent simulation, we used the ‘ms’ program (Hudson, 2002) that can simulate nucleotide polymorphism patterns under the infinite site model with population structure. We tested the hypothesis that balancing selection may operate on opsin genes that would probably make Tajima’s D value positive. Therefore, to detect balancing selection, a major concern is population structure that can also result in a positive Tajima’s D value. There appeared to be some fragmentation within the sample populations because most of the Tajima’s D values were positive in the reference regions (Supplementary Table S2). Our assumption of constant population size (in the upland sites) appeared to be reasonable because Barson et al. (2009) found that even small upland guppy populations were stable and approached mutation–drift equilibrium. We performed coalescent simulations for each local population, where we assumed that there were two subpopulations and we allowed migration between them. The migration rate was set so it was consistent with the observed polymorphism data characterized by π and θW (see below for details). This treatment should make the test for balancing selection conservative.

The simulation required two parameters: the population mutation and migration rates (θ=4Nμ and M=4Nm). We used a coalescent-based rejection-sampling algorithm to estimate these two parameters from the polymorphism data in the reference regions (summarized by π and θW). This approach was applied to each population where several reference regions were available (six for the PL and LY populations, and seven for the others; Supplementary Tables S2 and S5). The length of the sequence was set to the average of the six/seven reference regions examined, that is, 489 bp for PL and LY and 482 bp for other populations (Supplementary Table S2). The number of sequences was set to twice the total number of individuals examined, that is, 16 sequences because the reference regions were all autosomal (Supplementary Table S2). Some populations contained the reference region(s) in 14 samples (PL, VB, CO and VR), 18 samples (LA, UA, LG, UG and LR) or both (LY), but we used 16 as the overall average. These data were used to compute π and θW (Supplementary Table S2).

To estimate the values of θ and M for the observed values of π and θW (Supplementary Table S6), we used a simulation-based rejection-sampling algorithm, where random polymorphism patterns were generated using the ‘ms’ program. As mentioned above, we simulated two equal-sized subpopulations and estimated θ and M. In each replicate, θ and M were assigned randomly from the intervals 0 to 2.00 × 10⁻³ and 0 to 10, respectively. Using these parameters, coalescent simulations were performed for six (or seven) independent regions to calculate π and θW. πave and θWave were the average values of π and θW from the simulated data. Replicates were accepted if the values of πave and θWave were within ±5% of the mean of the observed π and θW values (Supplementary Table S6). The distribution of a significant number of accepted pairs of θ and M was considered to be approximately proportional to the likelihood. This process was repeated until 10 000 accepted pairs of θ and M were accumulated.
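The accept/reject logic of this procedure can be sketched as follows. The sketch is deliberately generic: `toy_simulator` is a hypothetical placeholder with arbitrary formulas and does not model a coalescent, whereas the actual analysis generated polymorphism patterns with ‘ms’. Only the uniform priors, the ±5% tolerance and the stopping rule are taken from the text; all function and parameter names are assumptions.

```python
import random


def toy_simulator(theta, mig, rng):
    """Placeholder for a coalescent run; returns (pi_ave, theta_w_ave).

    The formulas are arbitrary and exist only so that the sketch runs.
    """
    noise = rng.gauss(1.0, 0.01)
    pi_ave = theta * (1.0 + 1.0 / (1.0 + mig)) * noise
    theta_w_ave = theta * (1.0 + 0.5 / (1.0 + mig)) * noise
    return pi_ave, theta_w_ave


def rejection_sample(simulate, observed, theta_range, mig_range,
                     rel_tol=0.05, n_accept=10_000, seed=1,
                     max_draws=10_000_000):
    """Collect (theta, M) pairs whose simulated summaries match the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_draws):
        if len(accepted) >= n_accept:
            break
        # Draw both parameters from uniform priors
        theta = rng.uniform(*theta_range)
        mig = rng.uniform(*mig_range)
        simulated = simulate(theta, mig, rng)
        # Keep the draw only if every summary is within the tolerance
        if all(abs(s - o) <= rel_tol * abs(o)
               for s, o in zip(simulated, observed)):
            accepted.append((theta, mig))
    return accepted
```

The accepted pairs approximate the joint distribution of θ and M given the observed summaries, and they can then be reused as input for simulating null distributions.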

Next, we tested whether the observed π and θW for the opsin genes could be explained using a neutral model by assuming these estimated parameters (Supplementary Table S6). The simulation was performed based on the condition of the density distributions of θ and M that were estimated from the six/seven reference regions, and we obtained the null distributions of the summary statistics for the opsin genes. The length of the sequence was set to the total length of the opsin genes examined (1071 bp for LWS-1 and LWS-3, 1074 bp for LWS-2 and LWS-4, 1441 bp for SWS1 and 1821 bp for SWS2; Supplementary Table S3). The number of sequences was set to the total number of chromosomes examined (Supplementary Table S3). For each of the 10 000 θ and M values, we determined the polymorphism distribution and used it to calculate Tajima’s D (Supplementary Figure S2). We then used the distribution of these Tajima’s D values as a null distribution to test the neutrality of the opsin genes. We obtained a P-value for the proportion of simulation runs where the value of Tajima’s D was greater than the observed value (Supplementary Figure S2).
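The final step, converting a simulated null distribution into a one-sided P-value, amounts to counting. A minimal sketch, with a hypothetical function name and a strict inequality chosen to match the wording ‘greater than the observed value’:

```python
def upper_tail_p(null_values, observed):
    """Proportion of simulated statistics strictly greater than the observed one."""
    if not null_values:
        raise ValueError("null distribution is empty")
    return sum(value > observed for value in null_values) / len(null_values)
```

With 10 000 simulated values, the smallest non-zero P-value that this estimator can return is 0.0001.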

Differences in Tajima’s D among populations and genes were tested using a General Linear Model, where Tajima’s D for each gene and population was the response variable (60 values), and population and gene were random factors.

Coalescent simulations for FST analysis of divergent selection

The idea of detecting local adaptation using FST was first proposed by Lewontin and Krakauer (1973). They proposed that the null distributions of FST in the reference loci can be approximated as χ2 distributions. Therefore, if the mean FST value is estimated from multiple loci, it is possible to test whether a specific locus is an outlier of the null distribution because of local adaptation (the so-called ‘Lewontin–Krakauer test’) (Beaumont, 2005). However, given the increases in the amounts of data and computational power, recent studies tend to use empirical distributions of putatively neutral loci (Akey et al., 2002) or coalescent simulations for null distributions (Beaumont and Nichols, 1996).

In the present study, we performed coalescent simulations to evaluate the local adaptation of opsin genes. Again, we used a simulation-based approach, where we estimated the demographic parameters from the non-opsin regions initially and the null distribution for the opsin genes was obtained based on the condition of the estimated parameters (Supplementary Tables S4 and S5). We calculated the mean FST values of six reference loci with a randomly varying population migration rate (4Nm) from an arbitrary range of 0–3, using the ‘ms’ program (Hudson, 2002). We assumed an island model with 10 subpopulations with equal migration and 16 samples per subpopulation. We set θ as 1 because this does not affect the FST values. We accepted simulation runs if the mean FST value was within ±1% of the observed mean FST value. We repeated the simulation until we obtained 10 000 samples. The posterior distribution of 4Nm that we obtained is shown in Supplementary Figure S3A. Next, we used coalescent simulations based on the posterior distribution of 4Nm to confirm that the FST values of the reference loci were within the null distribution (Table 2 and Supplementary Figure S3B). Because the sample sizes differed among the opsin genes, we calculated a separate null distribution for each opsin gene and obtained P-values in the same manner as in the test for balancing selection (Supplementary Figures S3C–G). LWS-2 was excluded from this analysis because these sequences could not be determined in the CO, VR and LR populations.
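For orientation, an island-model run of this kind could be specified on the ‘ms’ command line roughly as sketched below. The helper name `ms_island_args` is hypothetical, and the exact options used in the study are not given in the text. The sketch relies only on the documented `-t` (θ) and `-I` (number of subpopulations, per-subpopulation sample sizes and a symmetric migration parameter) options; how the 4Nm value of the text maps onto the migration parameter of ‘ms’ should be checked against the program documentation.

```python
def ms_island_args(n_pops, n_per_pop, theta, mig_4nm, n_reps=1):
    """Build an argument list for an island-model run of Hudson's ms."""
    if n_pops < 2 or n_per_pop < 1:
        raise ValueError("need at least two subpopulations with samples")
    n_sam = n_pops * n_per_pop  # total number of sampled chromosomes
    args = ["ms", str(n_sam), str(n_reps), "-t", str(theta), "-I", str(n_pops)]
    args += [str(n_per_pop)] * n_pops  # sample size of each subpopulation
    args.append(str(mig_4nm))          # symmetric migration parameter
    return args
```

If ‘ms’ is installed, the returned list can be passed to `subprocess.run(args, capture_output=True, text=True)` and the output parsed to compute FST for each replicate.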

Linkage disequilibrium (LD) among SWS2-B, LWS-1, LWS-2 and LWS-3 in six populations

We confirmed the existence of LD among the SWS2-B, LWS-1, LWS-2 and LWS-3 opsin genes, which are located in close proximity on the same chromosome. The LD patterns (LOD and D′) were calculated using Haploview 4.2 (Barrett et al., 2005; Barrett, 2009).
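As an illustration of the D′ statistic, the sketch below computes |D′| for two biallelic sites from phased haplotypes. It is not the Haploview procedure, which works from genotype data; the function name and the input format (a list of two-allele tuples) are assumptions.

```python
def d_prime(haplotypes, allele_a, allele_b):
    """|D'| between two sites, given phased (site1, site2) haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return float("nan")  # at least one site is monomorphic
    return abs(d) / d_max
```

A value of 1 means that at most three of the four possible haplotypes are present, whereas a value of 0 corresponds to independent association of the alleles.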

Relationships between opsin divergence and environmental differences between the sites

We used Mantel tests to examine the relationships between FST and the environmental differences between the sites. We used four aquatic environment variables: the percentage of canopy over the water surface, shade, DO and conductivity. We examined nine sampling sites in Trinidad because no data were available for Tobago (Supplementary Table S7). Next, distance matrices were constructed between the sites by calculating the pairwise differences in the values of the environmental variables between sites. Mantel tests were conducted between the pairwise FST values and the differences in each environmental variable between the sites. The geographic distance between the sites was also examined to account for the effects of gene flow. The P-values were adjusted for the inflated type I error rate because of multiple comparisons using the false discovery rate procedure. In addition, we incorporated geographic distances in the partial Mantel tests for environmental variables that were significantly related to FST, thereby controlling for the effect of genetic dissimilarity caused by the genetic isolation of populations from distinct drainages or regions (the Caroni and Northern Drainages, and Southern Trinidad).
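A simple Mantel test of the kind described above can be sketched with a permutation loop. The software used in the study is not stated, so this is a generic one-sided version (testing for a positive association) with hypothetical function names; the partial Mantel test is not covered.

```python
import random
from math import sqrt


def _upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns together."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # fails if either matrix has no variation


def mantel(dist_a, dist_b, n_perm=999, seed=0):
    """Return (r, one-sided permutation P-value) for two distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    x = _upper_triangle(dist_a, identity)
    r_obs = _pearson(x, _upper_triangle(dist_b, identity))
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute the site labels of one matrix only
        if _pearson(x, _upper_triangle(dist_b, perm)) >= r_obs - 1e-12:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return r_obs, (hits + 1) / (n_perm + 1)
```

Here `dist_a` would hold the pairwise FST values and `dist_b` the pairwise differences in one environmental variable, both as square symmetric matrices over the same ordering of sites.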

Results

Allelic differentiation of M/LWS opsins for spectral sensitivity

The amino acid compositions of the spectrally effective sites, that is, 180, 197, 277, 285 and 308 of the four M/LWS-type opsin genes, were as follows: Ala, His, Tyr, Thr and Ala (designated Ala/His/Tyr/Thr/Ala), respectively, in LWS-1; Pro/His/Phe/Ala/Ser in LWS-2; and Ser/His/Tyr/Thr/Ala in LWS-3 and LWS-4, as reported previously (Watson et al., 2011). The composition was invariant within and between populations except at residue 180 of LWS-1, where an Ala/Ser polymorphism was found. Ala was the major allele in southern Trinidad and Tobago, whereas Ser was the major allele in the Northern and Caroni drainage areas of Trinidad (Figure 2 and Supplementary Table S8).

Overall nucleotide variation in opsin genes

The gene-level diversity measures for the opsin genes and non-opsin reference genes are shown in Supplementary Tables S4 and S5, respectively. The genotype frequencies of opsin genes in some populations deviated significantly from the Hardy–Weinberg equilibrium (mean (±s.e.) opsin FIS=0.189 (±0.035), Supplementary Table S4), whereas the reference genes were at Hardy–Weinberg equilibrium (mean (±s.e.) FIS=0.050 (±0.053), Supplementary Table S5). The heterozygote deficiency was significantly higher for the opsin genes compared with the non-opsin reference genes (Mann–Whitney U-test, W=1851.5, P=0.020; Supplementary Figure S4).

There were 4 non-Syn and 21 Syn polymorphic sites in LWS-1, 4 non-Syn and 4 Syn polymorphic sites in LWS-2, 3 non-Syn and 14 Syn polymorphic sites in LWS-3 and 3 non-Syn and 9 Syn polymorphic sites in LWS-4. The short wavelength opsins were also polymorphic, where the SWS1 coding region had 1 non-Syn and 4 Syn polymorphic sites, whereas the SWS2-B coding region had 5 non-Syn and 29 Syn polymorphic sites. There was no significant difference in the ratio of non-Syn to Syn substitutions among the six opsin genes (Fisher’s exact test, P=0.243).

Figure 2 shows the allele frequencies of the six opsin genes in the 10 populations in terms of the amino acid haplotype. The allele frequencies differed among populations and the differences were notably larger between the Northern/Caroni drainages of Trinidad (LY, UA, LA, UG and LG) and the other populations (Figure 2). The opsin genes in the downstream populations (LA, LG and LY) were more polymorphic than those in the upstream populations (UA and UG; Figure 2 and Supplementary Tables S4 and S5), which is consistent with the higher level of genetic drift and genetic isolation found in upland regions (Barson et al., 2009; Willing et al., 2010). The SWS2-B opsin gene was highly polymorphic and the allele compositions differed among the streams.

Test for balancing selection in opsin genes

The coalescent simulation showed that the observed Tajima’s D values of SWS2-B in CO and LWS-3 in LY were significantly larger than those of the reference regions (P=0.001 in CO and 0.012 in LY; Table 1 and Supplementary Figure S5). Those of the reference genes were within the distribution (Supplementary Figure S5). However, after correcting for multiple tests using Bonferroni’s method, the deviations in these opsin genes were not statistically significant (P=0.072 in CO and 0.649 in LY). Tajima’s D did not differ significantly between the opsin genes (Friedman rank sum test, Friedman χ2=3.03, d.f.=5, P=0.695), but there were significant differences in the value of Tajima’s D between populations for the opsin genes (Friedman χ2=19.60, d.f.=9, P=0.021; Supplementary Figure S6).

Table 1 Sample sizes (n), Tajima’s D and P-values for the six opsin genes

Testing for divergent selection in opsin genes

The mean FST for all populations and the pairwise FST values of the opsin genes and the non-opsin reference genes are shown in Table 2 and Supplementary Table S9, respectively. We found that the guppy populations appeared to be more diverged with respect to the opsin genes (SWS2-B, LWS-1, LWS-3 and LWS-4) than the non-opsin reference genes (Supplementary Figure S3) (SWS2-B: mean FST=0.596, P=0.037; LWS-1: mean FST=0.810, P<0.001; LWS-3: mean FST=0.735, P=0.002; LWS-4: mean FST=0.637, P=0.018; SWS1: mean FST=0.549, P=0.085). The P-values of LWS-1 and LWS-3 remained significant after correction for multiple testing (LWS-1: P<0.001, LWS-3: P<0.012). More importantly, we found that the observed mean value of the opsin genes (mean FST=0.693) deviated significantly from that of the reference loci (mean FST=0.419, P<0.001, Supplementary Figure S3H). Note that to be conservative, we assumed three loci in the coalescent simulation instead of five loci because of the strong LD between three opsin genes (Supplementary Figure S7). The results suggest that the opsin genes have been subject to divergent selection.

Table 2 Average FST values for the opsin and non-opsin reference genes

Relationships between the environmental variables and opsin genes

The pairwise FST values and the differences in the DO values between the sampling sites were significantly correlated for LWS-1 (Mantel test: P=0.019) and LWS-3 (Mantel test: P<0.001; Figure 3 and Supplementary Table S10). The pairwise FST values of LWS-1 and LWS-3 were marginally significant for the geographic distances (Mantel tests: LWS-1, P=0.054 and LWS-3, P=0.049). The partial Mantel test showed that the positive relationships between FST and DO remained significant after controlling for the geographic distances (LWS-1, r=0.330, P=0.024 and LWS-3, r=0.616, P<0.001).

Figure 3

Relationships between the FST value and DO in LWS-1 and LWS-3. The FST values are the pairwise values between the nine Trinidad populations. The DO values are the differences in the DO concentrations between the nine Trinidad populations. A Mantel test showed that both relationships were significant (LWS-1, P=0.019, LWS-3, P<0.001, Supplementary Table S10).

Discussion

In the present study, we examined the nucleotide variation in the four M/LWS opsin genes as well as the blue-sensitive SWS2-B and UV-sensitive SWS1 opsin genes in 10 guppy populations in various light environments in Trinidad and Tobago. For the first time, we discovered a potential spectral variant (180 Ser/Ala) in LWS-1, located at an amino acid site that is known to affect the absorption spectra of opsins. A coalescent simulation showed that the interpopulation genetic differentiation of two opsin genes (LWS-1 and LWS-3) was significantly higher than the neutral expectation. This genetic differentiation was related significantly to differences in the DO level. These results suggest that the diversity of the opsin genes among populations is driven significantly by natural selection and that guppies can adapt to various light environments by changes in their color vision.

Allelic differentiation of M/LWS opsins with respect to spectral sensitivity

In the present study, we detected an allelic polymorphism in the five-site composition at LWS-1 (180/197/277/285/308=Ala/His/Tyr/Thr/Ala, A-type, or Ser/His/Tyr/Thr/Ala, S-type) that differed at amino acid 180. In the vertebrate M/LWS opsins, the single mutations S180A, H197Y, Y277F, T285A and A308S and the double mutation S180A/H197Y shift λmax by −7, −28, −8, −15, −27 and 11 nm, respectively, which is known as the ‘five-sites’ rule (Yokoyama and Radlwimmer, 2001). In addition, the S180P mutation is reported to shift the λmax of the lamprey LWS photopigment by almost 19 nm toward a shorter wavelength (Davies et al., 2009). According to the five-site compositions of other species (Yokoyama and Radlwimmer, 2001; Davies et al., 2009), the λmax values of the S-type and the A-type LWS-1 are expected to be 560 and 553 nm, respectively. The λmax of LWS-2 (Pro/His/Phe/Ala/Ser) is expected to be 518 nm. The λmax of both LWS-3 and LWS-4 (Ser/His/Tyr/Thr/Ala, which is the same as the S-type LWS-1) is expected to be 560 nm.
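The arithmetic behind the S-type and A-type predictions can be sketched with a small additive model. It assumes a baseline of 560 nm for Ser/His/Tyr/Thr/Ala (the value given above), applies the single-mutation shifts quoted above, and treats the 11 nm term as an interaction added when S180A and H197Y occur together; that reading of the double-mutation term is an assumption. The sketch deliberately rejects residues outside this table (such as Pro at site 180), so it does not reproduce the 518 nm expected for LWS-2.

```python
# Residues at sites 180/197/277/285/308 of the baseline pigment.
BASELINE_SITES = ("S", "H", "Y", "T", "A")
BASELINE_LMAX = 560  # nm, value given in the text for Ser/His/Tyr/Thr/Ala

# Single-mutation shifts (nm), keyed by (site index, new residue).
SINGLE_SHIFTS = {
    (0, "A"): -7,   # S180A
    (1, "Y"): -28,  # H197Y
    (2, "F"): -8,   # Y277F
    (3, "A"): -15,  # T285A
    (4, "S"): -27,  # A308S
}
# Assumed to be an interaction term applied on top of both single shifts.
INTERACTION_S180A_H197Y = 11


def predict_lmax(sites):
    """Predicted lambda-max (nm) for a five-site composition such as 'AHYTA'."""
    if len(sites) != len(BASELINE_SITES):
        raise ValueError("expected five residues")
    total = BASELINE_LMAX
    for index, (reference, residue) in enumerate(zip(BASELINE_SITES, sites)):
        if residue == reference:
            continue
        if (index, residue) not in SINGLE_SHIFTS:
            raise ValueError(f"no shift tabulated for residue {residue!r} at site {index}")
        total += SINGLE_SHIFTS[(index, residue)]
    if sites[0] == "A" and sites[1] == "Y":
        total += INTERACTION_S180A_H197Y
    return total


s_type = predict_lmax("SHYTA")  # S-type LWS-1, LWS-3 and LWS-4
a_type = predict_lmax("AHYTA")  # A-type LWS-1
```

Under these assumptions the S-type and A-type values come out as 560 and 553 nm, matching the expectations stated above.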

We can speculate about an ancestral allele by comparing closely related species. In a previous study of the Cumaná guppy in Venezuela (Watson et al., 2011), LWS-1 was reported to be the A-type. In our study, the A-type was the major allele in southern Trinidad and in Tobago, whereas the S-type was the major allele in northern Trinidad (that is, the Northern Drainage and the Caroni Drainage). In the green swordtail (Xiphophorus helleri), a species closely related to the guppy, LWS-1 was reported to be the S-type (Watson et al., 2010). In a study of the four-eyed fish (Anableps anableps) (Owens et al., 2009), a close outgroup of the guppy and the swordtail, multiple M/LWS-type opsin genes were also reported and residue 180 was Ser in all these genes. Taken together, these results suggest that the A-type LWS-1 allele may have arisen in the common ancestor of the Venezuelan and the Trinidad and Tobago guppies after its divergence from the swordtail. Thus, the S-type LWS-1 allele found in the northern part of Trinidad could be an ancestral remnant. However, it is also possible that it was derived secondarily from LWS-3 by gene conversion. Gene conversion has also been implicated in the generation of M/LWS gene variation in the guppy, the swordtail and the four-eyed fish (Owens et al., 2009; Watson et al., 2010, 2011).

The present study included samples from two populations (UA and LA) in the Aripo river, which Archer and Lythgoe (1990) had also sampled and in which they detected sensory polymorphisms. In the Lower Aripo (LA) population, all four of the LWS opsin genes had amino acid polymorphisms (Figure 2). Archer and Lythgoe (1990) detected three major types of cone cells with different λmax values (533, 548 and 572 nm) using MSP. The predicted λmax values of LWS-2 (518 nm; Pro/His/Phe/Ala/Ser), A-type LWS-1 (553 nm; Ala/His/Tyr/Thr/Ala) and LWS-3/LWS-4/S-type LWS-1 (560 nm; Ser/His/Tyr/Thr/Ala) may correspond to the three major types detected by MSP. However, Archer and Lythgoe (1990) also reported that individuals differed in their combinations of these cell types, having one, two or three types. In addition, a group of cells had a broad range of λmax values around 548 nm. These findings cannot be explained by allelic combinations of the three predicted λmax values of the M/LWS opsins determined in the present study. Thus, there may be further variation in λmax among the four M/LWS loci, arising from amino acid variations at sites other than the five sites or from different expression patterns. The sensory variation could also be related to interindividual variation in the relative expression levels among opsin loci, as found in the Venezuelan guppies (Rennison et al., 2011). Direct measurements of λmax for the reconstituted photopigments and quantification of the expression of the alleles are necessary to resolve these questions.

We also found genetic polymorphisms at non-Syn sites in SWS2-B and SWS1. This suggests that there could be phenotypic variation in short wavelength color vision, although phenotypic variation has not been reported previously for short wave-sensitive cells using MSP. In total, we detected 6 non-Syn variable sites out of a total of 39 polymorphic sites in the two short wave opsin-coding regions, compared with 14 out of 67 sites in the combined M/LWS-coding regions. The ratio of Syn to non-Syn substitutions did not differ significantly between the long and short wavelength opsin genes in the samples analyzed. Many organisms use ultraviolet radiation for individual discrimination, communication and direction recognition (Losey et al., 1999; Honkavaara et al., 2002; Siitari et al., 2002). Female guppies use ultraviolet radiation from the male body during mate choice (Kodric-Brown and Johnson, 2002; Smith et al., 2002). At present, we are confirming the differences in λmax based on sequence variation in SWS.
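One way to check the comparison of 6 out of 39 against 14 out of 67 is a two-sided Fisher's exact test on the 2×2 table of non-Syn and remaining sites. The test actually used is not stated in this section, so the following is only a sketch of one reasonable choice, and it assumes that all remaining polymorphic sites are Syn.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Counts from the text: non-Syn sites versus the remaining polymorphic sites.
short_wave = (6, 39 - 6)    # SWS2-B and SWS1 combined
long_wave = (14, 67 - 14)   # M/LWS genes combined
p_value = fisher_exact_two_sided(*short_wave, *long_wave)
```

A P-value above 0.05 from this test would agree with the statement that the ratio did not differ significantly between the long and short wavelength opsin genes.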

Balancing selection for opsin genes

The coalescent simulations showed that the observed Tajima’s D values of LWS-3 at LY and SWS2-B at CO were apparent outliers in the distribution expected from the reference genes, but they were not significant after correcting for multiple comparisons. Thus, we found no evidence for balancing selection, that is, overdominance or negative frequency-dependent selection, within subpopulations at any opsin locus. Indeed, the heterozygote deficiency observed at opsin genes in some lowland populations suggests the opposite (see below).

Divergent selection for opsin genes

The overall genetic divergence among the populations (FST) for the opsin genes was significantly greater than that for the non-opsin reference genes (Table 2 and Supplementary Figure S3). In particular, the coalescent simulation indicated that the FST values for the LWS-1 and LWS-3 loci were significantly higher than those for the non-opsin reference genes. Because there was a very high level of LD among the loci (Supplementary Figure S7), it is possible that only one of these LWS loci is itself significantly divergent among populations.

This observation is consistent with divergent selection operating on opsin genes, particularly LWS loci, resulting in local adaptation. In addition, we observed significant heterozygote deficiency for opsin genes that was particularly prominent in the downstream populations (the LA and CO populations were significant with an α of 5%; Supplementary Table S1). In contrast, the reference genes were at Hardy–Weinberg equilibrium (Supplementary Table S4). The most consistent explanation for this observation is that divergent selection on the opsin genes may have changed the allele frequencies in the tributaries that flow into the downstream populations. Migrants from these tributaries may be homozygous for alleles that are rare in the recipient downstream population, resulting in the observed heterozygote deficiency. Consistent with this explanation, Barson et al. (2009) showed that the lowland Caroni populations also had a heterozygote deficit at highly polymorphic microsatellite markers, which they explained by the inflow of migrants from upstream populations when the rivers were in spate during the wet season rains (van Oosterhout et al., 2007; McMullan and van Oosterhout, 2012).
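The way in which homozygous migrants can produce a heterozygote deficiency can be illustrated with a short calculation of the inbreeding coefficient, FIS = 1 − Ho/He, for a biallelic locus. The genotype counts below are hypothetical and chosen only to show the direction of the effect; they are not data from the study.

```python
def inbreeding_coefficient(n_hom_major, n_het, n_hom_minor):
    """F_IS = 1 - Ho/He for a biallelic locus, from genotype counts.

    Ho is the observed heterozygote frequency and He = 2pq is the
    frequency expected under Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        raise ValueError("no individuals")
    p = (2 * n_hom_major + n_het) / (2 * n)
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        raise ValueError("locus is monomorphic")
    observed_het = n_het / n
    return 1 - observed_het / expected_het


# Hypothetical resident population at Hardy-Weinberg proportions (p = 0.9).
residents = (81, 18, 1)
# The same population after 10 migrants arrive that are all homozygous
# for the allele that is rare among the residents.
with_migrants = (81, 18, 1 + 10)

f_residents = inbreeding_coefficient(*residents)
f_mixed = inbreeding_coefficient(*with_migrants)
```

In this toy case the residents alone give an FIS of approximately zero, whereas the mixed sample gives a positive FIS, that is, fewer heterozygotes than expected from the pooled allele frequencies.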

The allele frequencies of SWS2-B, LWS-1, LWS-2 and LWS-3 differed among populations, and the differences between northern Trinidad (Caroni and Northern drainages) and the other populations were notably larger (Figure 2). In the northern populations, there were large variations in the allele frequencies, particularly for SWS2-B. Three to five non-Syn sites caused the differences among the alleles. The relatively minor alleles in northern Trinidad were dominant in southern Trinidad for SWS2-B, LWS-1 and LWS-3. Given the reduced amount of variation in the reference genes, this suggests that divergent selection caused genetic differentiation among the populations and that different alleles of the opsin genes may adapt to different environmental conditions.

Our results showed that the divergence of opsin genes was related significantly to differences in one environmental variable, DO, particularly for the two opsin genes (LWS-1 and LWS-3) that were subject to statistically significant divergent selection. In addition, the differences in the DO levels between populations were still significantly related to the FST values of these opsin loci even when the effects of the geographic distances between populations were removed. Therefore, differences in DO may be related to the factors that cause divergent selection, independently of the effects of gene flow and divergent environmental factors related to geographic differences.

The DO level may be related to eutrophication (the amount and types of phytoplankton), which affects both the water color and light penetration. In addition, changes in the DO level may cause changes in the suitable depth of aquatic environments (Vonlanthen et al., 2012), thereby leading to water color variation in guppy habitats. We found that the frequency of S-type LWS-1 (560 nm) increased significantly with the DO level (General Linear Model, binomial, logit link, P=1.49 × 10^−19). In LWS-3, the frequency of a major allele (FVV, Figure 2) increased significantly with the DO level (General Linear Model, binomial, logit link, P=7.90 × 10^−9). These results suggest that long wave-sensitive genotypes may be favored by a higher DO level. There were variations in the LWS loci between the northern and southern populations in Trinidad. Unlike the guppies from the Northern and Caroni Drainages, those from Pitch Lake in southern Trinidad were exposed to direct sunlight, high water temperatures and a black background color produced by the sediment (that is, pitch) (Kenny, 1995). However, the differences in the canopy cover and shade level between the sites were not highly related to opsin divergence; therefore, light intensity may not be a selective factor. Further studies are required to identify the selective agents that act on different opsin genes and genotypes.

In conclusion, we found considerable amounts of allelic variation in some opsin genes relative to reference genes in wild populations of guppies, including variation at one amino acid that is a known spectral tuning site. The allelic variations at two loci, LWS-1 and LWS-3, were subject to divergent selection among the populations, and the DO level of the aquatic environment may be associated with the factors driving this divergent selection. Thus, it is necessary to explore spectral variation among the opsin alleles to increase our understanding of the functionality of allelic variants.

Data archiving

The sequence data have been submitted to GenBank under accession numbers AB748984–AB748990.