Host plant specialization is a key mechanism for the diversification of phytophagous insects at both macroevolutionary and microevolutionary time scales (Ehrlich and Raven, 1964; Diehl and Bush, 1984; Mitter et al., 1988; Farrell, 1998; Drès and Mallet, 2002; Nason et al., 2002; Miller et al., 2003; Ferrari et al., 2006). Under the Dobzhansky-Muller model of speciation, host shifts can result in genetic differentiation as a consequence of assortative mating and reduced gene flow, ultimately leading to isolation and speciation due to the accumulation of mutations that cause reproductive incompatibility among populations (Dobzhansky, 1951; Coyne and Orr, 2004). However, evidence for host-related speciation can be overlooked when recent divergence leads to the retention of shared ancestral polymorphism (Maddison, 1997; Rosenberg, 2002; Degnan and Rosenberg, 2006) or confounded when genetic differentiation occurs as a consequence of biogeography rather than host specificity (Becerra and Venable, 1999). Because these circumstances are not mutually exclusive, teasing apart the population-level relationships between host use and speciation remains a principal challenge for understanding the evolution of insect–plant associations.

Yuccas (Yucca: Agavaceae) and yucca moths (Prodoxus, Parategeticula, Tegeticula: Prodoxidae) have provided extraordinary opportunities for the study of divergence and speciation, illustrating the importance of host plant specificity for diversification (reviewed in Pellmyr, 2003; see Althoff et al., 2006; Leebens-Mack and Pellmyr, 2004; Segraves and Pellmyr, 2004; Segraves et al., 2005; Svensson et al., 2005; Althoff, 2008; Godsoe et al., 2008; Smith et al., 2008a, 2008b for more recent work). Yucca moths in the genus Prodoxus are a derived lineage within the Prodoxidae, comprising the sister lineage of the pollinating yucca moths Parategeticula and Tegeticula (Pellmyr and Leebens-Mack, 1999; Pellmyr, 2003; Pellmyr et al., 2006). The pollinating yucca moths have coevolved as obligate mutualists with various host species of Yucca, providing exclusive pollination service and ovipositing so that emerging larvae feed on seeds. In contrast, members of Prodoxus do not pollinate their host plants, and instead oviposit so that larvae typically feed on tissues other than seeds (Pellmyr, 2003; Pellmyr et al., 2006).

Non-pollinating Prodoxus always coexist with pollinating yucca moths, as the latter are required for successful reproduction of yuccas; moreover, up to three different species of Prodoxus may coexist on the same plant, respectively specializing on fruit, flowering stalks or leaves (Pellmyr et al., 2006). Within Prodoxus, stalk feeding is the most common mode of host plant use, from which a more limited number of fruit or leaf-feeding taxa have evolved (Pellmyr et al., 2006). Although obligate mutualisms are expected to promote coevolutionary divergence in both yuccas and yucca moths (Pellmyr and Krenn, 2002; Pellmyr and Segraves, 2003; Leebens-Mack and Pellmyr, 1998, 2004; Althoff et al., 2006; Smith et al. 2008a; but see Smith et al. 2008b), the extent to which commensalist host specialization might lead to divergence among non-pollinating yucca moths is less clear. In the best-studied example to date, Prodoxus quinquepunctellus and Prodoxus decipiens have been shown to exhibit rapid phenological and morphological divergence among populations on different host species, particularly the evolution of ovipositor shape, a feature that is critical for successful oviposition and reproduction of yucca moths (Groman and Pellmyr, 2000; Althoff et al., 2001; Althoff and Pellmyr, 2002; Svensson et al., 2005). These results suggest that other members of Prodoxus are capable of highly specialized reproductive behavior, which could promote both host specificity and speciation.

Prodoxus coloradensis (Prodoxidae: Lepidoptera) is a stalk-feeding yucca moth endemic to southwestern North America, extending from coastal California to the Rio Grande in Texas (Figure 1; Pellmyr et al., 2006). Earlier taxonomic treatments of P. coloradensis indicated that this taxon was a geographically widespread generalist affiliated with multiple species of Yucca (Riley, 1880; Davis, 1967). However, a more recent revision of the Prodoxidae based on morphological and molecular data has narrowed the current status of P. coloradensis to a complex associated with three host species: Yucca baccata, Yucca schidigera and Yucca treculeana (Pellmyr et al., 2006). These host plants are parapatric, with limited areas of sympatry at their range margins (Figure 1; Matuda and Lujan, 1980; Hess and Robbins, 2002); within these zones, altitudinal and/or ecological segregation may further limit contact between moths on different host plants. Although P. coloradensis is known to be variable in wing coloration, which provides a useful diagnostic character for discriminating between other closely related species of Prodoxus (Pellmyr et al., 2006), there has been no clear morphological evidence in support of separate taxonomic status for host-associated populations within the current delimitation of P. coloradensis.

Figure 1

Geographical distribution of P. coloradensis specimens sampled for this study, with approximate ranges for associated host species of Yucca in southwestern North America. Numbers in ovals refer to accession numbers for each specimen (cf. Figures 2 and 3).

For this study, we tested the hypothesis that P. coloradensis represents a complex of three different host-associated species. On the basis of mitochondrial DNA (mtDNA) and nuclear DNA (nucDNA) sequence data, we evaluated the genealogical relationships of stalk-feeding moths sampled from Y. baccata, Y. schidigera and Y. treculeana. As shared ancestral polymorphism and recent divergence can lead to the rejection of strict phylogenetic criteria for speciation (for example, reciprocal monophyly, Knowles and Carstens, 2007), we adopted a population genetic approach to test for significant differentiation among host groups and estimate effective migration rates using multilocus coalescent simulations. Because geographical ranges of the host species for P. coloradensis are not widely overlapping, we also examined whether evidence for genetic structure could be attributed to host use after accounting for spatial isolation. Finally, we evaluated differential rates of male versus female dispersal on the basis of relative rates of gene flow for maternally inherited mtDNA compared with biparentally inherited nucDNA.


Field collections

Adult (n=26) and larval (n=14) specimens were sampled from 15 locations across the range of P. coloradensis in southwestern North America during 1992–2007, then flash-frozen in liquid nitrogen and placed in permanent storage at −80 °C. Sample sites and the approximate ranges of each host Yucca species are shown in Figure 1. Specific collection localities and GPS coordinates are listed in Supplementary Table S1. Because neither adult moths nor larvae are readily identified as distinct morphological species, sampling was designed to estimate migration between populations of moths on different host plants. Consequently, throughout the remainder of the paper, for brevity we often refer to moths collected from Y. baccata, Y. schidigera and Y. treculeana by the name of the host species.

DNA sequencing

Adult moths were dissected to remove the thorax for DNA extraction, retaining the wings, head and abdomen as voucher specimens. Larvae were used to obtain the remaining DNA samples. Tissues dissected from each adult moth (or whole larvae) were macerated in liquid nitrogen, and genomic DNA was isolated using DNeasy extraction kits (Qiagen, Valencia, CA, USA) for blood and animal tissue. Standard PCR protocols for yucca moths were used to amplify 1482 bp of subunit I from the mitochondrial cytochrome oxidase I (COI) gene using three overlapping PCR amplifications (Smith et al., 2008a). A modified touchdown PCR protocol was used to amplify 502 bp of the nuclear elongation factor 1 alpha (EF1α) gene (Smith et al., 2008a). PCR products were purified using QIAquick PCR purification kits (Qiagen), followed by cycle sequencing according to standard protocols for BigDye 3.1 (Applied Biosystems, Foster City, CA, USA). Dye terminator fragments were cleaned using Sephadex and electrophoresed on an ABI 3730 capillary instrument (Applied Biosystems). Trace files were aligned and edited using CodonCode Aligner 2.02 (CodonCode, Dedham, MA, USA). We were unable to obtain complete reads of EF1α for one individual, which was excluded from subsequent analyses. Variable sites in the remaining sequences were identified using the automated mutation detection function in CodonCode Aligner, followed by manual inspection of electropherograms and additional resequencing of ambiguous bases. Successful PCR amplification and sequencing of target genes (rather than paralogs or PCR artifacts) were confirmed by aligning the resulting sequences to other closely related species in the Prodoxidae, verifying the correct reading frame without stop codons.

As the resolution of nucDNA sequences into allelic haplotypes provides additional information for coalescent analyses of population structure (see below), the gametic phase of heterozygous sites (that is, double peaks) in EF1α sequences was resolved in PHASE v. 2.1.2 (Stephens et al., 2001; Li and Stephens, 2003; Stephens and Donnelly, 2003) using 10 000 Markov chain Monte Carlo (MCMC) iterations with 1000 burn-in iterations and a sampling interval of 10 iterations. Convergence was assessed by comparing the posterior probability (PP) of all haplotype resolutions among three replicate PHASE runs, increasing the posterior sampling until PPs differed by no more than 0.01 between runs. Heterozygous sites were coded as distinct alleles when resolved with >0.90 PP; otherwise, these sites were treated as missing data at that position. We also estimated the background recombination rate (ρ=4Nec, where Ne is the effective population size and c is the recombination rate) and deviations from ρ between each pair of adjacent variable sites.
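The PP threshold described above can be illustrated with a minimal sketch. It assumes that each heterozygous site has already been assigned a posterior probability for its phase call; the function name, the dictionary input and the use of "N" for missing data are hypothetical conveniences and do not reflect the PHASE output format.

```python
def mask_uncertain_phase(hap_a, hap_b, site_pp, threshold=0.90):
    """Keep phased bases only where the phase call exceeds the PP threshold.

    hap_a, hap_b: the two inferred haplotypes of one individual (aligned strings).
    site_pp: hypothetical mapping {alignment position: PP of the phase call}
             for heterozygous sites only.
    Sites at or below the threshold are recoded as missing ("N") in both copies.
    """
    a, b = list(hap_a), list(hap_b)
    for pos, pp in site_pp.items():
        if pp <= threshold:
            a[pos] = "N"
            b[pos] = "N"
    return "".join(a), "".join(b)


# Toy individual with two heterozygous sites (positions 1 and 3).
masked = mask_uncertain_phase("ACGT", "ATGC", {1: 0.95, 3: 0.80})
```

In this toy case the site at position 1 is retained as two distinct alleles, whereas the site at position 3 is treated as missing.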

After assembling aligned DNA sequence data matrices, we used DnaSP 5.0 (Librado and Rozas, 2009) to calculate the following summary statistics for each gene in the total sample of individuals: (i) number of haplotypes (h); (ii) haplotypic diversity (Hd; Nei, 1987); (iii) average number of nucleotide differences (K; Tajima, 1983); and (iv) nucleotide diversity (π; Nei, 1987). The distribution of COI haplotypes and EF1α alleles among hosts was also summarized using statistical parsimony networks inferred in TCS 1.2.1 (Clement et al., 2000). GenBank accessions for DNA sequences are listed in Supplementary Table S1.
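For readers unfamiliar with these summary statistics, the following sketch computes h, Hd, K and π from a toy alignment. It is not the DnaSP implementation: it assumes equal-length sequences with no gaps or missing data, and the function name and toy sequences are illustrative only.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (h, Hd, K, pi) for a list of aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    counts = Counter(seqs)
    # (i) number of distinct haplotypes
    h = len(counts)
    # (ii) haplotype diversity with the n/(n-1) sample-size correction (Nei, 1987)
    hd = (n / (n - 1)) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
    # (iii) average number of nucleotide differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs) / len(pairs)
    # (iv) nucleotide diversity: K expressed per site
    pi = k / length
    return h, hd, k, pi


toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ACCTTC"]
h, hd, k, pi = diversity_stats(toy)
```

Because programs differ in how they treat missing or ambiguous sites, values from a sketch like this one would not necessarily match those reported by DnaSP or TCS for real data.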

Gene tree estimation

Phylogenies for COI and EF1α were estimated using MCMC in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003), including seven additional outgroup taxa in the Prodoxidae (see Supplementary Table S1 for GenBank accessions of outgroup species). Nucleotide substitution models (COI=GTR+I+Γ; EF1α=GTR+Γ) were selected based on congruent results from hierarchical likelihood-ratio tests (Posada and Crandall, 1998) and AIC scores (Posada and Buckley, 2004) among competing models in MrModeltest 2.3 (Nylander, 2008). Because the default branch length priors in MrBayes can lead to unreasonable estimates of substitution rates (for example, >1.0 substitution per site from root to tip) and upward bias of PPs (Yang and Rannala, 2005; Marshall et al., 2006), empirical priors for this parameter were defined based on the mean branch lengths recovered from maximum likelihood topologies inferred using GARLI 0.951 (Zwickl, 2006). MCMC analyses were executed for 200 000 burn-in iterations followed by 1 800 000 iterations of posterior sampling using two independent runs with four chains each and a sampling interval of 100 iterations. After posterior sampling was completed, we verified that the average s.d. of split frequencies across paired runs was <0.01, with potential scale reduction factors of approximately 1.000 for all parameters. Log-likelihood scores and parameter estimates were examined in TRACER v 1.4.1 (Rambaut and Drummond, 2008) to confirm that independent runs had reached stationarity and convergence, with effective sample sizes >200 for all parameters. MCMC posterior distributions of tree topologies and branch lengths were summarized in TreeAnnotator 1.4.8 (Drummond and Rambaut, 2007) as the maximum clade credibility tree, which represents the single tree recovered from the posterior distribution that maximizes the product of PPs across clades.

To obtain empirical estimates of COI and EF1α substitution rates on an absolute time scale for mutation rate priors in the isolation-with-migration coalescent analyses (see below), we analyzed both genes under a log-normal relaxed clock model in BEAST 1.4.8 (Drummond et al., 2002, 2006; Drummond and Rambaut, 2007) using a representative set of sequences from the Prodoxid moths, comprising a single taxon from each of the species used for phylogenetic inference of the COI and EF1α gene trees (Supplementary Table S1). We constrained the ages of two key nodes using a normal distribution with 95% confidence intervals (CI) approximating the means and s.e. of estimated divergence times from Pellmyr and Leebens-Mack (1999): most recent common ancestor (MRCA) of Lampronia and yucca moths=78.2 million years (Myr), 95% CI=59.8–96.6 Myr; MRCA yucca moths=41.5 Myr, 95% CI=31.7–51.3 Myr. MCMC analyses in BEAST were executed under a simple Yule birth model and allowed to run for 100 000 burn-in iterations followed by 10 000 000 iterations of posterior sampling, using two independent runs with a sampling interval of 1000 iterations. Stationarity and convergence of independent runs were confirmed by examining PPs, parameter estimates and effective sample sizes of MCMC output in TRACER (see above). Posterior distributions of time-scaled substitution rates for each locus were estimated as the mean and 95% highest posterior density (HPD) of this parameter.
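As a check on the node calibrations, the sketch below shows how a normal prior can be recovered from a reported 95% CI. It assumes symmetric intervals and a critical value of 1.96; the function name is hypothetical, and the resulting standard deviations are illustrative rather than values reported in this study.

```python
def normal_prior_from_ci(lower, upper, z=1.96):
    """Return (mean, sd) of a normal distribution whose central 95% interval
    is approximately [lower, upper], assuming the interval is symmetric."""
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / (2.0 * z)
    return mean, sd


# Calibrations quoted in the text, in millions of years (Myr).
lampronia_mean, lampronia_sd = normal_prior_from_ci(59.8, 96.6)
yucca_moth_mean, yucca_moth_sd = normal_prior_from_ci(31.7, 51.3)
```

Under these assumptions the interval midpoints reproduce the stated means of 78.2 and 41.5 Myr, with standard deviations of roughly 9.4 and 5.0 Myr.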

Population structure

We examined the degree of population structure among moths on different host species using analysis of molecular variance (AMOVA) (Excoffier et al., 1992) in Arlequin 3.11 (Excoffier et al., 2005). For each gene, we estimated the proportion of genetic variation partitioned between host plants, calculated fixation indices (ΦST), and tested for significant differentiation using 10 000 nonparametric permutations. These analyses did not include substructure within host groups due to limited sampling at any given collection locality (see below). AMOVAs were performed under the Tamura–Nei substitution model (that is, the closest approximation to GTR available in Arlequin), with empirical estimates from MrBayes for the Γ parameter of each gene (COI: α=0.191; EF1α: α=0.192). Because differentiation between a subset of populations can indicate significant departures from panmixia, even when other populations are not genetically distinct, we also tested comparisons between each pair of host plants to determine whether global patterns of differentiation were supported by pairwise comparisons between host groups.
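The fixation index from a one-level AMOVA (individuals within host groups, with no substructure) can be sketched as follows. This is a simplified illustration of the variance-component arithmetic in Excoffier et al. (1992), not the Arlequin implementation: the pairwise distances are assumed to be supplied by the user and are treated as squared deviations, and the function name and toy data are hypothetical.

```python
def phi_st(dist, groups):
    """One-level AMOVA fixation index from a pairwise distance matrix.

    dist: symmetric matrix of pairwise distances between sequences.
    groups: group label (for example, host plant) for each sequence.
    """
    n = len(groups)
    labels = sorted(set(groups))
    g = len(labels)
    # Total sum of squared deviations.
    ss_total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squared deviations, accumulated over groups.
    ss_within = 0.0
    sizes = []
    for lab in labels:
        members = [i for i in range(n) if groups[i] == lab]
        sizes.append(len(members))
        ss_within += sum(dist[i][j] for i in members for j in members if i < j) / len(members)
    ss_among = ss_total - ss_within
    ms_among = ss_among / (g - 1)
    ms_within = ss_within / (n - g)
    # Average group size adjusted for unequal sampling.
    n0 = (n - sum(s * s for s in sizes) / n) / (g - 1)
    var_among = (ms_among - ms_within) / n0
    return var_among / (var_among + ms_within)


groups = ["B", "B", "S", "S"]
# All variation lies between groups: identical within, different between.
structured = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
# Every pair is equally distant, so grouping explains nothing.
unstructured = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
phi_structured = phi_st(structured, groups)
phi_unstructured = phi_st(unstructured, groups)
```

Significance testing of the kind described above would repeat this calculation after randomly permuting the group labels among individuals.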

As AMOVA does not account for the spatial effects of gene flow across geographical landscapes (for example, isolation by distance; Rousset, 1997), we used a series of Mantel and partial Mantel tests (Mantel, 1967; Smouse et al., 1986) to examine correlations between pairwise genetic distance, geographical distance and host use. Preliminary analyses based on pairwise genetic distances between localities were difficult to interpret, particularly for sites with 2 individuals, which commonly yielded pairwise estimates of FST=1.0 (that is, no shared haplotypes or alleles) or FST=0.0 (that is, identical haplotypes or alleles). Because these results are more likely to represent an artifact of limited sampling within each collection locality (Figure 1) rather than true patterns of biological differentiation, we instead calculated pairwise genetic distances between individual sequences for each gene. Genetic distances between COI haplotypes and EF1α alleles were estimated using maximum composite likelihood (Tamura et al., 2004) in MEGA 4.0.2 (Kumar et al., 2008) under the same model of nucleotide substitution used for AMOVA (that is, Tamura–Nei with α values as described above). Geographical distances were determined from GPS coordinates using Geographic Distance Matrix Generator 1.2.3 (Ersts, 2009). Host use was coded as a binary indicator variable, such that any two individuals either share the same host species (0) or use different host species (1). Mantel and partial Mantel tests were performed in zt 1.0 (Bonnet and Van de Peer, 2002) using 10 000 nonparametric permutations for significance testing.
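The logic of the Mantel and partial Mantel tests can be sketched in a few lines. This is a simplified, one-tailed illustration and not the zt implementation: the function names, random seed and toy matrices are assumptions, and only the partial correlation coefficient (not its permutation test) is shown.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def host_indicator(hosts):
    """Binary matrix: 0 if two individuals share a host, 1 otherwise."""
    return [[0 if a == b else 1 for b in hosts] for a in hosts]


def mantel(a, b, permutations=9999, seed=1):
    """Mantel correlation and one-tailed permutation P-value."""
    n = len(a)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of b together to preserve its structure.
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


hosts = ["B", "B", "S", "S", "T", "T"]
host = host_indicator(hosts)
# Toy genetic distances: small within hosts, larger between hosts.
genetic = [[0.0 if i == j else (0.05 if host[i][j] else 0.01) for j in range(6)]
           for i in range(6)]
r, p = mantel(genetic, host, permutations=999)
```

In a partial Mantel test, `partial_r` would be supplied with the three pairwise matrix correlations among genetic distance, host use and geographical distance.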

Coalescent simulations

Estimates of demographic values for divergence times (Tdiv), effective migration rates (Me) and effective population sizes (Ne) among populations of moths on different host plants were obtained using multilocus coalescent simulations in IM and IMa (Hey and Nielsen, 2004, 2007). This isolation-with-migration approach permits the estimation of demographic parameters while evaluating the likelihood of more or less complex models of historical divergence (reviewed in Hey, 2006). First, we used IMa to simulate genealogies under the full model (θ1, θ2, θA, m1, m2, t) and perform likelihood-ratio tests to evaluate the fit of nested models with fewer parameters (for example, (i) θ1=θ2=θA, m1, m2; (ii) θ1, θ2, θA, m1=m2; (iii) θ1=θ2=θA, m1=m2; and so on). Although IMa generates point estimates from the joint posterior distributions of nested models, it does not calculate Bayesian posterior credibility intervals for these functions. Therefore, we reanalyzed the data in IM under the least complex nested model that could not be rejected, both to reduce the variance in parameter estimates and to obtain 90% HPD intervals as a measure of uncertainty.

The isolation-with-migration model assumes that sampled taxa represent sister lineages, which do not experience gene flow from other unsampled lineages. As there is often no clear a priori approach for ordering pairwise IM and IMa analyses among three or more populations, for completeness we examined all first-order pairwise comparisons among host plant groups (Y. baccata versus Y. schidigera, Y. baccata versus Y. treculeana, Y. schidigera versus Y. treculeana). Initial ranges on prior probabilities for coalescent parameters scaled by the population mutation rate were established using estimates of θ=4Neμ for the total sample of P. coloradensis and the subsamples of moths on different host plants. These estimates were obtained using a series of simple IMa models assuming no divergence (that is, t=0), describing panmictic populations at equilibrium (P. coloradensis, θ=10.2–24.0 90% HPD; Y. baccata, θ=6.7–16.6 90% HPD; Y. schidigera, θ=3.3–12.6 90% HPD; Y. treculeana, θ=2.9–10.6 90% HPD). We conservatively set the upper bound for a uniform prior on θ to exceed the highest value from initial estimates for panmictic populations (that is, θ=0–30). The upper bound on m was set using a uniform prior distribution that allowed extremely high levels of effective migration between populations (m=0–10, Me=θm/2=30 × 10/2=150). On the basis of the posterior distributions for the rate of substitutions per site per million years from BEAST (Supplementary Figure S1; COI: mean=1.894 × 10−3, 95% HPD=1.341 × 10−3–2.488 × 10−3; EF1α: mean=3.128 × 10−3, 95% HPD=1.616 × 10−3–4.955 × 10−3), we set uniform priors on the means and ranges for the substitutions per locus per year, discounting ambiguous or missing sites excluded by IMa (COI, 1454 bp: mean=2.753 × 10−6, lower=1.950 × 10−6, upper=3.617 × 10−6; EF1α, 485 bp: mean=1.517 × 10−6, lower=0.783 × 10−6, upper=2.403 × 10−6). Priors for t were determined using the earliest median date for the MRCA of P. coloradensis and P. sordidus for COI or EF1α (Supplementary Figure S2; 15.6 Myr for COI), then calculating the maximum coalescent scaled bound on t as the divergence time multiplied by the geometric mean (μg) of substitutions/locus/year (t=0–31.9, Tdiv=t/μg=31.9/2.044 × 10−6=15.6 Myr), permitting considerable freedom for divergence times within P. coloradensis.
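The rate conversions in this paragraph can be reproduced with a short calculation. The sketch below uses only values quoted in the text; the function and variable names are illustrative, and small differences in the last digit reflect rounding of the published means.

```python
import math


def per_locus_per_year(rate_per_site_per_myr, n_sites):
    """Convert substitutions/site/Myr to substitutions/locus/year."""
    return rate_per_site_per_myr * n_sites / 1e6


mu_coi = per_locus_per_year(1.894e-3, 1454)   # about 2.75e-6
mu_ef1a = per_locus_per_year(3.128e-3, 485)   # about 1.52e-6

# Geometric mean of the two per-locus rates.
mu_g = math.sqrt(mu_coi * mu_ef1a)

# Upper bound on the coalescent-scaled divergence time for a 15.6 Myr maximum.
t_max = 15.6e6 * mu_g

# Largest effective migration rate permitted by the priors on theta and m.
me_max = 30 * 10 / 2
```

These values correspond to μg of approximately 2.044 × 10−6, an upper bound on t of approximately 31.9 and a maximum Me of 150.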

On the basis of these initial prior distributions, we ran a series of preliminary IMa analyses to optimize MCMC settings for the number of chains and heating scheme. A small number of individuals with missing data at variable sites (that is, six individuals comprising 12 nucDNA alleles) were removed from the EF1α data matrix, as IMa excludes ambiguous sites across the complete set of sequences. Subsequent IMa runs were executed using a geometric heating scheme with 20–50 chains and heating parameters g1=0.95–0.99 and g2=0.80–0.50 under the HKY model of nucleotide substitution for both loci. Posterior sampling began after 500 000 burn-in iterations and was allowed to continue for 5 000 000–15 000 000 iterations. To evaluate stationarity and convergence, we monitored effective sample sizes and autocorrelation values, inspected trend plots for all parameters, and compared the results of at least three independent runs to confirm that marginal posterior distributions had reached similar solutions.

Following the completion of MCMC simulations in IMa, we used the load trees mode to subsample 20 000 genealogies across replicate runs and conduct likelihood-ratio tests to evaluate the relative fit of reduced models versus the full model, following a χ2 approximation in which the test statistic is twice the difference in log-probability scores (−2Λ) between nested models, with degrees of freedom equal to the difference in the number of parameters (Hey and Nielsen, 2007). When one or both migration parameters were fixed at zero, we calculated significance where the expected distribution is a mixture, such that −2Λ has a value of zero with probability 0.5 and takes a value from the χ2 distribution with probability 0.5 (Hey and Nielsen, 2007). In addition, when both migration parameters were set to zero, we evaluated this model against the corresponding model in which both migration parameters were equal, rather than the full model with asymmetric migration rates, as this represents a better approximation when m1 and m2 are correlated (Hey and Nielsen, 2007).
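The significance calculations described above can be sketched for the case of a single degree of freedom. The code is an illustration and not part of IMa: the function names are hypothetical, and the χ2 tail probability is obtained from the complementary error function, which is exact only for one degree of freedom.

```python
import math


def lrt_statistic(logp_full, logp_nested):
    """Twice the difference in log-probability between nested models."""
    return 2.0 * (logp_full - logp_nested)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def mixture_pvalue_df1(stat):
    """P-value when the null fixes one parameter at its boundary (m=0):
    the statistic is 0 with probability 0.5 and chi-square (1 d.f.) otherwise."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_df1(stat)


# A statistic of about 2.71 is significant at 0.05 under the mixture,
# whereas the standard 1 d.f. test requires about 3.84.
p_mixture = mixture_pvalue_df1(2.705543454095404)
p_standard = chi2_sf_df1(3.841458820694124)
```

Comparisons involving more than one degree of freedom would require a general χ2 distribution function, which is omitted here.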

Following the evaluation of nested models, we then executed a series of IM runs under the least complex model that could not be rejected based on the likelihood-ratio tests described above, using the same prior probability distributions and MCMC settings. This approach has the advantage of estimating 90% HPD intervals for each parameter (which are not available for nested models in IMa), as well as reducing variance due to overparameterization of the full model. Run times varied from 5 000 000–25 000 000 iterations, and stationarity and convergence across independent IM runs were assessed as described above. For the final results, we used the output from the longest run for each pairwise comparison between populations. Coalescent parameters were converted to demographic values using the following formulae, incorporating the geometric mean of substitutions per locus per year (μg=2.044 × 10−6) and assuming a generation time of 1 year: (i) divergence time, Tdiv=t/μg; (ii) effective population size, Ne=θ/4μg; (iii) effective number of migrants (Me=θm/2, calculated from the marginal posterior distribution of m conditioned on the mode of θ). We also estimated the coalescent time to most recent common ancestry (TMRCA) of each locus, as well as locus-specific distributions for the mean number of migration events in simulated genealogies (Msim). To verify that marginal posterior distributions for these parameters were informed by the data, rather than representing an artifact of the prior distributions, each pairwise analysis was repeated with the −j0 run option in IM, which sets all likelihood functions to a value of one, so that the posterior distributions are equal to the prior distributions.
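The three conversion formulae can be written out directly. The sketch below assumes a generation time of 1 year, as in the text; the function names are illustrative, and the value of θ in the example is back-calculated from a reported Ne purely for demonstration and is not a reported estimate.

```python
MU_G = 2.044e-6  # geometric mean of substitutions per locus per year


def divergence_time_years(t, mu=MU_G):
    """Tdiv = t / mu_g, in years."""
    return t / mu


def effective_population_size(theta, mu=MU_G):
    """Ne = theta / (4 * mu_g), assuming one generation per year."""
    return theta / (4.0 * mu)


def effective_migrants(theta, m):
    """Me = theta * m / 2."""
    return theta * m / 2.0


# Upper prior bound on t corresponds to about 15.6 Myr.
tdiv_max = divergence_time_years(31.9)

# Hypothetical theta that would correspond to Ne = 952 000.
theta_example = 4.0 * MU_G * 952000
ne_example = effective_population_size(theta_example)
```

Applying `effective_migrants` to each sampled value of m, with θ fixed at its posterior mode, yields the marginal distribution of Me described above.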


Summary statistics for molecular variation in COI and EF1α indicated substantially higher levels of genetic diversity in mtDNA versus nucDNA (COI: h=33, Hd=0.979, K=27.612, π=0.019; EF1α: h=15, Hd=0.642, K=4.167, π=0.010). The mode of the posterior distribution for the background recombination rate in EF1α was extremely low (ρ=2.0 × 10−8), indicating that the frequency of recombination (scaled to Ne) was negligible, with little deviation from ρ between variable sites (none >1.444 relative to ρ). The parsimony network for COI (Figure 2) showed that no mtDNA haplotypes were shared among host groups, with the highest frequency haplotype comprising 12.5% of the total number of inferred haplotypes (n=28). In contrast, the parsimony network for EF1α (Figure 2) indicated that a single nucDNA allele accounted for 61.5% of inferred alleles (n=12), and this allele was shared among all three host groups. In addition, a single nucDNA allele was shared by Y. schidigera and Y. baccata, and a single nucDNA allele was shared by Y. baccata and Y. treculeana. Note that the number of inferred haplotypes and alleles varied slightly between the results from DnaSP and TCS because the two programs handle missing and/or ambiguous sites differently.

Figure 2

Statistical parsimony networks for cytochrome oxidase I (COI) and elongation factor 1 alpha (EF1α) from P. coloradensis on three species of Yucca. Circles are proportional to the frequency of unique haplotypes; branches connecting haplotypes indicate the number of inferred mutations. Labels indicate accession, sampling location, and host plant (cf. Figure 1). COI has 28 unique haplotypes, none of which are shared among host plants. In contrast, EF1α has 12 unique alleles, 3 of which are shared among host plants, including a single allele accounting for 61.5% of the total number of sampled alleles.

Posterior probabilities from MCMC consensus trees for COI and EF1α (Figure 3) showed support for the monophyly of P. coloradensis (COI: PP=1.0; EF1α: PP=0.91). Within P. coloradensis, the posterior distribution of the COI gene trees recovered a series of lineages corresponding approximately to the three host species of Yucca. However, these clades were not monophyletic with respect to host plants. For example, although all mtDNA haplotypes collected from Y. baccata and Y. treculeana were placed in a single clade with strong support (PP=1.0), Y. baccata formed a paraphyletic assemblage within this clade, including a single haplotype from two individuals sampled from Y. schidigera, as well as a single haplotype from Y. treculeana. Remaining haplotypes from Y. schidigera were placed in a clade sister to the Y. baccata–Y. treculeana group, albeit with low support (PP=0.59). Remaining haplotypes from Y. treculeana were placed in a derived clade (PP=1.0) that included a single haplotype from Y. baccata. In contrast to COI, support for clades in the posterior distribution of EF1α gene trees was uniformly low, with most clades having negligible support (PP ≤0.01) among a large number of equally likely topologies. Very few nucDNA lineages were recovered with PP >0.90, including: (i) a single allele from Y. treculeana and a single allele from Y. baccata (PP=0.95); (ii) a single allele shared by three individuals of Y. treculeana (PP=1.0); and (iii) two alleles each found in two different individuals of Y. schidigera (PP=0.93 and 1.0, respectively).

Figure 3

Bayesian Markov chain Monte Carlo (MCMC) maximum clade credibility trees for cytochrome oxidase I (COI) and elongation factor 1 alpha (EF1α) from P. coloradensis on three species of Yucca. Tip node labels indicate accession, sampling location and host plant (cf. Figure 1). Posterior probabilities (>0.50) are shown next to the branches. These trees represent a single realization from the posterior distribution of genealogies for each locus. COI recovered many nodes with strong support (≥0.90), approximately corresponding to host plant use. In contrast, EF1α recovered a large number of equally probable arrangements with very low support (≤0.01), consistent with lower levels of host structure for nuclear DNA (nucDNA) versus mitochondrial DNA (mtDNA).

The AMOVAs for genetic differentiation among all three host populations showed substantial levels of structure within P. coloradensis (Table 1), with strong support for rejection of the null hypothesis of panmixia among host groups (COI: ΦST=0.625, P<0.001; EF1α: ΦST=0.261, P<0.001). Pairwise AMOVAs between host plants recovered significant differentiation for each comparison at a threshold of P<0.05 (results not shown), indicating that significant results for the global test were not driven by genetic structure between a subset of the three host groups. Mantel tests showed a similar pattern, with significant correlations between genetic distance and both geographical distance and host use (Table 2, Figure 4). Partial Mantel tests for correlation between genetic distance and host use (controlling for geographical distance) were also significant (COI: r=0.386, P<0.001; EF1α: r=0.064, P=0.021). Likewise, partial Mantel tests between each pairwise comparison of host groups were significant (results not shown), indicating that the results of the global test among all three host populations were not an artifact of host structure between any two host plants.

Table 1 Analysis of molecular variance for COI and EF1α from Prodoxus coloradensis on three species of Yucca
Table 2 Mantel and partial Mantel tests for COI and EF1α from Prodoxus coloradensis on three species of Yucca, examining the relationship between genetic distance, geographic distance and host use
Figure 4

Scatter plots illustrating the relationship between genetic distance and geographical distance for cytochrome oxidase I (COI) and elongation factor 1 alpha (EF1α) from P. coloradensis on three species of Yucca. There is an overall pattern of isolation by distance for both loci. However, the correlation between genetic distance and host use remains significant after controlling for geographical distance (see Table 2 for correlation coefficients and P-values).

Likelihood-ratio tests for nested models in IMa are shown in Supplementary Table S2. The least complex model that could not be rejected for Y. baccata versus Y. schidigera or Y. baccata versus Y. treculeana was θ1=θ2=θA, m1=m2 (that is, equal effective population sizes with equal migration rates). For these two pairwise comparisons, we were able to reject the null hypothesis that m1=0 and/or m2=0 in almost all cases. The least complex model that could not be rejected for Y. schidigera versus Y. treculeana was θ1=θ2=θA, m1=m2=0 (that is, equal effective population sizes with no migration). For this pairwise comparison, we were unable to reject any null models in which m1=0 and/or m2=0. To evaluate estimates for m between Y. schidigera versus Y. treculeana with respect to the same model used for other comparisons between host groups, subsequent IM analyses for all three pairwise comparisons were conducted using θ1=θ2=θA, m1=m2.

Marginal posterior distributions for estimates of demographic parameters from IM are shown in Figure 5 and Table 3, representing results from the longest replicate run for each pairwise comparison. Posterior distributions for Tdiv had sharp peaks at <1.0 Myr (ca. 0.5–0.9 Myr) but exhibited long upper tails that failed to reach zero within the limits of the prior distribution (15.6 Myr). Convergent results from replicate runs with large ESS values (>300 minimum, typically >10 000) and no evidence of trends in run-time plots indicated that failure to achieve complete posterior distributions for Tdiv was not a consequence of poor mixing or inadequate run times. Likewise, increasing the upper bounds on t did not recover any evidence for decreasing probabilities at larger values, suggesting that the data were compatible with a wide range of divergence times at low but finite probabilities. The mode of Ne was consistently large for all comparisons among host plant groups (Ne=952 000–1 334 000) with well-shaped posterior distributions. The mode of Me was <1.0 for all pairwise comparisons (Me=0.260–0.963), although posterior distributions for this parameter encompassed a range of values exceeding Me>1.0. Locus-specific estimates for TMRCA and the mean number of simulated migration events are shown in Supplementary Figure S3 and Table S3. EF1α exhibited longer coalescence times compared with COI (4.0 Myr versus 1.0–2.0 Myr, respectively). Likewise, there was some evidence for higher rates of gene flow in EF1α versus COI (Supplementary Table S3), although there was substantial overlap in the posterior distributions for this parameter (Supplementary Figure S3).

Figure 5 Marginal posterior distributions for effective population size (Ne), divergence time scaled to years (Tdiv) and effective migration rates (Me) for P. coloradensis on three host species of Yucca (B=Y. baccata, S=Y. schidigera, T=Y. treculeana). See Table 3 and the Methods section for details on calculating point estimates and 90% posterior densities of demographic values from coalescent parameters.

Table 3 Marginal posterior distributions of multilocus coalescent estimates for effective population size (Ne), divergence time scaled to years (Tdiv) and effective migration rates (Me) for pairwise comparisons of Prodoxus coloradensis on three host species of Yucca


Discriminating between the recent divergence of species and contemporary gene flow among populations is a principal difficulty for understanding the evolutionary processes that drive the diversification of phytophagous insects on different host plants. At deeper time scales, fixation of haplotypes or alleles leads to the coalescence of gene copies within species, permitting the application of simple phylogenetic models based on bifurcating lineages (Maddison, 1997). At more recent time scales, determining the balance between competing evolutionary forces (that is, mutation, selection, genetic drift and gene flow) remains a challenging proposition, despite advances in the application of coalescent theory to population genetics (reviewed in Rosenberg and Nordborg, 2002; Wakeley, 2008). Moreover, the recognition of random coalescence among different loci as a consequence of genetic drift has led to the increased emphasis on multilocus models for unraveling the historical demography of species and populations (Hey and Machado, 2003).

These issues are compounded when one considers the continuum of divergence from host races with appreciable gene flow to distinct, reproductively isolated taxa (Drès and Mallet, 2002; Mallet, 2008; Peccoud et al., 2009). This scenario of recent divergence at the boundary between species and populations is well represented by the results from P. coloradensis. Although we did not find evidence for monophyletic host plant clades, the posterior distribution of COI gene trees indicated that individuals from the same host species were more likely to cluster together than individuals from different hosts (Figure 3). In contrast, EF1α provided little support for clades within P. coloradensis, yielding a large number of equally probable topologies with no evidence for host structuring. Likewise, statistical parsimony networks for COI and EF1α showed that none of the sampled mtDNA haplotypes were shared among hosts, whereas a single widespread nucDNA allele was distributed across all three hosts (Figure 2).

The absence of reciprocal monophyly among recently diverged species is unsurprising, as the probability of fixation is proportional to the strength of genetic drift (inversely related to Ne) and the time since divergence (Rosenberg and Nordborg, 2002; Wakeley, 2005). Under a Fisher–Wright model at equilibrium, populations will diverge due to the combined effects of mutation and genetic drift when the effective number of migrants (that is, Me=2Nem) is <1.0 (Wright, 1931; see Hey et al., 2004 for an empirical example using IM). Estimates of large Ne and recent Tdiv from coalescent simulations (Table 3, Figure 5) supported the hypothesis of retained ancestral polymorphism. Furthermore, TMRCA values for both COI and EF1α extended past the estimated time of divergence (ca. 0.5–0.9 Myr), indicating that ancestral mtDNA haplotypes and nucDNA alleles have persisted in descendant populations (Supplementary Figure S3). In particular, TMRCA values showed that coalescence times for EF1α (ca. 4.0 Myr) were earlier than those for COI (ca. 1.0–2.0 Myr), consistent with larger Ne for diploid, biparentally inherited nucDNA genes compared with the haploid, uniparentally inherited mtDNA genome (Birky, 2001; Ballard and Whitlock, 2004).
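
The Me<1.0 threshold can be made concrete with the conventional conversions from mutation-scaled coalescent parameters to demographic units. The sketch below is a minimal illustration that assumes the usual isolation-with-migration scaling (θ=4Neμ, t=Tμ with T in generations, and m=M/μ, where μ is the per-generation mutation rate and M the per-generation migration rate), so that Ne=θ/(4μ) and 2NeM=θm/2. The function and class names are illustrative, and the input values, mutation rate and generation time are hypothetical placeholders rather than values from Table 3 or the Methods.

```python
from dataclasses import dataclass

@dataclass
class Demography:
    Ne: float          # effective population size (individuals)
    Tdiv_years: float  # divergence time in years
    Me: float          # effective number of migrants per generation (2*Ne*M)

def im_to_demography(theta, t, m, mu_per_gen, gen_time_years=1.0):
    """Convert mutation-scaled IM parameters to demographic units.

    Assumed scaling: theta = 4*Ne*mu, t = T*mu (T in generations), m = M/mu.
    """
    Ne = theta / (4.0 * mu_per_gen)
    Tdiv_years = (t / mu_per_gen) * gen_time_years
    Me = theta * m / 2.0  # mu cancels: 2*Ne*M = (theta / (2*mu)) * (m*mu)
    return Demography(Ne, Tdiv_years, Me)

# Hypothetical inputs, for illustration only.
d = im_to_demography(theta=40.0, t=7.0, m=0.03, mu_per_gen=1e-5)
print(d)
print("Me below 1.0:", d.Me < 1.0)
```

Because Me is the product θm/2 under this scaling, any change in θ at a fixed m changes Me by the same proportion.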

Likelihood-ratio tests rejected the null hypothesis of zero migration between the parapatric pairwise comparisons of Y. baccata and Y. schidigera or Y. baccata and Y. treculeana, but were unable to reject zero migration between the allopatric pair of Y. schidigera and Y. treculeana (Supplementary Table S2), indicating that low levels of gene flow may persist through sympatric contact zones (Figure 1). However, point estimates on the basis of the mode of Me exhibited a clear pattern of Me <1.0 for all pairwise comparisons among host plants (Table 3), suggesting that limited migration is insufficient to prevent divergence. Although 90% HPDs for Me included a range of values for which Me>1.0, we note that calculating the posterior density of m conditioned on the modal value of θ represents an extremely conservative interpretation of the data, yielding substantially wider confidence intervals than the alternative parameterization of the posterior density of θ conditioned on the modal value of m.

Although the general trends for coalescent estimates of Ne, Tdiv and Me were similar across comparisons among host populations, post hoc interpretation of pairwise IM results can present formidable challenges (Hoelzel et al., 2007; Lucas et al., 2009; Pinho et al., 2008; King and Roalson, 2009). When two populations have received immigrants from a third population not represented in a given pairwise analysis, estimates of Ne can be inflated; if both populations are exchanging migrants with this third population, estimates of Me can be similarly biased (Won et al., 2005). Likewise, as Me=2Nem, any upward bias in the estimate of Ne would also inflate the estimates of Me. In part, the large values for Ne and wide posterior distributions for Me from pairwise analyses of P. coloradensis may reflect these patterns (Figure 5), suggesting that our estimates are likely to overstate the true levels of effective migration among host plants.

The conclusion of low effective migration rates based on multilocus coalescent simulations was also supported by significant partitioning of genetic structure among host plants on the basis of AMOVA (Table 1). It is noteworthy that the levels of mtDNA differentiation were more than twice as large as nucDNA differentiation (COI: ΦST=0.625; EF1α: ΦST=0.261). This result was consistent with the greater phylogenetic resolution of host plant use based on COI versus EF1α (Figure 3). Likewise, the partial Mantel test for correlation between COI genetic distance and host use (controlling for geographical distance) was highly significant (r=0.386, P<0.001), whereas the analogous correlation between EF1α genetic distance and host use was much weaker (r=0.064, P=0.021).

As mtDNA is maternally inherited, these patterns may represent the signature of sex-biased dispersal. Prodoxus larvae typically pupate in the host tissue at the original site of oviposition. On pupal eclosion, we might expect that female moths remain on their natal host or within the same patch of host individuals, where oviposition sites are assured, whereas males disperse farther in search of unrelated mates (Gandon, 1999). This behavior parallels sex-biased dispersal seen in other phytophagous insects (Kuussaari et al., 1996; Albrectsen and Nachman, 2001; Petit et al., 2001; Caudill, 2003; Bowler and Benton, 2005; but see Mallet, 1986; Lawrence, 1988; Hill et al., 1996) and should lead to higher rates of nucDNA versus mtDNA gene flow (Salle et al., 2007).
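
A back-of-envelope way to see how inheritance mode and female dispersal each affect expected differentiation is Wright's island-model approximation. The sketch below is not an analysis from this study: it assumes an infinite-island model at equilibrium, an equal sex ratio (so the female effective size Nf is Ne/2), negligible mutation, and that ΦST can be treated like FST. Under those assumptions FST≈1/(1+4Nem) for an autosomal locus and FST≈1/(1+2Nfmf) for mtDNA. The function names and the `female_ratio` parameter are illustrative.

```python
def nm_from_fst_nuclear(fst_nuc: float) -> float:
    """Invert FST = 1 / (1 + 4*Ne*m) for an autosomal locus."""
    return (1.0 / fst_nuc - 1.0) / 4.0

def expected_fst_mtdna(nm: float, female_ratio: float = 1.0) -> float:
    """Expected mtDNA FST given Ne*m inferred from nuclear data.

    female_ratio is m_f / m, the female migration rate relative to the
    sex-averaged rate (1.0 means no sex bias). Assumes Nf = Ne / 2, so
    2*Nf*m_f = female_ratio * Ne * m.
    """
    return 1.0 / (1.0 + female_ratio * nm)

nm = nm_from_fst_nuclear(0.261)  # EF1a PhiST reported in the text
for ratio in (1.0, 0.5):
    fst_mt = expected_fst_mtdna(nm, ratio)
    print(f"m_f/m = {ratio}: expected mtDNA FST = {fst_mt:.3f}")
```

Lower values of `female_ratio` raise the expected mtDNA differentiation relative to the nuclear locus. Such equilibrium expectations give only a rough sense of scale for recently diverged populations, for which the equilibrium assumption may not hold.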

The distribution of locus-specific migration events in the IMa coalescent simulations, which explicitly account for mutation rates and organellar versus autosomal inheritance, indicated that the peak values for inferred migration were similar for both COI and EF1α (Supplementary Figure S3). However, locus-specific posterior distributions for EF1α tended to be wider, encompassing higher rates of migration at low but finite probabilities, with the mean number of nucDNA migration events approximately two to three times larger compared with mtDNA (Supplementary Table S3). These posterior distributions are distinct from their respective prior distributions (see Methods), showing that differences in locus-specific gene flow are not simply an artifact resulting from the parameterization of nucDNA versus mtDNA inheritance models. Nonetheless, although female philopatry could represent a critical stage in the evolution of reproductive isolation associated with host shifts, the determination of sex-biased dispersal remains a difficult task (Prugnolle and de Meeus, 2002; Hedrick, 2005). Further understanding of the potential relationship between female philopatry and host-driven speciation is likely to require more detailed ecological studies of the fine-scale fitness consequences of host selection and oviposition in sympatric populations.

Given the parapatric distribution of Y. baccata with respect to both Y. schidigera and Y. treculeana (Figure 1), we also asked whether high levels of genetic differentiation among moths on different hosts could be explained under a simple isolation-by-distance model, as distances between host populations can vary from immediate sympatry to hundreds of kilometers. Mantel tests indicated significant correlations between genetic distance versus both geographical distance and host use (Table 2, Figure 4). However, the relationship between genetic distance and host use remained significant after controlling for geographical distance (Table 2), suggesting that differentiation between host plants is not simply an artifact of spatial isolation. Conversely, geographical distance remains significant after controlling for host use. Noting the sister position of moths on Y. schidigera to Y. baccata and Y. treculeana in the COI phylogeny, as well as the general trend from basally branching western lineages to derived eastern lineages (Figure 3), this residual signal of spatial structure could reflect the historical signature of range expansion from west to east, leading to colonization and specialization on different hosts (for example, Segraves and Pellmyr, 2004). Alternatively, despite recent divergence, host populations may have reached equilibrium with limited gene flow between Y. baccata and geographically proximate flanking populations of Y. schidigera and Y. treculeana, with no direct gene flow between the latter, as suggested by the failure to reject the null model of θ1=θ2=θA, m1=m2=0 for these two widely separated host groups (Supplementary Table S2).
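
The logic of a partial Mantel test can be sketched as follows: correlate genetic distance with host use after removing the linear effect of geographical distance from both, then assess significance by relabeling the samples in the genetic distance matrix. This is a minimal pure-Python illustration on made-up data. The function names, the toy matrices, the permutation count and the one-tailed test are all assumptions, and the sketch does not reproduce the software or settings used for Table 2.

```python
import math
import random

def lower_triangle(mat, order=None):
    """Flatten below-diagonal entries, optionally after relabeling the samples."""
    n = len(mat)
    idx = list(range(n)) if order is None else order
    return [mat[idx[i]][idx[j]] for i in range(n) for j in range(i)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def partial_mantel(gen, host, geo, n_perm=999, seed=1):
    """Return (partial r, one-tailed permutation p-value)."""
    rng = random.Random(seed)
    y, z = lower_triangle(host), lower_triangle(geo)
    observed = partial_r(lower_triangle(gen), y, z)
    n, hits = len(gen), 0
    for _ in range(n_perm):
        order = rng.sample(range(n), n)  # relabel samples in the genetic matrix
        if partial_r(lower_triangle(gen, order), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (entirely made up): six samples along a west-east transect.
position_km = [0, 40, 150, 210, 520, 600]
host_label = ["S", "S", "B", "B", "T", "T"]
n = len(position_km)
geo = [[abs(a - b) for b in position_km] for a in position_km]
host = [[0 if a == b else 1 for b in host_label] for a in host_label]
gen = [[0.0 if i == j else
        0.005 + 0.03 * host[i][j] + 0.00004 * geo[i][j] + 0.002 * ((i * j) % 3)
        for j in range(n)] for i in range(n)]

r, p = partial_mantel(gen, host, geo)
print(f"partial Mantel r = {r:.3f}, one-tailed p = {p:.3f}")
```

Swapping the roles of `host` and `geo` in the call gives the complementary test of geographical distance controlling for host use.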

Despite the opportunity for gene flow in narrow zones of sympatry, the results of this study support the hypothesis that P. coloradensis comprises a complex of incipient host-specialized species. These results are consistent with high levels of differentiation and low rates of effective migration among host plants, although with considerable variation in the width of posterior densities for Me, accompanied by a strong signal of recent divergence (ca. 0.5–0.9 Myr) and the retention of shared ancestral polymorphism. Although the precise mechanisms remain unclear, there is also evidence for higher rates of nucDNA versus mtDNA gene flow, suggesting that female philopatry might contribute to divergence among host populations. Furthermore, many of these patterns are analogous to the results from a previous study on pollinating yucca moths affiliated with Y. baccata and Y. schidigera (Leebens-Mack and Pellmyr, 1998), indicating parallel trajectories of host-specific divergence in both pollinating and non-pollinating species.

Considering the numerous closely related species of Prodoxus found on similar host plants with overlapping biogeographical ranges (Pellmyr et al., 2006), future studies will benefit from a better understanding of evolutionary relationships at the interface between species and populations, not just within P. coloradensis but across the entire genus. Deep coalescence of mtDNA haplotypes and nucDNA alleles suggests that shared ancestral polymorphism may persist across other recently diverged lineages of Prodoxus, complicating efforts to distinguish incomplete lineage sorting from ongoing gene flow. However, the development of highly polymorphic nucDNA microsatellite markers in the closely related genus Tegeticula (Drummond et al., 2009) provides new molecular methods for dissecting the fine-scale population dynamics of prodoxid yucca moths distributed among sympatric or parapatric host plants. Likewise, coordinated estimates of the phylogenetic relationships among Yucca (Pellmyr et al., 2007; Smith et al., 2008b) have created an expanded framework for evaluating the biogeographical history and diversification of these taxa with respect to host plant specialization. Coupled with these tools, coalescent models for multilocus phylogenetic inference (Edwards et al., 2007; Liu et al., 2008; Kubatko et al., 2009) and extensions of the basic isolation-with-migration coalescent model to multiple populations offer novel opportunities to examine the extent to which host specificity influences the divergence of phytophagous insects.