Introduction

Insects comprise about two-thirds of the 1.5 million described species of animals1, and current estimates predict that another 4 million insect species remain unknown2. This spectacular diversity is thought to be to a large degree a consequence of ecological speciation resulting from interactions with plants, particularly antagonistic interactions3,4,5,6,7,8,9,10. Antagonism between plants and insects could lead to accelerated rates of diversification, with the diversity of defenses among plants resulting from host specialization that in turn may spur radiations in insects circumventing those defenses6,7,10,11,12,13. The effects of these interactions are not restricted to macroevolution: theoretical models predict that, in specialized interactions, coevolution can lead to stronger differentiation than spatial isolation alone in the case of antagonism but not in mutualism14,15. Regardless of the proximate mechanisms, a pattern of strong isolation by environment16 may be expected when insect–plant interactions are a major cause of reduced gene flow and an insect species interacts with different plant species or populations. For example, in brood pollinators (specialized pollinators that are also seed predators17), more divergent host plant populations have been observed to be associated with more divergent insect populations18,19,20,21,22,23, although not in all cases evaluated23,24.

If antagonisms promote divergent selection leading to the formation of host races and ecological speciation9, genetic isolation between plant populations may be a better predictor of insect isolation in antagonists than in mutualists or commensals. It is unclear whether this is the case for most plant-feeding insects, especially considering that these interactions often involve multiple partners and are spatially and temporally variable and context-dependent25. Here we test this prediction through a direct comparison between insects with different modes of interaction across scales of plant divergence. We take advantage of the variation in insect–plant interactions found in communities of palm-associated weevils distributed across the same geographic range and interacting with the same plants. We specifically test the hypothesis that isolation associated with host plant divergence, relative to isolation by geographical distance alone, is stronger in antagonistic species.

Palms in the genus Syagrus, one of the closest relatives of the coconut26,27, produce large inflorescences that are visited by dozens of insect species28,29,30,31,32. The most abundant flower visitors of these Neotropical palms are specialized beetles in the family Curculionidae, one of the most diverse insect taxa33. We recently described the community of insects associated with the seasonally dry forest palm Syagrus coronata, showing that many weevil species are broadly distributed throughout the plant's geographical range31. Some of them are brood pollinators, while others are antagonists breeding on flowers or seeds, and some are commensals breeding on decaying plant tissues. Populations of S. coronata have been found to have deep genetic divergences34, and this plant shares many species of weevil with Syagrus botryophora, a parapatric palm specialized on rainforests that diverged from S. coronata early in the history of the genus, about 20 million years ago26,27. Given this old divergence, weevil morphospecies shared by the two plants are likely a result of relatively recent host shifts as opposed to long-term co-diversification. We used double-digest RAD-seq (ddRAD), a low-cost sequencing method35,36, to obtain genome-wide genetic markers for several populations of both plant species, including a population of S. coronata known as S. × costae, a hybrid with S. cearensis26. We used the same method to sequence nine morphospecies of weevil broadly distributed across the range of these palms. These nine morphospecies are all attracted to flowers and locally specialized on their host plants. They mate and lay eggs on their hosts and are distributed through a similar geographical range, but differ in the kind of interaction with plants along two relevant axes: their roles as pollinators as adults and whether their larvae breed on live or decaying tissues.

We first use the genomic data to delimit weevil species and better understand the diversity of these little-known insects. We find evidence for deeply divergent cryptic species, in most cases allopatric and associated with different hosts but broadly sympatric in the case of pollinators. Then, we test models of isolation by environment to ask whether the kind of interaction with host plants is associated with differences in the degree of isolation by geographical distance or isolation associated with host plant genetic divergence. Finally, we fit a Bayesian hierarchical model to test whether species with antagonistic interactions exhibit stronger host-associated differentiation than other species. We find that this is not the case: the variation in the degree of isolation by environment between species is not associated with breeding on live or decaying plant tissues.

Results

Cryptic weevil species

Biological information on the species studied here is summarized in Table 1, and the geographic sampling in Supplementary Fig. 1. We initially assembled genomic data sets by filtering low-coverage loci (<12 reads) and genotyping each individual separately. Visualization of patterns of missing data revealed that, for some of the weevil species, certain ddRAD loci are shared within groups of samples, with very few loci recovered across groups (Supplementary Fig. 2). This pattern could be an artifact resulting from batch effects during ddRAD library preparation, because samples in a batch are pooled before size selection and PCR amplification35. Alternatively, it could be a consequence of cryptic, deeply differentiated taxa contained within each species as traditionally recognized by morphology37. Since studying the early stages of divergence does not make sense in the complete absence of gene flow9,38, we first evaluated whether our data set included cryptic species.

Table 1 Weevil morphospecies included in the study with references for natural history information.

To test whether this is the case, we recorded the number of loci shared, average sequence divergence, and batch identity for each pair of samples in each morphospecies. We found that samples processed in the same batch do share more loci, but extreme levels of missing data are only explained by deep sequence divergence, sometimes above 2.5% (Table 2 and Supplementary Fig. 2). We note that, in all cases, splitting samples into operational taxonomic units (OTU) at this level of sequence divergence results in groups with very high genetic differentiation from each other as measured by G′ST (Supplementary Fig. 2). With the exception of Anchylorhynchus trapezicollis, these clusters separate populations on each host plant (Fig. 1). For all kinds of interactions, there is negligible to zero gene flow between these populations on the two different host plant species. In the case of the pollinator Anchylorhynchus trapezicollis, we find three genetic clusters, with one of them in both host species and broadly sympatric with the other two (Fig. 1). By comparing the morphology of the two most abundant clusters in sympatry and allopatry, we found differences in the length of ventral plumose hairs and in male secondary sexual characters (Supplementary Fig. 3). These diverged genetic clusters represent cryptic, previously unrecognized species. Hereafter, we will use OTUs as our unit of analysis, noting that these will be properly described as new species in the future. In general, we also recommend caution in studies of little-known organisms in which cryptic species might be common39, noting that we were only able to distinguish OTUs because samples were individually barcoded and not pooled by location.

Table 2 Effect of nucleotide distance and shared library batch on number of shared RAD loci (thousands).
Fig. 1: Principal component analysis (PCA) for each plant species and insect OTU.

The PCA for each plant and insect species is independent, with the position of a sample on the first two PC axes coded following the color legend provided: samples with more similar colors have more similar PCA scores. M. bondari OTU 2 is black since no PCA is possible with a single sample. Supplementary Fig. 4 shows the same PCA results plotted in traditional coordinates instead of colors on a map. A small jitter was added to enable visualization of overlapping points. Dashed boxes enclose morphospecies. The large map shows known palm distributions26,84, enclosed in dashed lines. Small maps show PCA results for each weevil OTU, with clusters enclosed in black dashed lines and labeled with uppercase letters corresponding to populations used in coalescent models (Table 3 and Supplementary Table 1). Scale bars in insect images: 1 mm. Images of A. bondari OTU 1, C. decolor OTU 1, C. impar, D. polyphaga, M. ypsilon, and R. rectinasus OTU 1 were reproduced from de Medeiros et al.31 with permission of Oxford University Press and The Linnean Society of London.

A principal component analysis (PCA) of the genetic variation of each OTU reveals little spatial congruence among weevil OTUs and variable congruence with the genetic variation of their host plants (Fig. 1). We found evidence for genetic clusters in 12 of the 13 weevil OTUs (Fig. 1 and Supplementary Fig. 4) and investigated whether there is gene flow between these clusters by using a model of isolation with migration based on the site frequency spectrum (Supplementary Fig. 5). We found that, in all cases, models including migration had higher support than those that did not (Table 3 and Supplementary Table 1). Populations of Anchylorhynchus trapezicollis OTU 1 and Remertus rectinasus on different host plants have much deeper divergence and smaller migration rates than those interacting with S. coronata alone (Table 3 and Supplementary Table 1), indicating that there are well-delimited host races even in cases where divergence is shallow enough to enable assembly of ddRAD data sets.

Table 3 Summary of isolation-with-migration model fit, showing pairs of populations with direct gene flow inferred in the best and second best model, as well as the ΔAIC between them.

Interactions do not predict patterns of isolation

Following evidence for ongoing gene flow between populations in each OTU, we assessed the role of geography and plant host as genetic barriers for each species of weevil. We also included climate in this analysis to account for the possibility of other differences in environment acting as genetic barriers. We used matrices of geographical distance, host plant genetic distance, and climatic distance between weevil populations as explanatory variables for the genetic covariance between weevils in a Bayesian model of isolation by distance and environment40,41. Using cross-validation for model choice, we found that climate was not a significant barrier to gene flow for any weevil species, and the significance of geography or host plant varied (Supplementary Table 2). For this reason, we ran these models again using the full data set, but including only geography and host plant as predictors. The importance of geography or host plant as the main driver of divergence varied between weevil OTUs, and this variation seems uncorrelated with the mode of interaction (Fig. 2 and Supplementary Table 3).

Fig. 2: Effects of geographical and plant distance on weevil pairwise genetic distance in variable sites (pairwise π).

Colors show whether a species breeds on live tissue and is a pollinator following the color key. Dashed lines show the marginal effects of each distance implied by average parameter estimates.

To test whether species interactions are associated with differences in patterns of genetic divergence, we defined the statistic αdiff, which describes the relative importance of host plants as sources of population divergence when compared to geography for a given OTU (see “Methods”). We then implemented a hierarchical Bayesian model to evaluate the independent effects of being a pollinator or breeding on live tissue (i.e., being an antagonist) on the value of αdiff (Eqs. (1) and (2)). We scored interactions along these two independent axes because the positive aspect of a brood pollination interaction may also affect rates of population divergence. Mutualisms have sometimes been claimed to lead to highly specialized interactions and thereby promote diversification in both insects and plants42,43,44, but theoretical models do not predict that mutualisms lead to divergence in specialized interactions15. We note that Anchylorhynchus weevils are not exclusive pollinators of species of Syagrus palms30,31,45, and the net effect of these interactions is currently unknown. Our model estimates the effect of pollination or antagonism by the parameters γpol and γant, respectively (Eq. (2)). A significantly positive value for these parameters means that pollinators (γpol) or antagonists (γant) experience more divergence related to host plant divergence, relative to geography alone, than species that are not pollinators or antagonists. Posterior predictive simulations showed that the model fits the data adequately (Supplementary Fig. 6), and the number of OTUs used in this study provides enough power for inference (see “Methods” and Supplementary Fig. 7). There is substantial variation in estimated αdiff between OTUs (Fig. 3a), but no evidence that γpol is significant in either direction (Fig. 3b). While there is a positive trend for γant, its 95% credibility interval includes negative values (Fig. 3b).
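The significance criterion used above (a 95% credibility interval that excludes zero) can be sketched as follows. This is a minimal illustration that assumes equal-tailed intervals computed from posterior draws; the text does not say whether equal-tailed or highest-density intervals were used, and the simulated draws are hypothetical, not the study's estimates.

```python
import math
import random


def quantile(sorted_draws, q):
    """Linear-interpolation quantile of an already sorted list (q in [0, 1])."""
    pos = q * (len(sorted_draws) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac


def credible_interval(draws, level):
    """Equal-tailed interval holding `level` of the posterior mass (assumption)."""
    s = sorted(draws)
    tail = (1 - level) / 2
    return quantile(s, tail), quantile(s, 1 - tail)


def excludes_zero(draws, level=0.95):
    """True when the interval lies entirely above or entirely below zero."""
    low, high = credible_interval(draws, level)
    return low > 0 or high < 0


# Hypothetical posterior draws for a gamma parameter (illustration only).
rng = random.Random(1)
gamma_draws = [rng.gauss(0.3, 0.25) for _ in range(4000)]
print("50%:", credible_interval(gamma_draws, 0.50))
print("95%:", credible_interval(gamma_draws, 0.95))
print("excludes zero:", excludes_zero(gamma_draws))
```

A parameter whose draws shift toward positive values can still fail this check, which is the situation described for γant: a positive trend with a 95% interval that reaches below zero.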

Fig. 3: Effects of insect–plant interaction on variation of αdiff across OTUs.

a Posterior distribution of αdiff across OTUs, ordered by average αdiff. b Posterior distribution of parameters associated with pollination and antagonism, when compared to non-pollinators and non-antagonists. Points: average estimates, thick lines: 50% credibility intervals, thin lines: 95% credibility intervals.

Discussion

The degree of isolation by distance and by environment in these weevils, co-distributed throughout the same range and interacting with the same plants, varies widely, and this variation is largely unrelated to the kind of interaction with their hosts. All insect morphospecies previously thought to interact with both host plant species actually comprise cryptic species or highly divergent populations, each specialized on a single host plant species. This is evidence that host plants constitute an important barrier for all beetle species sampled here. At a finer scale, plant host population divergence is a barrier to weevil gene flow for a subset of weevil OTUs, encompassing all kinds of interactions. OTUs breeding on live plant tissue seem to experience slightly higher divergence associated with host plants, but not significantly higher than other OTUs. Closely related OTUs do not necessarily show similar responses to geography and host plant divergence, suggesting that phylogenetically conserved traits (such as lifespan or flight ability) are not major drivers of the differences observed.

The lack of effect of pollination does not imply that mutualisms in general do not affect insect divergence rates. Anchylorhynchus weevils are not exclusive pollinators of Syagrus, and therefore the outcomes of these interactions are very likely to be context-dependent and geographically variable, as in other cases of non-specialized brood pollination46. Moreover, the difference in morphology of ventral hairs between OTUs might be related to pollen-carrying capacity. The lack of effect of antagonism, however, is unexpected. Palm flowers have chemical and physical defenses against herbivory47, but weevil OTUs breeding on decaying tissues, and therefore not interacting with these defenses, exhibit patterns of isolation similar to those that attack live defended tissues. While the ability to digest and detoxify plant tissues is thought to be a key adaptation enabling macroevolutionary diversification of phytophagous beetles48, and weevils specifically49, it is unlikely that coevolution and adaptation to plant defenses are a universal source of divergent selection and a necessary condition to explain the high rates of insect speciation. A recent review found that most studies on candidate genes for specialization to hosts in phytophagous insects focus on resistance or detoxification of plant secondary metabolites50, but the actual source of selection might lie in other aspects of host use. Divergence following host shifts is pervasive in phytophagous insects and their parasitoids51, despite the large variation in interaction outcomes. Even though coevolution is an important driver of diversification under some conditions5,7,15,43,52,53, evolution without reciprocal adaptation might be sufficient to explain many or most cases of insect specialization.

Diverse and complex phytophagous insect communities such as the one we study here are likely the norm rather than the exception in insect–plant interactions. Here we found that all weevil species, including those breeding on decaying plant tissues, show similar patterns of host-associated divergence. Strict antagonistic coevolution and divergence of host plant defenses are unlikely to drive this pattern. Despite the variation in larval breeding sites, all of the weevil species evaluated here mate on flowers31, and it is possible that the use of flowers as mating signals is a more general driver of divergence for these beetles and other phytophagous insects. Verbal models of how the evolution of sensory biases could be a major driver of phytophagous insect diversification have been proposed for a long time54,55, but have received little attention in comparison to the wealth of research, spurred by the classic study of Ehrlich and Raven12, focused on plant defenses as drivers of diversity. The evolution of odorant receptors associated with mating signals in insect flower visitors has recently been linked to species divergence in at least one case56. Considering that about one-third of insect species visit flowers57, the generality of flowers and other host plant cues working as mating signals that result in insect species divergence should receive more attention.

We studied patterns of isolation by distance and by environment in nine morphospecies of weevils associated with flowers of two palm species, which turned out to be 14 weevil OTUs after cryptic species were identified. Host plant species identity was a very strong barrier to gene flow in all cases, with a different OTU or a highly divergent population on each host. Both geography and host plants, but not climate, are important barriers determining genetic differentiation, with variation between insect species being largely unrelated to the kind of interaction with their host plants. Insect–plant antagonistic coevolution does not seem to be required for insect specialization and the generation of barriers to gene flow, and other aspects of insect–host interactions, such as sensory biases, should be investigated in studies of phytophagous insect diversification.

Methods

Sampling

We sampled insects and plants from 13 populations of S. coronata (including S. × costae, hybrids with S. cearensis26) and five populations of S. botryophora throughout the distribution of both plant species (Fig. 1). Whole inflorescences were bagged and excised, and insects were aspirated and stored in 95% ethanol. Leaf tissues were collected from the sampled plant and other individuals in the vicinity. For this study, we chose nine specialized weevil species that we previously identified as engaging in different kinds of interaction with their host plants (Table 1) and that have widespread geographical distributions31, and we sequenced one to ten individuals per morphospecies per locality (Supplementary Fig. 1).

DNA extraction and library preparation

We extracted DNA from insects and prepared ddRAD libraries35 from 150 ng of input DNA as described in de Medeiros and Farrell36, including whole-genome amplification for low-yield DNA extracts. Some of the individuals were extracted destructively, but for others we digested full bodies split at the base of the pronotum and preserved the remaining cuticle. For plants, DNA was extracted from leaf tissues using the E.Z.N.A. HP Plant DNA Mini Kit (Omega Biotek) following the manufacturer's protocol, and libraries were prepared with the same enzymes and protocol as for insects, but from 300 to 1000 ng of genomic DNA without whole-genome amplification. Barcoded libraries were sequenced on Illumina systems in several runs pooled with unrelated libraries. The minimum sequence length was single-end 100 bp, and all sequences were trimmed to this length prior to assembly.

Initial data set assembly

Sequences were demultiplexed by inline barcodes and assembled using ipyrad v.0.7.24 (refs. 58,59). For insects, sequences were assembled entirely de novo, but reads of potential endosymbionts were removed by using the ipyrad option “denovo-reference” with reference sequences including genomes of known weevil symbionts60 as well as Rickettsia and Wolbachia genomes downloaded from the NCBI. We assembled data sets separately for each insect morphospecies. For plants, sequences were assembled either by mapping to the draft genome assembly of the coconut61 or de novo for unmapped reads, using the ipyrad option “denovo+reference”. Reads were clustered within and between samples at 85% identity, and only loci with a coverage of at least 12 reads in a sample were retained for statistical base calling in ipyrad. Initially, we retained all samples and all loci present in at least four samples, and we used Matrix Condenser36,62 to visualize patterns of missing data. We then removed samples with excessive missing data from the data sets, since with whole-genome amplification these are more likely to include contaminants and amplification artifacts36. Instead of choosing an arbitrary threshold for filtering, we flagged outliers for removal based on the histogram view of Matrix Condenser.
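The two retention thresholds described above (at least 12 reads per sample for a call, and a locus called in at least four samples) can be illustrated with a minimal sketch. This is not ipyrad's code; the `depth[locus][sample]` table is a hypothetical data layout, and the per-sample missing fraction is simply the quantity a missing-data histogram would display.

```python
# Minimal sketch (hypothetical data layout): depth[locus][sample] = read depth.
MIN_DEPTH = 12   # per-sample coverage needed before base calling
MIN_SAMPLES = 4  # a locus must be called in at least this many samples


def filter_loci(depth, min_depth=MIN_DEPTH, min_samples=MIN_SAMPLES):
    """Return {locus: sorted samples called}, dropping low-coverage calls and rare loci."""
    kept = {}
    for locus, per_sample in depth.items():
        called = sorted(s for s, d in per_sample.items() if d >= min_depth)
        if len(called) >= min_samples:
            kept[locus] = called
    return kept


def missing_fraction(kept, samples):
    """Fraction of retained loci absent from each sample (what a missing-data histogram shows)."""
    n = len(kept)
    if n == 0:
        return {s: 1.0 for s in samples}
    return {s: sum(s not in called for called in kept.values()) / n for s in samples}


depth = {
    "locus1": {"a": 12, "b": 15, "c": 20, "d": 13},
    "locus2": {"a": 11, "b": 30, "c": 30, "d": 30},  # only 3 samples reach 12 reads
}
kept = filter_loci(depth)
print(sorted(kept), missing_fraction(kept, ["a", "e"]))
```

Samples whose missing fraction sits far out in the right tail of this distribution are the kind of outliers that were flagged visually rather than by a fixed cutoff.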

Assessing missing data

For each insect morphospecies, we calculated the following pairwise metrics: (1) the number of loci sequenced in common for each pair of samples, (2) the average pairwise nucleotide distance, using the function “dist.hamming” in the R package phangorn v.2.4.0 (ref. 63), and (3) whether the two samples were prepared in the same batch. We tested whether sequence distance and batch effects are associated with the number of common loci by fitting a multiple regression on distance matrices64,65 implemented in the R package ecodist v.2.0.1 (ref. 66).
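The three pairwise metrics can be sketched as a table with one row per pair of samples. This minimal sketch assumes each sample is stored as a hypothetical dictionary from locus name to aligned consensus sequence; the distance is a plain p-distance over sites called as A/C/G/T in both samples, which may not match phangorn's `dist.hamming` exactly because ambiguity codes are handled differently.

```python
from itertools import combinations

BASES = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites among sites called as A/C/G/T in both samples,
    pooled over the loci the two samples share (N, gaps and ambiguity codes skipped)."""
    diff = compared = 0
    for locus in set(a) & set(b):
        for x, y in zip(a[locus], b[locus]):
            if x in BASES and y in BASES:
                compared += 1
                diff += x != y
    return diff / compared if compared else float("nan")


def pairwise_table(seqs, batch):
    """One row per sample pair: (s1, s2, n_shared_loci, p_distance, same_batch)."""
    rows = []
    for s1, s2 in combinations(sorted(seqs), 2):
        n_shared = len(set(seqs[s1]) & set(seqs[s2]))
        rows.append((s1, s2, n_shared, p_distance(seqs[s1], seqs[s2]),
                     int(batch[s1] == batch[s2])))
    return rows


# Hypothetical toy input: sample -> {locus: aligned consensus sequence}.
seqs = {
    "s1": {"L1": "ACGT", "L2": "AAAA"},
    "s2": {"L1": "ACGA", "L3": "CCCC"},
    "s3": {"L1": "ACGT", "L2": "AANA"},
}
batch = {"s1": 1, "s2": 1, "s3": 2}
for row in pairwise_table(seqs, batch):
    print(row)
```

These rows supply the response (shared loci) and the two predictors (distance, same batch) that a regression on distance matrices such as ecodist's `MRM` takes; its permutation test is not reproduced here.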

Assembly of final data sets

After confirming that sequence distance is negatively associated with the number of shared loci, we split the data sets for each morphospecies into clusters separated by at least 2.5% nucleotide differences using the R package dendextend v.1.8.0 (ref. 67). To confirm that the clusters thus obtained consist of highly isolated populations, we used the R packages mmod v.1.3.3 (ref. 68) and adegenet v.2.1.1 (refs. 69,70) to calculate G′ST71 between these clusters using all loci present in at least one individual per cluster. In the case of Anchylorhynchus trapezicollis, clusters were sympatric across a broad range, so we compared the morphology of individuals with preserved cuticle to confirm their divergence with an independent source of data. Sequencing statistics are available in Supplementary Table 4.
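Splitting samples into groups separated by at least 2.5% divergence, and then measuring differentiation between the groups, can be sketched in plain Python. The grouping below uses single linkage (an assumption: the linkage used with dendextend is not stated), so any two samples in different clusters differ by at least the threshold. The G′ST function follows Hedrick's standardization, G′ST = GST(k − 1 + HS) / ((k − 1)(1 − HS)), for a single locus with uncorrected heterozygosities; mmod applies sample-size corrections and combines loci, which this sketch does not. The input formats are hypothetical.

```python
from itertools import combinations


def threshold_clusters(names, dist, threshold=0.025):
    """Single-linkage grouping: any two samples closer than `threshold` share a cluster,
    so samples in different clusters differ by at least `threshold`.
    dist: {frozenset({a, b}): distance} (hypothetical format)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def hedrick_gst_prime(freqs):
    """Hedrick's standardized G'ST for one locus from population allele frequencies.
    freqs: one list of allele frequencies per population (same allele order).
    Uses uncorrected heterozygosities; mmod applies sample-size corrections."""
    k = len(freqs)
    hs = sum(1 - sum(p * p for p in pop) for pop in freqs) / k
    mean_p = [sum(col) / k for col in zip(*freqs)]
    ht = 1 - sum(p * p for p in mean_p)
    if ht == 0:
        return 0.0  # monomorphic across populations: nothing to partition
    gst = (ht - hs) / ht
    return gst * (k - 1 + hs) / ((k - 1) * (1 - hs))


print(hedrick_gst_prime([[0.9, 0.1], [0.2, 0.8]]))
```

G′ST reaches 1 when populations share no alleles, which is why values near 1 between clusters are read as near-complete isolation.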

Population structure

We used bwa-mem v.0.7.15 (ref. 72) to map reads to the consensus sequence of each RAD locus in the final data set. Alignment files in BAM format were used as input to ANGSD v.0.920 (ref. 73) and PCAngsd v.0.973 (ref. 74) to filter sites not in Hardy–Weinberg equilibrium (HWE) while accounting for population structure75. We removed the whole RAD locus if any of its sites was found not to be in HWE. We then used the same software to estimate genetic covariance matrices for each insect and plant species, as well as posterior genotype probabilities. PCA scores based on these covariance matrices were clustered by the k-means method with scripts modified from the R package adegenet. For each insect species, the optimal number of clusters was chosen by minimizing the Bayesian information criterion76.
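Choosing the number of k-means clusters by minimum BIC can be sketched as follows. This minimal version runs Lloyd's k-means with deterministic farthest-point initialization (an assumption; adegenet's routine relies on R's `kmeans` with random starts) and scores each k with the common approximation BIC = n ln(RSS/n) + k ln(n). That formula is assumed to capture the spirit of adegenet's criterion, not to reproduce it exactly; points are hypothetical coordinate tuples standing in for PCA scores.

```python
import math


def _sq(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(members):
    return tuple(sum(col) / len(members) for col in zip(*members))


def _assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: _sq(p, centers[j])) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization; returns (labels, RSS)."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from its nearest existing center
        centers.append(max(points, key=lambda p: min(_sq(p, c) for c in centers)))
    for _ in range(iters):
        labels = _assign(points, centers)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(_mean(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    labels = _assign(points, centers)
    rss = sum(_sq(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, rss


def choose_k(points, max_k):
    """Pick k minimizing BIC = n*ln(RSS/n) + k*ln(n) (assumed form, see lead-in)."""
    n = len(points)
    scores = {}
    for k in range(1, max_k + 1):
        _, rss = kmeans(points, k)
        scores[k] = n * math.log(rss / n) + k * math.log(n)
    return min(scores, key=scores.get), scores
```

Applied to PCA scores, `choose_k(scores, 8)` would return the selected k together with the whole BIC profile. With only a few PCA axes this approximation can favor extra splits of a single cluster, so the profile is worth inspecting rather than trusting the minimum blindly.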

Isolation with migration models

We used ANGSD and dadi v.1.7.0 (ref. 77) to generate the multidimensional site frequency spectra for each morphospecies with more than one k-means cluster. We used these as input for models of isolation with migration78 (Supplementary Fig. 5) in fastsimcoal v.2.6.0.3 (refs. 79,80). All simulations were done with a mutation rate of 3 × 10⁻⁹, in line with other insects81, but inferred parameters were finally scaled by the mutation rate (Supplementary Fig. 5). For each model, we ran 100 independent searches for the maximum-likelihood parameters and selected the best model by the Akaike information criterion (AIC).
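The model-selection step can be illustrated by keeping the best of the independent searches for each model and comparing models by AIC = 2k − 2 ln L. The sketch assumes the maximum likelihoods are reported as base-10 logarithms (as in fastsimcoal's output files), so they are converted to natural logs first; the model names, parameter counts and likelihood values are hypothetical.

```python
import math


def aic_from_log10_lik(max_log10_lik, n_params):
    """AIC = 2k - 2 ln(L), converting a base-10 log-likelihood to natural log first."""
    return 2 * n_params - 2 * max_log10_lik * math.log(10)


def rank_models(runs):
    """runs: {model: (list of log10 likelihoods from independent searches, n_params)}.
    Keeps the best search per model and returns [(model, AIC, delta_AIC)], best first."""
    aics = {m: aic_from_log10_lik(max(liks), k) for m, (liks, k) in runs.items()}
    best = min(aics.values())
    return sorted(((m, a, a - best) for m, a in aics.items()), key=lambda row: row[1])


# Hypothetical numbers for illustration only.
runs = {
    "isolation": ([-1003.2, -1000.0, -1001.5], 3),
    "isolation_with_migration": ([-995.1, -990.0], 5),
}
for model, aic, delta in rank_models(runs):
    print(f"{model}: AIC={aic:.1f}, dAIC={delta:.1f}")
```

The ΔAIC of the runner-up model is the quantity summarized for the best and second-best models in Table 3.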

Isolation by distance and environment

We used BEDASSLE v.2.0-a1 (refs. 40,41) to infer the effects of geographical distance and host plant genetic distance on the genetic covariance of weevil populations. We additionally tested the effect of climatic distance as a potential confounder. We generated valid82 (i.e., Euclidean) distances for explanatory variables as follows. We projected collection localities to UTM Zone 24S using the R package sf v.0.8-0 (ref. 83) and calculated the Euclidean distance between them to obtain geographical distances. For climatic distance, we downloaded records of S. coronata and S. botryophora from GBIF84 using the R package rgbif v.1.3.0 (ref. 85), cleaned them with the R package CoordinateCleaner v.2.0-11 (ref. 86), and then used the R package raster v.3.0-7 (ref. 87) to extract bioclimatic variables88 for these localities. A PCA of these variables showed that the first PC explained 90.9% of the variance in the data set and that annual precipitation (bio12) had a very high loading on this component (Supplementary Fig. 8). Therefore, we used the difference in annual precipitation as climatic distance. For plant host genetic distances, we used NGSdist v.1.0.8 (ref. 89) to estimate genetic distances between all samples of Syagrus based on posterior genotype probabilities and including invariant sites. We then calculated pairwise genetic distances between populations as the average distance between all of their samples, and checked that the resulting distances were Euclidean by using the R package ade4 v.1.7-15 (ref. 90). For each weevil OTU with three or more populations sampled, we called genotypes with posterior probability ≥ 0.8 and filtered the data set to one site per RAD locus to avoid linked sites, including only sites genotyped in at least one sample per population. For cross-validation, we split data sets into ten partitions with 50 replicates and chose the simplest model among those with the highest explanatory power.
After finding that climate was not an important variable for any species, we ran BEDASSLE2 models on the full data set with only host plant and geography as distance matrices, with four chains of 2000 generations each, and used the R package shinystan v.2.5.0 (ref. 91) to evaluate convergence.
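The population-level host plant distances and the Euclidean check described above can be sketched in plain Python. The averaging follows the text; the Euclidean test uses the classical criterion that a distance matrix D is Euclidean when its Gower-centered matrix −½JD²J (J being the centering matrix) has no negative eigenvalues, which is the property ade4's `is.euclid` examines. Eigenvalues come from a small cyclic Jacobi routine, and the input formats and function names are hypothetical.

```python
import math
from itertools import combinations


def population_distances(sample_dist, pop_of):
    """Average sample-level distance for every pair of populations.
    sample_dist: {frozenset({s1, s2}): d}; pop_of: {sample: population} (hypothetical formats)."""
    sums, counts = {}, {}
    for s1, s2 in combinations(sorted(pop_of), 2):
        p1, p2 = sorted((pop_of[s1], pop_of[s2]))
        if p1 == p2:
            continue  # within-population pairs are not used
        sums[(p1, p2)] = sums.get((p1, p2), 0.0) + sample_dist[frozenset((s1, s2))]
        counts[(p1, p2)] = counts.get((p1, p2), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def to_matrix(pop_dist):
    """Lay a {(p1, p2): d} dictionary out as a symmetric matrix with sorted labels."""
    pops = sorted({p for pair in pop_dist for p in pair})
    idx = {p: i for i, p in enumerate(pops)}
    m = [[0.0] * len(pops) for _ in pops]
    for (p1, p2), v in pop_dist.items():
        m[idx[p1]][idx[p2]] = m[idx[p2]][idx[p1]] = v
    return pops, m


def gower_centered(d):
    """G = -1/2 * J D^2 J for a symmetric distance matrix d (J = centering matrix)."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
            for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def is_euclidean(d, rel_tol=1e-9):
    """Euclidean iff the Gower-centered matrix has no (meaningfully) negative eigenvalue."""
    eig = jacobi_eigenvalues(gower_centered(d))
    scale = max(1.0, max(abs(e) for e in eig))
    return min(eig) >= -rel_tol * scale
```

A matrix that violates the triangle inequality, for example distances 1, 1 and 3 among three populations, fails this check, while distances computed from real coordinates always pass it.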

The BEDASSLE model estimates parameters associated with the strength of the relationship between a given distance matrix and the genetic isolation of species40. Here we denote by αg the parameter associated with geographical distance and by αp the parameter associated with host plant genetic distance. We used all samples from the posterior distribution to calculate αdiff = αp − αg for each OTU. The variation of αdiff between OTUs indicates the degree to which plant or geographical distances are associated with barriers to gene flow for each OTU, with more positive values associated with greater importance of host plants. We estimated the determinants of variation in αdiff across species by implementing a Bayesian hierarchical model similar to those typically used in meta-analyses. For each OTU j:

$$\alpha_{\mathrm{diff}_j} \sim \mathrm{Normal}\left(\theta_j, \sigma_j\right), \tag{1}$$

$$\theta_j \sim \mathrm{Normal}\left(\mu + \gamma_{\mathrm{ant}} \times I_{\mathrm{ant}_j} + \gamma_{\mathrm{pol}} \times I_{\mathrm{pol}_j}, \tau\right). \tag{2}$$

In this model, σj is the standard deviation of the posterior estimates of αdiff for OTU j, calculated from BEDASSLE posterior draws and assumed to be known. μ is the mean αdiff across all OTUs, estimated by the model, and τ is the estimated variation in αdiff that is unrelated to species interactions. Iant and Ipol are indicator variables for whether each species is an antagonist (i.e., breeds on live tissue) or a pollinator, respectively (Table 1). In our data set, both indicators have a value of 1 for brood pollinators and 0 for non-pollinators breeding on dead tissue, while for non-pollinators breeding on live tissue Iant = 1 and Ipol = 0. The parameters γant and γpol therefore measure the strength of the linear relationship of Iant and Ipol with αdiff, and constitute the model output of interest here. Values significantly different from 0 indicate that antagonism or pollination has a significant effect on the strength of weevil population divergence imposed by host divergence, when compared to geography alone. We used standard Normal priors for μ, τ, γant, and γpol. We implemented this model in rstan v.19.2 (ref. 92) using the Stan language. Models were run and convergence checked as for BEDASSLE models. We tested model fit by using posterior predictive simulations. Finally, we assessed whether the number of species included in this study provides sufficient power to estimate γant and γpol by running a model with an extreme case based on real data. We used the real distributions of αdiff but relabeled the three species with the highest values as non-pollinator antagonists, the next three as both pollinators and antagonists, and the remaining seven as neither pollinators nor antagonists. This preserved the number of species in each category in the real data but maximized the differences in αdiff between modes of interaction.
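The inputs to Eqs. (1) and (2) and the power-check relabeling can be sketched as follows. This is a minimal illustration, not the Stan model itself: `alpha_diff_summary` turns paired posterior draws into per-draw αdiff values, their mean, and the standard deviation that plays the role of the known σj; `expected_theta` evaluates the mean of θj in Eq. (2); and `extreme_relabel` reproduces the ranking rule described above. All function names are hypothetical.

```python
import statistics


def alpha_diff_summary(alpha_p_draws, alpha_g_draws):
    """Per-draw alpha_diff = alpha_p - alpha_g; returns (draws, mean, sd).
    The sd plays the role of the 'known' sigma_j in Eq. (1)."""
    diffs = [p - g for p, g in zip(alpha_p_draws, alpha_g_draws)]
    return diffs, statistics.fmean(diffs), statistics.stdev(diffs)


def expected_theta(mu, gamma_ant, gamma_pol, labels):
    """Mean of theta_j in Eq. (2) for each OTU, given labels {otu: (I_ant, I_pol)}."""
    return {otu: mu + gamma_ant * i_ant + gamma_pol * i_pol
            for otu, (i_ant, i_pol) in labels.items()}


def extreme_relabel(mean_alpha_diff, n_ant_only=3, n_both=3):
    """Power-check relabeling: highest alpha_diff OTUs become antagonist-only (1, 0),
    the next ones pollinator-antagonists (1, 1), and the rest neither (0, 0)."""
    ranked = sorted(mean_alpha_diff, key=mean_alpha_diff.get, reverse=True)
    labels = {}
    for rank, otu in enumerate(ranked):
        if rank < n_ant_only:
            labels[otu] = (1, 0)
        elif rank < n_ant_only + n_both:
            labels[otu] = (1, 1)
        else:
            labels[otu] = (0, 0)
    return labels
```

Under this coding, a brood pollinator's expected θj is μ + γant + γpol, which is why γpol isolates the effect of pollination over and above breeding on live tissue.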

Statistics and reproducibility

Sampling locations and sample sizes for all species are available in Supplementary Fig. 1. The number of samples, populations, and genetic markers for each OTU is available in Supplementary Table 4. When more than eight individuals for an insect morphospecies were available for sequencing in a locality, we arbitrarily chose eight individuals for DNA extraction. After discovering cryptic sympatric species in an initial analysis, we sequenced additional individuals targeting the putative species to confirm their identity. Moreover, we randomized the position of samples in DNA extraction plates to avoid potential biases arising from cross-contamination when performing high-throughput automated DNA extractions for insects36. Different statistical tests were used for each section of the manuscript, with details in the appropriate sections above.
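The subsampling cap and the randomized plate layout can be sketched as below. The text does not specify how the "arbitrary" choice of eight individuals or the randomization was carried out, so the seeded pseudo-random draws and the standard 96-well layout are assumptions, and the function names are hypothetical.

```python
import random

ROWS = "ABCDEFGH"
WELLS = [f"{r}{c}" for r in ROWS for c in range(1, 13)]  # standard 96-well layout (assumed)


def subsample(by_locality, max_n=8, seed=0):
    """Keep at most max_n individuals per locality (seeded random choice here)."""
    rng = random.Random(seed)
    chosen = []
    for loc in sorted(by_locality):
        inds = sorted(by_locality[loc])
        chosen.extend(inds if len(inds) <= max_n else rng.sample(inds, max_n))
    return chosen


def randomize_plate(samples, seed=0):
    """Assign samples to shuffled wells so that any well-to-well cross-contamination
    is not systematically confounded with locality or species."""
    if len(samples) > len(WELLS):
        raise ValueError("more samples than wells; split them across plates")
    rng = random.Random(seed)
    wells = WELLS[:]
    rng.shuffle(wells)
    return dict(zip(samples, wells))


by_locality = {"X": [f"x{i}" for i in range(10)], "Y": ["y1", "y2"]}
plate = randomize_plate(subsample(by_locality))
print(plate)
```

Recording the seed keeps the layout reproducible while still breaking any correspondence between plate position and population.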

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.