Introduction

Behaviors that affect mate choice have an important role in the process of speciation because they determine the probability that potential sexual partners meet and whether they recognize each other (Slater and Halliday, 1994; Schluter, 1998). These behaviors have been shown to evolve at a relatively rapid rate and are frequently the first phenotypes to vary among recently diverged lineages (Foster and Endler, 1999). For organisms that mate within their resource environment, habitat choice behavior is a form of mate choice, because it leads to de facto assortative mating (Bush, 1994; Schluter, 1998). The genetic underpinnings of habitat choice behavior may crucially affect the likelihood of speciation driven by selection in different habitats (Gavrilets, 2004; Via, 2009). Thus, the genetic architecture of habitat choice within species is an important aspect of speciation research.

How adaptation to different ecological environments could reduce gene flow has been examined in great detail in host-plant races of phytophagous insects (Itami et al., 1997; Feder, 1998; Funk, 1998; Via, 1999; Nylin et al., 2004; reviewed in Matsubayashi et al., 2010). Research on plant-feeding insects has contributed greatly to the development of a general framework for studying a type of speciation known as ‘ecological speciation’ (Schluter and Conte, 2009; Via, 2009). What do we know about the genetic architecture of habitat choice, or in this case host-plant preference, in host races of phytophagous insects? First, host preference is usually determined by a few (1–5) loci (reviewed in Matsubayashi et al., 2010). These loci have not been characterized further, except for two genes coding for odorant-binding proteins in Drosophila melanogaster (Matsuo et al., 2007). Second, preference genes are often located on autosomes, the exceptions being two butterfly systems in which Z-linked inheritance was reported. Third, dominance of preference for one host over another was found in about half of the reviewed studies. Here, we report on the genetic architecture of a suite of behavioral characters that determine host-plant preference in the pea aphid (Acyrthosiphon pisum Harris), a phytophagous insect that has become a model system for the study of ecological speciation (Via, 2009; Peccoud and Simon, 2010). Unlike previous studies, we combine biometrical analyses and quantitative trait locus (QTL) mapping analyses. Also unique to this study is the analysis of the successive steps of aphid feeding behavior that eventually lead to acceptance or rejection of the plant as a host.

Pea aphids can be found on numerous legume species. However, this broad host range at the species level does not reflect generalized host use at the individual level. Instead, reciprocal transplant experiments have revealed genetically differentiated host races in nearly all cases in which individuals collected from different hosts have been tested on alternative plants (Via, 1991; Simon et al., 2003; Ferrari et al., 2008), and these divergent populations span a continuum of divergence from populations to species (Peccoud et al., 2009). A large part of this genetic divergence in host use by pea aphids appears to be caused by variation in habitat choice (Via, 1999; Via et al., 2000). Habitat choice is a complex trait because it involves several sequential steps and multiple behavioral components; it was dissected by Caillaud and Via (2000). Winged pea aphids from alfalfa and red clover in New York State (USA) land indiscriminately on alfalfa and clover. But if a specialist lands on the alternative host, it rapidly abandons the plant without taking the time to find the phloem, which is the actual food source (Caillaud and Via, 2000). This rapid assessment of plant type involves tasting cells in the leaf or stem tissue almost as soon as the feeding stylets are inserted into the plant. The decision to accept or reject depends on the recognition of stimulants specific to each host plant, not on deterrents or toxins found in alternate hosts (Del Campo et al., 2003). If highly specialized pea aphids are prevented from leaving the alternate host, they often remain unwilling even to search for the phloem or to ingest phloem sap at all, ultimately starving to death (Caillaud and Via, 2000). ‘Sham migrations’ of winged aphids have shown that pea aphids can experience severely reduced fitness on the alternate host, which strongly favors the evolution of accurate and rapid habitat detection and choice (Via et al., 2000).
The unwillingness to feed from the alternate host is therefore an important component of habitat choice in the pea aphid, and it is particularly important with respect to population divergence and speciation because it decreases random mating between sympatric populations on different hosts.

Here, the unwillingness to feed on non-hosts was dissected into simpler components for genetic analysis. In the honeybee, dissection of the character ‘dance communication’ showed that this complex behavior was regulated by subsets of simple genic systems (Rinderer and Beaman, 1995). A similar dissection is possible for the character of interest here, because feeding behavior in aphids involves not only the phloem sap, which is the final food source, but also other plant tissues, which contain plant allelochemicals that may affect plant acceptance or rejection. Different chemical signals, different sensory systems and different subsets of aphid genes may therefore control different steps of feeding. Although most feeding behavior consists of movements of the feeding stylets hidden within the plant tissues, we were able to observe these movements using an electronic monitor, the direct current electrical penetration graph technique (Tjallingii, 1988; described in Caillaud and Via, 2000). Using both a modified version of line-cross analysis (Kearsey and Pooni, 1996; Lynch and Walsh, 1998) and a linkage map (as in Hawthorne and Via, 2001), we analyzed the mode of gene action and estimated the number of effective genetic factors for each of these traits.

Materials and methods

Clones and crosses

We made controlled crosses to analyze the genetic architecture of short-term feeding behavior in the pea aphid, exploiting the fact that pea aphids are cyclically parthenogenetic. After the sexually produced fertilized eggs hatch, progeny reproduce parthenogenetically. This allows replicated phenotypic measurements of each hybrid genotype to be made, increasing the experimental power of the analyses. In purely sexual species, replication of F2 progeny can only be obtained through making recombinant inbred lines, in which inbreeding depression or residual genetic variation may bias analyses.

A reciprocal single-pair cross between two genotypes specialized on different hosts was used for these analyses. The two genotypes were collected in 1989 in Tompkins County (NY, USA) from an alfalfa field (genotype ‘A1’) and a clover field (genotype ‘C1’). These two genotypes were chosen because demographic field experiments on both hosts showed that they typify the ecological specialization of a set of field-collected clones (Caillaud and Via, 2000). F1 hybrids were obtained from reciprocal crosses between these two parental genotypes, and two F1 clones were then crossed to produce the F2 generation. The two parental clones, the two F1 clones and the F2 were then used in a line-cross analysis (Kearsey and Pooni, 1996; Lynch and Walsh, 1998).

The genotypes crossed to produce the F1 or F2 generations were induced to produce sexual morphs through exposure to a declining photoperiod and cold temperature. Note that each clone produces sexual females, all with the same genotype, and multiple XO males that differ in which of the two X chromosomes they carry. Ten replicate mating dishes of two males and three females were established for each direction of the cross, so that the F1 and F2 progeny would have, on average, equal numbers of the two possible male-transmitted X chromosomes. All fertilized eggs produced over the lifetime of the females were harvested, surface sterilized and placed in an incubator under daily temperature cycles: 4 °C during a 12-h day and 0 °C during a 12-h night. After about 100 days of this winter treatment, eggs were removed from the incubator and the hatchling progeny were reared in Petri dishes containing both alfalfa and clover foliage. These progeny reproduced clonally under a long-day photoperiod, and thus a parthenogenetic lineage was established for each F1 or F2 clone. Each parthenogenetic lineage was then maintained individually in a container holding two 7-cm square pots planted with clover and alfalfa.

Phenotypic characterization

We have previously identified the specific characters that explain differences in feeding behavior between the parental genotypes A1 and C1 (Caillaud and Via, 2000). Here, we evaluate four of these characters on both host plants in A1, C1, 2 F1 hybrids and 102 F2 hybrids. The characters are: the amount of time spent on a plant without penetrating it in search of food (NON-PEN; duration of waveform NP in the direct current electrical penetration graph; see Caillaud and Via, 2000 for illustrations of the waveforms seen in feeding monitor trials); the total time spent searching for feeding sites (SEARCH; duration of waveform C); the time before an aphid starts injecting saliva into the phloem vessels to prepare the ingestion phase (START PREPARATION; time to waveform E1); and the amount of time spent ingesting sieve sap (INGESTION; duration of waveform E2) (Table 1). Typically, aphids deposited on host plants (C1 on clover, for instance) search actively for feeding sites (long SEARCH) and spend at least a third of their time on the plant ingesting nutrients (long INGESTION) (Caillaud and Via, 2000). In contrast, aphids deposited on non-host plants (C1 on alfalfa, for instance) spend most of their time ‘sitting’ on the plant (long NON-PEN) and do not ingest nutrients (short INGESTION).

Table 1 Biological significance of four characters typifying behavioral divergence between parents A1 and C1

Aphid stylet activities during plant penetration were recorded for 390 min using the direct current electrical penetration graph technique as described in Caillaud and Via (2000). The aphid and the plant were included in an electrical circuit. The penetration of the aphid stylets into the plant modified the characteristics (voltage) of the recorded electrical signal, which provided reliable information about the behavior (ingestion, salivation) and the position of the stylet tips (phloem, xylem, and so on) during plant penetration. The electrical characteristics and biological significance of these waveforms have been carefully calibrated (Tjallingii, 1988). This method allowed us to follow the activity of the aphid mouthparts inside the plant tissues as they penetrated the epidermis, reached the phloem vessels, injected saliva into these vessels to prepare the ingestion step and eventually ingested phloem sap.

On a given day, four genotypes were tested on each of the two host plants. Aphid positions were randomized within each day to avoid any effect of their location in the Faraday cage. We performed 9–14 replicates per host for the parental genotypes, 3–5 for the 2 F1 genotypes and 2–5 for the 121 F2 genotypes (792 observations in total).

All experiments involved wingless adult aphids, starved for 3 h before the experiments to increase their responsiveness to the plant and to standardize their pre-experimental physiological condition. Although winged aphids are responsible for most host-plant acceptance or rejection in nature, it is difficult to attach a gold wire to their thorax without interfering with their mobility. The feeding behavior of wingless individuals on the host plants of this study is not significantly different from that of winged individuals (Flory and Caillaud, unpublished; Supplementary Figure 1). Recordings of individuals that died (3%) or escaped (9%, because the gold wire broke) before the end of the recording period were discarded. Experiments were performed at the same time every day to remove any effect of diurnal rhythm on the results. To further reduce environmental variance, all aphids were set up by the same person, and the recordings were analyzed blind.

Temperature was held at 21±1 °C, and aphids were under continuous, homogeneous artificial illumination from above (fluorescent tubes, 1500 lux). Plants were grown from seed in 2.5-inch pots, maintained in a growth chamber at 20 °C and 16:8 h light:dark, and were 5–6 weeks old at the time of the experiments.

The behavioral data were examined for normality (PROC UNIVARIATE; SAS Institute Inc., 1990). START PREPARATION and INGESTION were log(x+1)-transformed before analysis. Line-cross and QTL mapping analyses were performed on least-squares means for the parental and F1 hybrid lines, and on best linear unbiased predictors for the F2 hybrid lines (PROC MIXED; SAS Institute Inc., 1990). Because we were unable to examine the behavior of all 121 F2 hybrids on the same day, observations were grouped into ‘blocks’ in our experimental design. The behavior of parental genotypes A1 and C1 on both alfalfa and clover was tested in each block (two replicates per plant). We compared the feeding behavior of A1 and C1 across blocks using PROC MIXED (SAS Institute Inc., 1990), with ‘plant’ and ‘parent’ as fixed effects and ‘block’ and all interactions including ‘block’ as random effects. A ‘block’ effect for a given behavioral character would suggest that part of the variation among the clones studied is due to their having been tested on different days. We mention ‘block’ effects only when they were significant.

Line-cross analysis

To determine which behavioral characters showed significant genetic variation among the F2 hybrids, an analysis of each character on each plant was performed (PROC MIXED, SAS Institute Inc., 1990). ‘F2 hybrid’ was considered as a random factor because the F2 individuals studied are only a sample of the possible F2s that could have been obtained, while ‘parent’ and ‘F1 hybrid’ were considered as fixed because they were specifically chosen. No analysis of gene action or estimate of number of loci was attempted on traits for which there was no significant segregation variance among F2.
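The among-F2 variance tested here, and the share of total phenotypic variance it represents (reported in the Results), can be illustrated with a one-way random-effects model. The sketch below is an illustration under stated assumptions, not the analysis we ran: it swaps the REML fit of PROC MIXED for the classical ANOVA (method-of-moments) estimator and assumes a balanced design with the same number of replicates per clone, whereas the actual data were unbalanced (2–5 replicates per F2 clone). The function name `among_clone_share` is hypothetical.

```python
from statistics import mean


def among_clone_share(groups):
    """ANOVA (method-of-moments) estimate of the among-clone share of variance.

    groups: one list of replicate observations per F2 clone.
    Assumption: balanced design (same number of replicates per clone).
    """
    k = len(groups)                      # number of clones
    n = len(groups[0])                   # replicates per clone
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(x for g in groups for x in g)
    clone_means = [mean(g) for g in groups]
    ss_among = n * sum((m - grand) ** 2 for m in clone_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, clone_means) for x in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # Expected MS_among = sigma2_within + n * sigma2_among; truncate at zero.
    var_among = max((ms_among - ms_within) / n, 0.0)
    return var_among / (var_among + ms_within)
```

Negative among-clone estimates are truncated at zero, a common convention for the ANOVA estimator; REML handles unbalanced replication and such boundary cases more gracefully.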

We tested the generation means for goodness of fit to genetic models incorporating additive or additive-dominance effects using a joint scaling test as described in Kearsey and Pooni (1996). This test has been shown to be applicable to non-homozygous lines (as A1 and C1 are) in the absence of mating between close relatives (Lynch and Walsh, 1998). This method uses the means and variances of behavioral characters for four lines from three generations (the two parents, the F1 and the F2 hybrids) to derive estimates of the composite (that is, net) additive and dominance effects on the phenotypic difference between the parents on each plant. Here, we implemented the weighted least squares method as described in Lynch and Walsh (1998). The object is to explain the variation among the observed generation means with as simple a model as possible. We started with the simple model involving only net additive effects, $w_i y_i = w_i(m + a x_{1i})$, where $y_i$ is the mean of line $i$, $w_i$ its weight and $x_{1i}$, $x_{2i}$ and $x_{3i}$ the coefficients of the additive, dominance and additive × additive effects for that line. We tested the fit of this model ($\chi^2_{a}$), then gradually added higher-order composite effects until no further significant improvement in fit occurred. We added a composite dominance effect first, $w_i y_i = w_i(m + a x_{1i} + d x_{2i})$ ($\chi^2_{ad}$). If the additive-dominance model was rejected, we analyzed a model containing an epistatic component (an additive × additive composite effect), $w_i y_i = w_i(m + a x_{1i} + aa\, x_{3i})$ ($\chi^2_{a,aa}$). For the latter model, because we have only four generation means, we had to drop the composite dominance effect. This was possible only when dominance effects did not significantly improve the fit of the model, which we tested with the likelihood ratio test statistic $\chi^2_{a} - \chi^2_{ad}$ (Lynch and Walsh, 1998). The weighted regression analysis and the χ² tests were implemented using PROC REG (SAS Institute Inc., 1990).
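The weighted least-squares fit behind the joint scaling test can be sketched in a few lines. This is a minimal illustration, not our SAS code. It assumes the conventional Mather and Jinks style composite-effect coefficients for the four lines (parent A1: m + a + aa; parent C1: m − a + aa; F1: m + d; F2: m + d/2, with m the midpoint of the homozygotes) and weights equal to the inverse squared standard errors of the line means. The function names and the example means are hypothetical. With four line means, the additive model leaves 2 degrees of freedom and the three-parameter models leave 1, so the sketch only implements the χ² tail probability for those two cases.

```python
import math

# Assumed coefficients of each line mean on (m, a, d, aa).
COEFS = {
    "A1": (1.0, 1.0, 0.0, 1.0),
    "C1": (1.0, -1.0, 0.0, 1.0),
    "F1": (1.0, 0.0, 1.0, 0.0),
    "F2": (1.0, 0.0, 0.5, 0.0),
}
PARAMS = ("m", "a", "d", "aa")


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("sketch only handles df = 1 or 2")


def fit(means, ses, model):
    """Weighted least-squares fit of line means to a composite-effect model.

    Returns (estimates, chi-square, degrees of freedom, p-value)."""
    lines = list(means)
    idx = [PARAMS.index(p) for p in model]
    X = [[COEFS[g][j] for j in idx] for g in lines]
    w = [1.0 / ses[g] ** 2 for g in lines]        # inverse sampling variances
    y = [means[g] for g in lines]
    k, n = len(idx), len(lines)
    XtWX = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
            for r in range(k)]
    XtWy = [sum(w[i] * X[i][r] * y[i] for i in range(n)) for r in range(k)]
    beta = solve(XtWX, XtWy)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    chi2 = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    df = n - k
    return dict(zip(model, beta)), chi2, df, chi2_sf(chi2, df)
```

In this scheme the additive model is retained when its χ² is non-significant; otherwise the gain from adding dominance is the difference between the additive and additive-dominance χ² values, referred to a χ² distribution with 1 degree of freedom.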

The minimum number of segregating factors involved in the genetic divergence in behavior between alfalfa and clover specialists (ne) was estimated using the method first developed by Castle (1921) and later modified by Lande (1981) for use with non-homozygous populations. We used the version suggested by Cockerham (1986), which corrects for the sampling variances of the parental means (equation 9.27 in Lynch and Walsh, 1998). The sampling variances of the parental means and the segregational variance were estimated using a weighted least squares procedure comparable to the one used to estimate the composite effects on the generation means (PROC REG; SAS Institute Inc., 1990). After computing ne with Cockerham's (1986) equation, we substituted this estimate into an expression suggested by Zeng (1992) that accounts for possible linkage (c) and inequality of allelic effects (Cα). Cα is the squared coefficient of variation of allelic effects; it equals 0 when all loci have equal effects. c is the average recombination frequency between random pairs of loci throughout the genome. Neither Cα nor c is known for the pea aphid. A downwardly biased estimate of c is given by $c = (M-1)/(2M)$, where M is the haploid number of chromosomes; estimates of c from this approximation have been shown not to differ greatly from the more refined estimates that can be obtained with a genetic map (Lynch and Walsh, 1998).
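The two quantities defined above can be written as short functions. This is a sketch under stated assumptions: it assumes the corrected Castle–Wright estimator takes the form ne = [(μA1 − μC1)² − Var(μA1) − Var(μC1)] / (8σ²S), in which the squared parental difference is reduced by the sampling variances of the two parental means and σ²S is the F2 segregational variance. Zeng's (1992) adjustment for linkage and unequal allelic effects is not implemented. The function names and the numbers in the example are hypothetical.

```python
def castle_wright_ne(mean_a1, mean_c1, var_mean_a1, var_mean_c1, seg_var):
    """Castle-Wright effective number of factors, with the squared parental
    difference corrected for the sampling variances of the parental means."""
    if seg_var <= 0:
        raise ValueError("segregational variance must be positive")
    corrected_sq_diff = (mean_a1 - mean_c1) ** 2 - var_mean_a1 - var_mean_c1
    return corrected_sq_diff / (8.0 * seg_var)


def mean_recombination(haploid_chromosomes):
    """Downwardly biased estimate of c = (M - 1) / (2M) from the haploid number M."""
    m = haploid_chromosomes
    return (m - 1) / (2 * m)


print(mean_recombination(4))                       # 0.375
print(castle_wright_ne(10.0, 2.0, 0.5, 0.5, 2.0))  # 3.9375
```

With M = 4, matching the four pea aphid linkage groups (I–IV), the formula gives c = 3/8, the value used in the Results.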

QTL mapping analysis

We used the same cross and the same linkage map of pea aphids that Hawthorne and Via (2001) used to map QTL affecting fecundity and acceptance (measured by looking at the location of winged individuals 70 h after release in a cage) on alfalfa and clover. This linkage map was made mostly from dominant AFLP markers, requiring a separate map for each parental genome. Linkage groups Ia–IVa pertain to the parental genotype specialized on alfalfa (A1). Linkage groups Ic–IVc pertain to the parental genotype specialized on clover (C1) (Hawthorne and Via, 2001). Seven co-dominant sequence-tagged AFLP markers allowed alignment between the two maps.

The genotypes of 102 F2 progeny were determined using 116 AFLP markers as described in Hawthorne and Via (2001). Primer information for all markers is provided in Supplementary Tables 2 and 3. F2 phenotypes were measured using the electrical penetration graph (EPG) technique described earlier. QTL affecting feeding behavior were identified using composite interval mapping (Zeng, 1994) with model 6 of QTL Cartographer (Basten et al., 1996). In our analysis, the size of the ‘conditioning window’ around the test interval was 15 cM. The significance level of the likelihood ratio for each analysis was determined by permutation (Churchill and Doerge, 1994; Doerge and Churchill, 1996). One thousand permutations of the aphid phenotypes with respect to genotypes were performed separately for each chromosome and each trait, because there is evidence of segregation distortion in the pea aphid, particularly for linkage group I, which inflates the permutation maximum likelihood ratio statistics (Doerge, personal communication). When the likelihood ratio statistic was significant at α=0.1 under the null hypothesis (permutation maximum likelihood ratio statistics <50 times), the QTL was considered suggestive. Estimates of the additive and dominance effects of each behavioral QTL (a, d), as well as the proportion of the variance explained by the QTL conditioned on the background markers (r²), were given by QTL Cartographer.
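The logic of the permutation threshold (Churchill and Doerge, 1994) can be illustrated with a deliberately simplified scan. The sketch below is not composite interval mapping and does not reproduce QTL Cartographer. It assumes markers coded numerically (for example 0/1 for absence or presence of a dominant AFLP band), uses a single-marker regression likelihood ratio, LR = n ln(RSS0/RSS1), and takes the maximum over markers as the scan statistic. Phenotypes are shuffled relative to genotypes, and the empirical p-value counts how often the permuted maximum reaches the observed one, using the common +1 correction. All function names are hypothetical.

```python
import math
import random


def lr_stat(geno, pheno):
    """Likelihood-ratio statistic for a simple linear regression of phenotype on genotype."""
    n = len(pheno)
    mg, mp = sum(geno) / n, sum(pheno) / n
    sxx = sum((g - mg) ** 2 for g in geno)
    syy = sum((p - mp) ** 2 for p in pheno)
    if sxx == 0 or syy == 0:
        return 0.0                       # uninformative marker or constant trait
    sxy = sum((g - mg) * (p - mp) for g, p in zip(geno, pheno))
    rss1 = syy - sxy ** 2 / sxx          # residual SS with the marker in the model
    return n * math.log(syy / rss1) if rss1 > 0 else float("inf")


def max_lr(markers, pheno):
    """Scan statistic: largest single-marker LR across all markers."""
    return max(lr_stat(g, pheno) for g in markers)


def permutation_test(markers, pheno, n_perm=1000, seed=1):
    """Return (observed max LR, empirical p-value) from phenotype permutations."""
    rng = random.Random(seed)
    observed = max_lr(markers, pheno)
    shuffled = list(pheno)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # break any genotype-phenotype association
        if max_lr(markers, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_test` separately for the markers of each linkage group and each trait mirrors the per-chromosome permutations described above, which keeps distortion on one linkage group from inflating thresholds elsewhere.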

Results

Descriptive statistics

No significant variation was detected in the EPG recordings of either parental genotype (A1 or C1) between the beginning and the end of the experiment, 18 months later. This suggests that significant variation among F2 clones tested at different times during this period can reasonably be attributed to genetic rather than environmental variation. Descriptive statistics for the F2 generation are presented in Supplementary Table 1. Genetic variation among F2 clones was found for all characters: variance among F2 clones represented between 9.3% (SEARCH on clover) and 32.7% (INGESTION on clover) of the total phenotypic variance.

Line-cross analysis

If alfalfa and clover specialists have diverged primarily in genes with additive effects, then hybrid means for all generations should fall along the dotted lines joining the observed parental means in Figure 1. The extent to which the hybrid means are displaced from this line is proportional to the degree of dominance; with dominance alone, the displacement of the F2 is expected to be half that of the F1, whereas in the presence of epistasis the displacements of the F1 and F2 hybrids are comparable. Hybrid means fall along the line for three characters: SEARCH in the alfalfa habitat (Figure 1b), START PREPARATION in the clover habitat (Figure 1g) and INGESTION in the clover habitat (Figure 1h), suggesting that these characters are almost completely additive. A joint scaling test confirms that the simple additive model best explains the line means for these characters (Table 2). For all other characters, F1 hybrid means are displaced from the line. The additive-dominance model best fits the hybrid means of two characters: START PREPARATION in the alfalfa habitat (Figure 1c) and NON-PEN in the clover habitat (Figure 1e). In contrast, additive and additive × additive epistatic gene action best explain the genetic divergence between A1 and C1 for NON-PEN in the alfalfa habitat (Figure 1a, Table 2), SEARCH in the clover habitat (Figure 1f, Table 2) and INGESTION in the alfalfa habitat (Figure 1d, Table 2).

Figure 1

Observed character means and standard errors for the four behavioral traits measured on alfalfa (a–d) and clover (e–h), in four lines (parental lines A1 and C1, hybrids F1 and F2). A priori expectations of additive gene effects are represented by the lines drawn between the parental means of A1 and C1.

Table 2 Maximum likelihood estimates of four components of the generation means (m, a, d and aXa) and test of the significance of models incorporating these components (χ2 test)

Biometrical estimates of gene number using the Castle (1921) estimator, modified as suggested by Cockerham (1986), varied little between characters and ranged from 0.11 to 2.54 (Table 3). With a haploid number of four chromosomes, the estimate of c for pea aphids is 3/8. Cα is more difficult to estimate. With Cα equal to 1, Otto and Jones (2000) found values of ne very close to the real number of simulated underlying loci (n) when n ≤ 20. When n = 100, ne underestimated n, and only 35% of the loci involved were detected. In the pea aphid, using equation 9.27 of Lynch and Walsh (1998) with c = 3/8 and Cα = 1, we find ne between 0.43 (SEARCH on alfalfa) and 9.5 (INGESTION on clover) (Table 3).

Table 3 Estimates of gene number (ne) for the four behavioral characters, in each parental habitat

QTL mapping analysis

No QTL was found for START PREPARATION on alfalfa, suggesting the involvement of many genes of small effect. For all other behavioral characters, we found 1–3 QTL spread over all four linkage groups (Table 4). The proportion of variance explained by individual QTL varied from 7.3% to 52.1%. For each trait in each plant environment, the detected QTLs together explained between 23.3% and 73.8% of the genetic variance for that trait in that environment (Table 4).

Table 4 QTL position and effect on the phenotype. START PREPARATION and INGESTION were log(x+1)-transformed before analysis

Most of the QTL have positive additive effects on characters expressed on the ‘native host’ and negative additive effects on characters expressed on the alternate host (Table 4). For instance, the QTL for SEARCH in A (that is, on alfalfa) on Ia (the alfalfa homolog of linkage group I) increases the time spent actively searching for nutritional tissues on alfalfa (thus increasing feeding), while the QTL for SEARCH in A on IIIc (the clover homolog of linkage group III) decreases that time (thus decreasing feeding). The three exceptions all concern behavioral characters in the clover environment. The QTL for NON-PEN on clover mapped onto IVc has an effect opposite to the predicted one: its effect is positive, thus decreasing feeding on clover. Similarly, the QTLs for START PREPARATION and INGESTION on clover mapped onto IIIc act to decrease feeding on clover instead of increasing it.

In some cases, QTLs for several behavioral characters colocalized. For example, on linkage group Ia, between positions 28 and 31 cM, we found a QTL that decreases NON-PEN on alfalfa while increasing SEARCH and INGESTION on that plant (Table 4). Similarly, on linkage group IIIa, between positions 31 and 33 cM, a QTL decreases NON-PEN on alfalfa while increasing SEARCH on alfalfa. Nevertheless, unique QTLs were also detected for each character. For instance, SEARCH and NON-PEN on alfalfa are influenced by QTLs other than the shared ones mentioned above, located on linkage groups IIIc and Ic, respectively (Table 4).

Discussion

One of our main goals was to distinguish between two alternative genetic models for the evolution of adaptation. We asked: can variation between populations in ecologically important traits be explained by allelic differentiation at a few major genes, or are key differences between populations due to many genes, each with a relatively small phenotypic effect? If divergence between A1 and C1 is caused by one to a few major segregating factors with large effects on the phenotype, then host shifts could occur relatively rapidly through allelic change at only a few loci, providing a plausible mechanism for the rapid diversification of herbivorous insects. In that case, a search for molecular markers tightly linked to these loci would also be likely to succeed, making the eventual molecular characterization of these loci feasible. The biometrical estimates of effective genetic factors are strikingly low (Table 3). Similarly, QTL mapping detected 0 to 3 QTL per character in each plant habitat (Table 4). One of these QTL maps to linkage group Ia and explains 52.1% of the genetic variance between the specialized parents A1 and C1. It is tempting to speculate that a major gene influencing feeding behavior segregates in our F2 generation. True et al. (1997) proposed a quantitative definition of a major gene effect as one for which the distributions of alternative homozygotes overlap so little that the probability of misclassifying them is <0.05. When both homozygous classes are normally distributed with equal variance, the probability of misclassification is 0.05 when their means are 3.28 s.d. apart. For the character NON-PEN on alfalfa, the parental means are 5.61 s.d. apart, so a major gene would have to explain at least 58.4% of the parental difference. The QTL for NON-PEN on alfalfa mapped onto Ia has an effect estimate of 52.1%.
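The arithmetic behind the major-gene criterion of True et al. (1997) can be checked directly. The sketch below assumes, as stated above, two normally distributed homozygous classes with equal variance, and further assumes that the classification cutoff lies at the midpoint between their means, so that each class is misclassified with probability Φ(−δ/2) for a separation of δ standard deviations. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def misclassification(separation_sd):
    """Misclassification probability for two equal-variance normal classes whose
    means are separation_sd standard deviations apart (midpoint cutoff assumed)."""
    return normal_cdf(-separation_sd / 2.0)


def minimum_major_gene_share(parental_separation_sd, threshold_sd=3.28):
    """Smallest fraction of the parental difference a locus must explain for its
    homozygotes to be threshold_sd standard deviations apart."""
    return threshold_sd / parental_separation_sd


print(round(misclassification(3.28), 2))         # 0.05
print(round(minimum_major_gene_share(5.61), 2))  # 0.58
```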
This QTL, whose estimated effect approaches that threshold, may well correspond to a major gene. However, an inherent bias in QTL analyses arises because the same data are used both to detect QTL and to estimate their effect sizes (Beavis, 1998). As a consequence, the effect sizes of detected QTLs are usually overestimated. For instance, in a simulation study involving 20 underlying QTLs and 500 F2s, Otto and Jones (2000) found that the 9.65 QTLs detected on average appeared to explain 100% of the parental difference, although fewer than half of the underlying QTLs were detected. In the case of the pea aphid, this means that the QTL on linkage group Ia, which explains 52.1% of the parental difference, may actually have a smaller effect on the phenotype. Several studies have suggested the presence of genes with major effects on behaviors that may be involved in feeding, such as olfactory avoidance and foraging behavior in larvae of Drosophila melanogaster (reviewed in Anholt and Mackay, 2004) and in adults of Apis mellifera (Hunt et al., 1995). However, few have used quantitative genetic methods to estimate gene numbers, and, to our knowledge, the present QTL analysis of feeding behavior in pea aphids is the only mapping study of this key behavioral trait in herbivorous insects.
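The selection bias described by Beavis (1998) is easy to reproduce in a toy simulation. The sketch below is an illustration under assumed parameters, not a re-analysis of our data or of the Otto and Jones (2000) simulations: it simulates a single biallelic locus with genotypes coded 0/1, a small true effect (`true_b`), standard normal residuals and 102 individuals per replicate, keeps only replicates in which the locus passes an approximate two-sided t threshold (`t_crit`, about 5% for 100 degrees of freedom), and compares the average estimated effect among those ‘detected’ replicates with the true effect. All names and parameter values are hypothetical.

```python
import math
import random
from statistics import mean


def simulate_beavis(true_b=0.4, n=102, reps=500, t_crit=1.98, seed=42):
    """Return (fraction of replicates detected, mean |estimate| among detected)."""
    rng = random.Random(seed)
    detected = []
    for _ in range(reps):
        g = [rng.randint(0, 1) for _ in range(n)]            # marker genotype
        y = [true_b * gi + rng.gauss(0.0, 1.0) for gi in g]  # phenotype
        mg, my = mean(g), mean(y)
        sxx = sum((gi - mg) ** 2 for gi in g)
        sxy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
        b_hat = sxy / sxx                                     # OLS effect estimate
        rss = sum((yi - my - b_hat * (gi - mg)) ** 2 for gi, yi in zip(g, y))
        se = math.sqrt(rss / (n - 2) / sxx)                   # standard error of b_hat
        if abs(b_hat) / se > t_crit:                          # 'detected' QTL
            detected.append(abs(b_hat))
    return len(detected) / reps, mean(detected)
```

Because an estimate passes the threshold only when it is large relative to its standard error, the detected estimates form a truncated sample, and their mean exceeds `true_b`; the smaller the mapping population, the stronger this inflation.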

The 1–3 QTLs detected for a given behavioral character in a given plant environment explain 23–73% of the parental difference, suggesting that other QTLs with minor effects on feeding behavior went undetected. Our QTL mapping analysis involved a limited number of recombinant genotypes (n=102). Although replicating phenotypic measurements for each F2 genotype on each plant increased the experimental power of our genetic analysis, it did so only by a factor of 1.73. Limited size of the segregating generation used for QTL mapping is known to lead to underestimates of the total number of loci involved (Beavis, 1998): with a small F2 population, QTLs that explain large fractions of the phenotypic difference between parental lines can be detected, but loci of more subtle effect can be overlooked. Taken together, these results suggest that feeding behavior in the pea aphid has an oligogenic basis.

Thus, neither the single-gene nor the infinitesimal model of genetic adaptation explains feeding behavior in pea aphids. An oligogenic basis of phenotypic divergence has also been reported in several studies of host-plant-associated traits. First, Jones (1998) used a chromosomal assay to show that differences in performance between D. sechellia and D. simulans on the toxic Morinda fruit were influenced by five small regions of large phenotypic effect, while large regions of the chromosomes had no effect on this trait. He then used these results to suggest that adaptation to Morinda was neither monogenic nor highly polygenic. Second, Sezer and Butlin (1998b) found that a small effective number of loci (ne) underlay differences in oviposition preference between two sympatric host races of the brown planthopper Nilaparvata lugens. Third, Craig et al. (2001) used Mendelian genetics to show that differences in host preference between two sympatric host races of the fly Eurosta solidaginis were inherited at a limited number of loci. Last, Dambroski et al. (2005) used segregation patterns in F2 and backcross hybrids to show that only a modest number of allelic differences at a few loci may underlie host fruit odor discrimination in host races of Rhagoletis pomonella. In the pea aphid, we reach a similar conclusion, but using QTL mapping. In addition, we provide suggestive evidence that a major gene influencing feeding behavior segregates in our mapping population. This type of genetic architecture may facilitate the rapid and repeated evolution of host range in herbivorous insects: allelic diversification in just a few genes of large effect could allow initial establishment on the new host, followed by the spread of novel alleles at other, minor loci that improve the ‘quality’ of this establishment.

Interestingly, the putative major QTL is on the X chromosome. Several other studies have reported a strong sex-linked component of host-associated behavioral traits. In two closely related species of swallowtail butterflies (Papilio spp.), host preference was shown to be sex-linked (Thompson, 1988). In two geographical populations of another butterfly, Polygonia c-album, Janz (1998) found that population differences in oviposition preference were also sex-linked. Loci on the X chromosome are thought to evolve faster under natural selection than autosomal loci when mutations are fully or partially recessive, because such alleles are shielded from selection in the heterozygous state but exposed to selection in the hemizygous state. X-linkage may also help keep important combinations of traits intact in the face of recombination and gene flow. More studies on a variety of herbivorous insects are needed to examine the particular role that genomic regions on the X chromosome could play in the evolution of insect–plant interactions. In the case of aphids, the location of this putative major QTL on the X chromosome may facilitate future attempts at positional cloning, because pea aphid males carry a single X chromosome (XO) and a dense linkage map of this chromosome can therefore be obtained relatively easily.

Moreover, this putative major QTL affecting NON-PENETRATION on Alfalfa is located on the same chromosomal segment, between AFLP markers C6–675 and E8–330, as two other characters measured previously by Hawthorne and Via (2001): ‘Fecundity on Alfalfa’ and ‘Acceptance of Alfalfa’. ‘Fecundity’ was measured as the number of progeny produced during the first 9 days of adult life. ‘Acceptance’ was measured as the percentage of winged aphids that were on a plant and had produced offspring 70 h after being released into a cage containing both clover and alfalfa plants (Hawthorne and Via, 2001). An analysis of genetic correlations among all these characters, together with another mapping study using a denser linkage map, would help determine whether this colocalization reflects pleiotropy of a single master locus or tightly linked loci affecting each character independently.

What about QTL affecting behavioral steps other than NON-PENETRATION? Monitoring aphid stylets as they penetrate the plant tissues allows us to identify distinct loci that would remain invisible if the behavior were analyzed in less detail. Our study suggests that different loci do indeed influence different parts of the exploration of the plant. The putative major gene located between AFLP markers C6–675 and E8–330 on the X chromosome influences NON-PENETRATION, SEARCH and INGESTION on Alfalfa. However, this locus does not influence START PREPARATION, the step at which the stylet tips are inside the phloem vessels and intense salivation occurs in preparation for the switch to phloem ingestion (Table 4). This suggests that acceptance of alfalfa and clover by hybrids could involve two events: one related to plant chemistry before the phloem vessels are reached, and another related to plant chemistry once the stylet tips are inside the phloem vessels.

Our other major goal was to investigate the mode of gene action on traits associated with host choice in pea aphids. The biometrical analysis revealed that a simple model including only additive effects explains the behavioral divergence between alfalfa and clover specialists for three of the eight characters (four traits on each of two hosts). A model including additive plus dominance effects was sufficient for two characters, while additive plus epistatic effects were required to explain divergence in the remaining three (Figure 1 and Table 2). This diversity of gene action is consistent with previous line-cross analyses, which suggest that non-additive gene action is almost always involved in the differentiation of divergent lines (Hard et al., 1992; Fenster et al., 1997; Hatfield, 1997; Lynch and Walsh, 1998; Sezer and Butlin, 1998a). Non-additive genetic effects on host-associated behaviors have also been detected in two other phytophagous insects, the seed-feeding beetle Callosobruchus maculatus (Fox et al., 2004) and the soapberry bug Jadera haematoloma (Carroll et al., 2001).
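Line-cross analysis compares nested models of gene action fitted to the means of parental, F1, F2 and backcross generations. A common way to do this is the joint scaling test of Mather and Jinks: generation means are regressed on the expected coefficients of the composite effects (named here m, [a], [d], [aa], [ad] and [dd], with m the midparent) by weighted least squares with weights 1/SE², and the residual chi-square, on (number of generations minus number of fitted parameters) degrees of freedom, tests whether a model is adequate. The sketch below is a pure-Python version that assumes all six generation types are available; the generations, weighting and software actually used in this study are not described in this section, the function names are hypothetical, and the means are hypothetical values built from m = 10, [a] = 2 and [d] = 1.5.

```python
# Expected coefficients of each generation mean on m, [a], [d], [aa], [ad], [dd]
# (midparent parameterization of the joint scaling test).
COEFFS = {
    "P1": (1.0, 1.0, 0.0, 1.0, 0.0, 0.0),
    "P2": (1.0, -1.0, 0.0, 1.0, 0.0, 0.0),
    "F1": (1.0, 0.0, 1.0, 0.0, 0.0, 1.0),
    "F2": (1.0, 0.0, 0.5, 0.0, 0.0, 0.25),
    "B1": (1.0, 0.5, 0.5, 0.25, 0.25, 0.25),
    "B2": (1.0, -0.5, 0.5, 0.25, -0.25, 0.25),
}
PARAMS = ("m", "a", "d", "aa", "ad", "dd")


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("parameters not estimable from these generations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def joint_scaling(means, ses, model):
    """Weighted least-squares fit of `model` (names from PARAMS); returns (estimates, chi2, df)."""
    gens = list(means)
    cols = [PARAMS.index(name) for name in model]
    x = [[COEFFS[g][c] for c in cols] for g in gens]
    w = [1.0 / ses[g] ** 2 for g in gens]
    y = [means[g] for g in gens]
    k, n = len(cols), len(gens)
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(w[r] * x[r][i] * x[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xtwy = [sum(w[r] * x[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = _solve(xtwx, xtwy)
    fitted = [sum(x[r][i] * beta[i] for i in range(k)) for r in range(n)]
    chi2 = sum(w[r] * (y[r] - fitted[r]) ** 2 for r in range(n))
    return dict(zip(model, beta)), chi2, n - k


if __name__ == "__main__":
    # Hypothetical generation means and standard errors (not data from this study).
    means = {"P1": 12.0, "P2": 8.0, "F1": 11.5, "F2": 10.75, "B1": 11.75, "B2": 9.75}
    ses = {g: 0.2 for g in means}
    for model in [("m", "a"), ("m", "a", "d")]:
        est, chi2, df = joint_scaling(means, ses, model)
        print(model, {p: round(v, 3) for p, v in est.items()}, round(chi2, 2), df)
```

With these hypothetical means, the additive-only model leaves a chi-square of about 39.8 on 4 df (well above the 5% critical value of 9.49), whereas adding a dominance term fits exactly, so the additive-dominance model would be retained.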

One striking result of the QTL mapping analysis is that all but 3 of the 16 estimated QTL alleles shift the phenotype in the direction of specialization predicted by the parental phenotypes (Table 4). For most quantitative traits, each parental line carries a mixture of plus and minus alleles (Tanksley, 1993). For example, 36% of the QTLs detected in a cross between two tomato species had effects opposite to those predicted by the parental phenotypes (DeVicente and Tanksley, 1993). In a cross between D. mauritiana and D. simulans, QTL analysis of male-specific bristle number traits also revealed a mixture of plus and minus alleles (True et al., 1997). In contrast, a rare situation was described in another cross between D. mauritiana and D. simulans, in which all but 1 of the 19 additive effect estimates had the same (positive) sign (Zeng et al., 2000). Zeng et al. (2000) hypothesized that such a strong preponderance of allelic effects in a single direction reflects a history of unusually strong directional (or, in our case, divergent) selection. The consistent directionality of allelic effects on key traits in divergent pea aphid host races, seen here and in Hawthorne and Via (2001), supports this hypothesis. Divergent selection on various components of host use in pea aphids is known to be very strong (Via et al., 2000), as required for genetic divergence between populations that use different environments with no physical barriers to gene flow (Via and West, 2008; Via, 2009). Thus, it seems likely that future analyses in other systems will reveal that consistent directionality of allelic effects on ecologically important traits is a general feature of the genetic architecture of taxa that have diverged under strong selection in the face of gene flow.
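As a rough gauge of how unusual 13 of 16 alleles acting in the predicted direction would be by chance, a one-sided binomial sign test can be computed. This sketch naively assumes that each QTL allele independently has a 50% chance of acting in either direction. Both assumptions are optimistic here: several estimates come from the same pleiotropic X-linked QTL measured on different traits, and QTL detected between divergent parents are expected to be biased toward the direction of the parental difference, so the resulting p-values overstate the evidence. The function name is hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided sign test."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 13 of 16 pea aphid QTL alleles in the predicted direction
    print(upper_tail(13, 16))  # about 0.0106 (697/65536)
    # 18 of 19 additive effects with the same sign (Zeng et al., 2000)
    print(upper_tail(18, 19))  # about 3.8e-05 (20/524288)
```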