Introduction

Quaternary climatic oscillations have had dramatic effects on the evolution of species. The contemporary distribution of genetic diversity cannot be understood without studying how organisms responded to climatic history in geological times (Hewitt, 2000). The distribution ranges of temperate species were restricted during glacial maxima to a few glacial refugia and the organisms re-colonized northwards during interglacial periods of temperature amelioration. These re-colonization routes were often blocked by geographical barriers or by expansion routes from other lineages (Hewitt, 1996). The suture zones formed by lineages originating from different refugia and coming into secondary contact are considered to show higher genetic diversity than other geographic regions (Petit et al., 2003). In most cases, refugial areas were localized in the southernmost regions (for example, Schmitt, 2007). Yet, cold-tolerant species may also have survived in northern refugia (Stewart and Lister, 2001; Hewitt, 2004), such as the Alps, or central, eastern and northern Europe, leading to complex postglacial patterns (for example, Ursenbacher et al., 2006). Moreover, some species strictly depend on other organisms (hosts, mutualists, symbionts, and so on) for their development or dispersal. In that case, the phylogeographic pattern of the dependent species can be influenced by that of its partner, and a certain level of similarity may be highlighted (see for instance Burban et al., 1999; Burban and Petit, 2003). However, this is not always the case and the genetic structure of an insect does not necessarily reflect that of its host (Stauffer et al., 1999; Sallé et al., 2007).

The pine shoot beetle Tomicus piniperda (L.) (Coleoptera: Scolytinae) has a Palearctic distribution (Balachowsky, 1949). The exact eastern limits of its geographical distribution are still unclear. It develops on various pine species but is mainly found on Pinus sylvestris in Europe (Postner, 1974; Pfeffer, 1994). Moreover, its present distribution and host association suggest that T. piniperda survives poorly on Mediterranean pine species such as P. halepensis and P. pinea, and that it cannot develop in warm and dry environments (Gallego et al., 2004). The life cycle of this species is characterized by two phases of dispersal and one over-wintering period (Långström, 1983). The first dispersal phase occurs during the reproductive process, when adults fly to an attractive host and oviposit in the inner bark. The second dispersal phase takes place after nymphosis, when young adults fly to the shoots of the host plant for maturation feeding.

As T. piniperda has an obligate relationship with a pine host for both reproduction and maturation feeding, we hypothesized that its phylogeographic pattern is closely linked to the past history of its main hosts. There is evidence that P. sylvestris had glacial refugia in Central Europe and in the Alps, whereas different pine species were also present in the Iberian Peninsula and Italy during the coldest period (Sinclair et al., 1999; Soranzo et al., 2000; Burban and Petit, 2003; Bucci et al., 2007). We thus expect that T. piniperda exhibits a complex contemporary genetic structure. A phylogeographic study with a restricted number of populations suggested that the evolutionary history of T. piniperda reflects that of P. sylvestris with both southern and northern glacial refugia (Ritzerow et al., 2004). However, samples were lacking from the southern range and only one population was analyzed from the Iberian Peninsula. Thus, the recent past history of T. piniperda could not be resolved.

Here we present a phylogeographic study of T. piniperda in its known distribution range in Europe using a mitochondrial marker. The objectives of our work were: (i) to reconstruct the phylogeography of T. piniperda with representative sampling of the species; (ii) to confirm the presence of northern refugia during the glaciations; and (iii) to infer whether the past history of this cold-tolerant species was related to that of its pine hosts.

Materials and methods

Beetle sampling

Beetles were collected throughout Europe from 1999 to 2004. Thirty-four populations were found in sixteen countries and on six Pinus species, to complement the existing samples from France (Kerdelhué et al., 2002) and to obtain a representative sampling of the species across its distribution range in Europe. No insects were trapped in the southern part of the Iberian Peninsula, of Italy and of the Balkans, where only the congeneric T. destruens was found (Figure 1). Samples were stored in absolute ethanol. All relevant information is gathered in Table 1 and the localities are shown in Figure 1.

Figure 1

Geographical distribution of T. piniperda populations and their associated haplotypes of the cytochrome oxidase I and II sequences. (a) Geographic distribution of the 34 populations represented with the proportion of different haplotypes for each population. Small open circles show unsuccessful trappings. Codes for the localities are given in Table 1. Color codes refer to the color used in the haplotype network. (b) Haplotype network of the 36 haplotypes. Each line corresponds to a mutational step and each empty circle to a missing intermediate. Haplotype frequencies are represented by the size of the circle.

Table 1 Sampling sites, date of capture, host tree, geographic coordinates and abbreviations of T. piniperda's populations

DNA extraction

DNA from the imago stage was extracted from head, thorax and legs, whereas DNA was extracted from the entire body for larval or pupal samples. The abdomen, elytra and antennae of adults were kept as vouchers in ethanol. DNA was extracted and isolated with the GenElute Mammalian Genomic DNA miniprep kit (Sigma, St Louis, MO, USA).

Mitochondrial DNA amplification

T. piniperda specific primers were used to amplify a partial region of the cytochrome oxidase I and II genes (Kerdelhué et al., 2002). The annealing temperature was 50 °C, and a total of 30 amplification cycles was carried out in a 50 μl reaction volume. PCR products were purified using the GenElute PCR Clean-Up kit (Sigma), and direct sequencing was carried out systematically with both PCR primers using the BigDye Terminator sequencing kit (PE Applied Biosystems, Foster City, CA, USA) on an ABI 3100 automatic sequencer. All sequences were carefully checked by hand before analysis.

Data analysis of mitochondrial sequences

All the obtained sequences, and sequences from France (GenBank accession numbers AF457799-AF457803, AF457804-AF457808, AF457819-AF457820, AF457822-AF457826; Kerdelhué et al., 2002), were aligned using ClustalW v1.4 (Thompson et al., 1994) as implemented in BIOEDIT v7.0.5.

A statistical parsimony network was computed using TCS v1.21 (Clement et al., 2000), which estimates gene genealogies from DNA sequences following the method described in Templeton et al. (1992). We used topological and frequency criteria (Crandall and Templeton, 1993; see also Pfenninger and Posada, 2002) to resolve the few cladogram ambiguities that occurred.

Allelic richness R was computed after rarefaction to three individuals (Petit et al., 1998) using CONTRIB, for all populations including at least three individuals. Gene diversity H and the within-population mean number of pairwise differences π were calculated using ARLEQUIN v3.1 (Excoffier et al., 2005).
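The two diversity indices above can be illustrated with a minimal sketch. It assumes the standard rarefaction formula (expected number of distinct haplotypes in a random subsample of n individuals) and Nei's unbiased gene diversity. The function names and the example population are hypothetical, and the conventions used by CONTRIB and ARLEQUIN may differ in detail (for instance, in whether one is subtracted from the rarefied count).

```python
from collections import Counter
from math import comb


def rarefied_richness(haplotypes, n=3):
    """Expected number of distinct haplotypes in a random subsample of n individuals."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if n > total:
        raise ValueError("subsample size exceeds the number of individuals")
    all_draws = comb(total, n)
    # For each haplotype: 1 - P(haplotype absent from the subsample).
    return sum(1 - comb(total - c, n) / all_draws for c in counts.values())


def gene_diversity(haplotypes):
    """Unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("at least two individuals are required")
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return total / (total - 1) * (1 - sum_sq)


# Hypothetical population of five beetles (labels borrowed from the haplotype names).
sample = ["1A", "1A", "1A", "1B", "1I"]
print(round(rarefied_richness(sample, 3), 3))  # 2.2
print(round(gene_diversity(sample), 3))        # 0.7
```

Rarefying every population to the same subsample size (three here) is what makes richness values comparable across populations with unequal sample sizes.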

We used a spatial analysis of molecular variance (SAMOVA 1.0, Dupanloup et al., 2002) to identify groups of populations that are geographically homogeneous and maximally differentiated. By using a simulated annealing procedure, the program maximizes the proportion of total genetic variance due to differences among groups of populations (FCT). The program was run for 10 000 permutations from 100 random initial conditions for two to 15 differentiated groups (K=2 to K=15).

To test for a host effect on the distribution of genetic diversity, we carried out an Analysis of Molecular Variance (AMOVA, Excoffier et al., 1992) on the whole data set using ARLEQUIN v3.1. Populations were grouped according to the pine species from which they were sampled (see Table 1).

Occurrence of a significant phylogeographic structure was assessed by testing if Gst (coefficient of genetic variation over all populations) was significantly smaller than Nst (equivalent coefficient taking into account the similarities between haplotypes) by the use of 1000 permutations (see Pons and Petit, 1996) in the program PERMUT. Pairwise Gst and Nst were calculated using DISTON, and Nei's average number of differences between populations (corrected Nei's D; Nei, 1975) was calculated using ARLEQUIN. The geographic distances were computed between all sampling locations using the geographic coordinates (see http://jan.ucc.nau.edu/~cvm/latlongdist.html). To detect isolation by distance, matrices of genetic distances (pairwise Gst and Nst, and corrected Nei's D) were compared with the matrix of geographic distances by means of a simple Mantel test (Legendre and Legendre, 1998) using the R software (R Development Core Team, 2005). We used 999 random permutations to test the significance of the Mantel test statistic. Correlations between intra-population parameters (H and π) and latitude were assessed by means of linear regressions (using R). These tests were carried out to assess if genetic diversity significantly decreased with latitude. CONTRIB, PERMUT and DISTON are available at http://www.pierroton.inra.fr/genetics/labo/Software/. Both Mantel tests and diversity analyses were carried out on the whole data set and among the two major groups defined by SAMOVA (see below).
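As an illustration of the isolation-by-distance test described above, the following sketch implements a simple one-sided Mantel permutation test in plain Python. It is not the R code used in the study: the function names are hypothetical, the great-circle (haversine) distance is an assumed stand-in for the online calculator cited above, and the toy matrices are invented for the example.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=999, seed=1):
    """Return (r_M, one-sided P) by permuting populations in the genetic matrix."""
    identity = list(range(len(genetic)))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # rows and columns are permuted together
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: six populations on a line, genetic distance proportional to geography.
positions = [0, 1, 3, 6, 10, 15]
geo_matrix = [[abs(a - b) for b in positions] for a in positions]
gen_matrix = [[0.1 * d for d in row] for row in geo_matrix]
r_m, p_value = mantel_test(gen_matrix, geo_matrix)
print(round(r_m, 3), p_value < 0.05)
```

Whole populations are permuted, rather than individual matrix cells, because pairwise distances sharing a population are not independent.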

The demographic history of populations belonging to the different geographical groups defined by SAMOVA was inferred using different methods. Owing to the insufficient sample sizes, only the two major groups were analyzed (see below). First, mismatch distributions of the pairwise genetic differences (Rogers and Harpending, 1992) within each of the two geographical groups were computed using ARLEQUIN and their goodness-of-fit to a sudden expansion model was tested using parametric bootstrap approaches (1000 replicates). The sum of squared deviations (SSD) between the observed and expected mismatch distributions was used to assess the significance of the test. Pairwise differences typically form two main patterns, that is, multimodal distributions which are consistent with demographic stability, and unimodal distributions which reflect recent population expansion (Slatkin and Hudson, 1991). In addition, the significance of population expansion was tested using Tajima's D (Tajima, 1989) and Fu's Fs neutrality tests (Fu, 1997) implemented in ARLEQUIN, and the Ramos-Onsins and Rozas R2 statistic (Ramos-Onsins and Rozas, 2002) computed using DNASP version 4.20 (Rozas et al., 2003). Mismatch analyses were also used to estimate the approximate timing of expansion of T. piniperda's populations within the geographical groups defined by the SAMOVA. We used the relationship τ=2ut (Rogers and Harpending, 1992), τ being the age of expansion measured in units of mutational time, t the expansion time in number of generations, and u the mutation rate per sequence and per generation. This last value was calculated using the relationship u=2μk, with μ the mutation rate per nucleotide and k the length of the sequence in nucleotides. The 2.3% pairwise sequence divergence per million years for arthropod mitochondrial genes defined by Brower (1994) was used to approximate μ as 1.15% per million years per nucleotide.
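The observed mismatch distribution referred to above is simply the frequency distribution of the number of nucleotide differences over all pairs of sequences. The sketch below shows that computation only, on invented four-base sequences; the function names are hypothetical, and the fit to a sudden expansion model and the bootstrap test performed by ARLEQUIN are not reproduced.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mismatch_distribution(sequences):
    """Map each number of differences to its relative frequency among all pairs."""
    counts = Counter(pairwise_differences(a, b) for a, b in combinations(sequences, 2))
    n_pairs = sum(counts.values())
    return {k: counts[k] / n_pairs for k in sorted(counts)}


def mean_pairwise_differences(sequences):
    """Mean of the mismatch distribution (the quantity written as pi in the text)."""
    dist = mismatch_distribution(sequences)
    return sum(k * f for k, f in dist.items())


# Invented toy alignment, one sequence per individual.
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
print(mismatch_distribution(toy))
print(round(mean_pairwise_differences(toy), 3))
```

A single peak in this histogram is the pattern expected after a recent expansion, whereas several peaks are more consistent with long-term demographic stability.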

Results

Mitochondrial variability

We obtained a final alignment of 150 sequences that were 797 bp long, including 463 bp in COI, 69 bp tRNA Leu and 265 bp in COII.

A total of 36 haplotypes (HT) were identified with 35 polymorphic sites and were named 1A to 1Z, 2A to 2D and 2G to 2L (Table 2). Haplotype sequences are available in GenBank under accession numbers FJ619352-FJ619381, FJ619384-FJ619389. Haplotype 1A was shared by 63 individuals mostly outside the Iberian Peninsula and Italy (Figure 1). Haplotype 1B was found in 14 individuals mainly from France and Italy, haplotype 1I was shared by 10 individuals distributed from the Pyrenees to Sweden, and haplotype 1G was found in seven individuals in the eastern part of the distribution range. All other haplotypes were shared by a maximum of five individuals. Further, 25 private haplotypes (that is, haplotypes found in one population only) were identified, 17 of which were located in the Iberian Peninsula or the Pyrenees, four in Italy or Croatia, one in France and three in northern latitudes (Germany, Poland and Estonia). Moreover, thirteen haplotypes (1L, 1O, 1R, 1U, 1Y, 1Z, 2B to 2D, and 2H to 2K) were found exclusively in the Iberian Peninsula, although in several populations.

Table 2 Haplotypes (HT) found in each population and population parameters

The 36 haplotypes were joined in a single haplotype network with 95% probability (Figure 1b). The three most frequent haplotypes, namely 1A, 1B and 1I, were closely related. Each of them had 2–3 rare satellite haplotypes that diverged from the major haplotype (either 1A, 1B or 1I) by one or two mutations. The rest of the network grouped all other haplotypes without any clear structure, as many loops occurred. There were very few missing haplotypes in the entire network. It is interesting that the three major haplotypes and their satellites were mostly found outside the Iberian Peninsula, whereas the rarer haplotypes present in the rest of the network were mostly found in Spain and Portugal (Figure 1a).

Mitochondrial population genetic parameters, spatial structure and host effect

Total gene diversity HT was 0.82, whereas the average within-population diversity HS was 0.59. The indices of population structure GST and NST were 0.276 and 0.439, respectively, and did not differ significantly from each other, indicating a weak phylogeographical structure. For each population, gene diversity H, allelic richness R and mean number of pairwise differences π are given in Table 2.

The SAMOVA analysis did not allow the optimal K, that is, the number of groups showing the highest FCT value, to be identified unambiguously. FCT first increased from 0.510 to 0.520, reached a local maximum for K=4 and then slightly decreased, before gradually increasing again to reach 0.537 for K=15. However, as soon as K was greater than or equal to five, the new groups consisted of single populations with no strong changes to the structure identified for K=4. This configuration was thus retained, with FCT=0.52 (P<0.001), FST=0.56 (P<0.001) and FSC=0.09 (P<0.001). The four groups obtained with the SAMOVA were (see Figure 2) (i) an ‘Iberian’ group clustering the six populations from Spain and Portugal and the Pyrenean population of highest altitude (LC-FRA); (ii) a ‘Pyrenean’ group comprising QS-FRA and NA-FRA; (iii) a ‘Central-European’ group clustering four populations from Italy, Germany, Poland and Russia (SON-IT, BIS-GER, HAJ-POL and KAL-RU); (iv) a ‘Main-European’ group gathering all the other populations. Owing to the insufficient sample size, only the Iberian and Main-European groups were studied for within-group analyses (see below).

Figure 2

Results of the Spatial Analysis of Molecular Variance (SAMOVA). (a) Geographical distribution of the four identified groups (see text for details). (b) Haplotype network of the 36 haplotypes showing the proportion of individuals belonging to each of the four groups. Color codes are the same as for the map above.

AMOVA showed that the proportion of molecular variance explained by hosts was not significant (14.36%, 0.05<P<0.10), whereas a significant amount of genetic diversity was found among populations within hosts (33.79%, P<0.001).

The Mantel test showed a significant effect of isolation by distance as the matrix of geographic distances was significantly correlated to either GST (standardized Mantel statistics rM=0.2798, P=0.002) or NST (rM=0.2395, P=0.003) in the whole data set, and to GST (Mantel statistics rM=0.1527, P=0.037) and NST (Mantel statistics rM=0.6033, P=0.045) in the Main-European and in the Iberian groups, respectively.

H and π were found to be negatively and significantly correlated with latitude in the whole data set (H: R2=0.430, P<0.001 and π: R2=0.169, P=0.024), whereas only H was negatively and significantly correlated with latitude in the Main-European group (R2=0.290, P<0.014). Neither of the two parameters was correlated with latitude within the Iberian group.

Demographic history

Demographic inferences were carried out within the Iberian and Main-European groups found by the SAMOVA analysis. Mismatch analyses for both groups were consistent with the sudden expansion model (Main-European: SSD=0.0007, P>0.10; Iberian: SSD=0.0079, P>0.10) and showed unimodal distributions that closely fit the expected distributions (Figure 3). Both Fu's Fs statistics were negative and significant (Main-European: Fs=−6.72, P<0.01; Iberian: Fs=−15.46, P<0.001). Tajima's D values were also negative but were not significant in either group (Main-European: Tajima's D=−1.36, P>0.05; Iberian: Tajima's D=−1.01, P>0.1). Low values were obtained for R2 in both groups but they were not significant (Main-European: R2=0.06, P>0.1; Iberian: R2=0.08, P>0.05). The approximate timing of demographic expansion t was estimated for both groups using τ values (Main-European: τ=0.285; Iberian: τ=3.281), a mutation rate of 1.15% per million years and per nucleotide (Brower, 1994), and a generation time of one year. For the Main-European group, the expansion was estimated to date back 7800 years, whereas the estimated age of the expansion for the Iberian group was 90 000 years.
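The expansion dates can be checked with a short calculation based on the relationships given in the Materials and methods (τ=2ut and u=2μk). The sketch below assumes μ = 1.15 × 10⁻⁸ substitutions per nucleotide per year (1.15% per million years), k = 797 bp and one generation per year; the function name is hypothetical.

```python
def expansion_time_years(tau, mu_per_site_per_year, seq_length, generation_time_years=1.0):
    """Expansion time from tau = 2ut, with u = 2 * mu * k per sequence and per generation."""
    mu_per_generation = mu_per_site_per_year * generation_time_years
    u = 2 * mu_per_generation * seq_length       # u = 2 * mu * k, as stated in the text
    t_generations = tau / (2 * u)                # rearranged from tau = 2ut
    return t_generations * generation_time_years


MU = 1.15e-8   # 1.15% per million years, per nucleotide
K = 797        # alignment length in base pairs

for group, tau in [("Main-European", 0.285), ("Iberian", 3.281)]:
    years = expansion_time_years(tau, MU, K)
    print(f"{group}: about {years:,.0f} years")
```

With these inputs the calculation gives roughly 7800 years for the Main-European group and roughly 89 500 years for the Iberian group, in line with the rounded values reported above.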

Figure 3

Mismatch distributions for haplotypes of T. piniperda among the Iberian and Main-European groups. The bars represent the observed distributions. Solid lines describe the distributions expected under a sudden expansion model.

Discussion

Our phylogeographical study of the bark beetle T. piniperda in its known European range showed that this species is genetically diverse, with 36 haplotypes identified from a sample of 150 individuals. A majority of these haplotypes were found in single populations, especially in the Iberian Peninsula. Although the corresponding genetic network was not clearly structured and no evidence of a strong phylogeographical signal was found, our results permitted us to decipher the recent past history of the species, and to show that its Quaternary history varied according to the region considered. In particular, the spatial analysis of molecular variance allowed us to identify four main groups of populations that illustrate the regional characteristics of this cold-tolerant species.

T. piniperda's Iberian populations: a past history consistent with the ‘refugia-within-refugia’ hypothesis

The Iberian Peninsula has long been identified as one of the most important Pleistocene glacial refugia for European species (Hewitt, 2004). Yet, it has recently been proposed that this region should rather be seen as a complex of several separate refugia, in which species or populations could independently survive during the glaciations (Gomez and Lunt, 2006). Many species actually show a strong population sub-structure within Iberia. This ‘refugia-within-refugia’ scenario can hold true for T. piniperda, as we found high allelic diversity and high levels of endemism in this region. The Iberian populations (including populations found near the Pyrenees) are characterized by a high number of rare haplotypes. Moreover, genetic diversity in this region did not decrease with latitude, and the genetic divergence between populations was not significantly related to geographic distances. The data suggest that T. piniperda survived the recurrent Quaternary climatic oscillations in several refugia within the Iberian Peninsula without major movements or severe bottlenecks. Such an evolution is expected at the rear edge of distributions, in places where short altitudinal or latitudinal shifts were sufficient to track the most suitable habitats and where allelic diversity was thus maintained (Hewitt, 2001). We could estimate the date of population expansion in the Iberian group to be as old as 90 000 years, which is consistent with the hypothesis of local survival of the species during the repeated Quaternary climatic oscillations. T. piniperda is a cold-tolerant species that cannot develop under warm and dry climates (Gallego et al., 2004; Horn, unpublished data). It can thus be hypothesized that it was present in the southern part of the Iberian Peninsula and at low altitudes during glacial maxima, and that it survived the interglacials at higher altitudes and latitudes. Its present-day distribution is probably similar to its interglacial distributions, as was suggested for its main host P. sylvestris (Cheddadi et al., 2006).

It is interesting that most of the haplotypes found in Iberia were restricted to this region, suggesting a very limited number of long-distance migrations to the rest of Europe during the warming periods. This may be due to the Pyrenees, a physical barrier that efficiently hindered migration events. Dynesius and Jansson (2000) also suggested that populations that evolved in regions where climatic oscillations were limited were not selected for high dispersal ability, in contrast to populations that underwent high intensities of climatic oscillations. However, we hypothesize that long-distance movements of individuals originating from the Iberian Peninsula could also have occurred, as can be seen from the presence of haplotypes 2A and 2G up to Russia and to the Italian Alps, respectively. Moreover, the spatial analysis of molecular variance suggested that the two populations located on the French side of the Pyrenees, namely QS-FRA and NA-FRA, should be seen as a separate group. Some of the haplotypes found in these populations are endemic to the Pyrenees and closely related to the Iberian ones, but others have a larger geographical range, including either France, Italy, or Northern Europe. This suggests that these two north-Pyrenean populations contain both individuals that belong to the Iberian group and individuals that migrated from other refugia or to northern regions.

Outside Iberia: a complex history made of multiple refugia and repeated long-distance dispersal events

Our results show that all populations sampled outside the Iberian Peninsula and the Pyrenees can be separated into two groups: (i) the Central-European populations from Germany, Poland, and Kaliningrad (on the Baltic Sea shore) together with one peculiar Italian population (SON-IT) that will be discussed below; and (ii) all other sampling sites. Some haplotypes are endemic to North or Central Europe (1N, 1H, 1Q), and others (1M, 1P) are also predominantly found in these regions, which can explain why the Central-European populations form a separate group. Taken together, these data suggest that T. piniperda probably survived in a northern refugium, even if its location cannot be identified with confidence. On the basis of the macrofossils and pollen records, Cheddadi et al. (2006) showed that the host plant P. sylvestris survived the last ice age in the Mediterranean Peninsulas but also in the Southern Alps, the Danube region and the Hungarian plain. They argue that the populations are today present in their interglacial refugia, located north of their last glacial refugia, in which they cannot survive today due to high temperatures. Haplotypes that survived in southern refugia are thus now located at higher latitudes or altitudes. Owing to the obligate relationship between T. piniperda and its pine hosts, we can hypothesize that the species had to survive the glacial maxima in regions where pines were present. Following the arguments presented by Cheddadi et al. (2006), we suggest that North-Central Europe is the current interglacial refugium for haplotypes that probably survived the glaciations in Central or Eastern Europe, where pines were also present during the Last Glacial Maximum (LGM). As very little is known about the distribution of T. piniperda outside Europe, we cannot rule out the presence of refugia elsewhere. Genetic data for beetles sampled from eastern populations of P. sylvestris or from Asia Minor could help to distinguish between different hypotheses. It is interesting that the occurrence of northern glacial refugia in either Central or Eastern Europe was suggested for the spruce Picea abies and two of its associated bark beetles Ips typographus and Pityogenes chalcographus (Lagercrantz and Ryman, 1990; Stauffer et al., 1999; Avtzis et al., 2008) that are ecologically similar to T. piniperda.

Most of the other sampling sites are characterized by the occurrence of one of the two main haplotypes, 1A and 1B, and the whole group of populations bears signs of fairly recent population expansion, which is expected during the interglacial northern colonization of suitable habitats, after the range contraction imposed by the glacial maximum (Slatkin and Hudson, 1991; Avise, 2000). It is interesting that the date of the expansion was estimated at ca. 7800 years, that is, after the LGM. In contrast to the situation described for the Iberian Peninsula, the geographic distributions of several haplotypes exhibit repeated long-distance migration events, which result from the contraction-expansion cycles following the Quaternary climatic oscillations. Glaciations were more severe in this part of Europe than they were in Iberia, which can explain the contrast between regions (Hewitt, 1996; Dynesius and Jansson, 2000). Nevertheless, the location of some of the rare haplotypes allows us to formulate hypotheses about the most plausible refugia and the directions of post-glacial re-colonization. For instance, the haplotypes 1F and 1G, which are closely related to the main haplotype 1A, are found mostly in Italy, Slovenia and Croatia, which is consistent with a glacial refugium located south of the Alps. As they are also found in more northern locations like Germany or Poland, and as the main haplotype 1A occurs as far north as Northern Sweden and Norway, we suggest that this refugium was one of the main sources for the post-glacial northward colonizations that took place during the last interglacial. Concerning the second most frequent haplotype 1B, the Southern Alps and the Massif Central harbor its closest rare haplotypes 1C, 1D and 1E, and the major haplotype 1B is restricted to France and the Italian Alps. This distribution suggests that this group of related haplotypes originated from an Italian refugium and that they are found today in their interglacial range, north of the places in which they survived the LGM. It is worth noting that the same scenario was suggested for P. sylvestris (Cheddadi et al., 2006; Naydenov et al., 2007).

Finally, some closely related rare haplotypes are found both in the Pyrenees and in Italy (1K and 1J, closely related to the more frequent 1I that is mostly found in the Pyrenees). This pattern can be explained if this group of haplotypes survived in a glacial refugium located in the north of the Iberian Peninsula and reached Northern Europe during an interglacial, as is suggested by the occurrence of haplotype 1I up to Estonia and Russia, in long-distance movements similar to those of Iberian haplotypes 2A and 2G (Nichols and Hewitt, 1994). During the following glacial episode, some individuals may have been trapped in Italy and the Balkans, whereas some were still present in the Pyrenees. At present, individuals bearing these haplotypes are found both in the north of the Iberian refugia and in the north of the Italian refugia. Similarly, the haplotype 1W has a peculiar contemporary distribution, with individuals in the Iberian Peninsula, France, Italy and up to Poland. Once again, we can hypothesize that such a pattern is due to the fact that a given haplotype survived in different refugia after recurrent glacial episodes. The complex distribution pattern of haplotype 1W probably explains why the SAMOVA analysis grouped the Italian population SON-IT with the North-Central European populations. It is interesting that one haplotype of P. sylvestris found mostly in the Iberian Peninsula was also present in low proportions in the Balkans, which suggests a similar history (Naydenov et al., 2007), with different refugia sharing similar rare haplotypes.

T. piniperda is known to have good dispersal abilities (Kerdelhué et al., 2006), which may explain the complex phylogeographic pattern highlighted here. It is interesting that its sibling species T. destruens was recently shown to have a similarly complex history owing to frequent migration events in Europe (Horn et al., 2006). Moreover, our results show that the recent history of T. piniperda is highly similar to that of its main host P. sylvestris, even if it is not strictly associated with that particular host species. Their ecological requirements are very close, and the dispersal ability of the beetle allowed it to track its host both during glacial periods and during interglacials. Both species apparently shared the same refugia, in the Mediterranean Peninsulas, in the Southern Alps but also in more cryptic northern refugia.