Colonization and diversification of the Euphorbia species (sect. Aphyllis subsect. Macaronesicae) on the Canary Islands

Diversification between islands and ecological radiation within islands are postulated to have occurred in the Euphorbia species (sect. Aphyllis subsect. Macaronesicae) on the Canary Islands. In this study, the biogeographical pattern of 11 species of subsect. Macaronesicae and the genetic differentiation among five species were investigated to distinguish the potential mode and mechanism of diversification and speciation. The biogeographical patterns and genetic structure were examined using statistical dispersal-vicariance analysis, Bayesian phylogenetic analysis, reduced median-joining haplotype network analysis, and discriminant analysis of principal components. The gene flow between related species was evaluated with an isolation-with-migration model. The ancestral range of the species of subsect. Macaronesicae was inferred to be Tenerife and the Cape Verde Islands, and Tenerife-La Gomera acted as sources of diversity to other islands of the Canary Islands. Inter-island colonization of E. lamarckii among the western islands and a colonization of E. regis-jubae from Gran Canaria to northern Africa were revealed. Both diversification between islands and radiation within islands have been revealed in the Euphorbia species (sect. Aphyllis subsect. Macaronesicae) of the Canary Islands. It was clear that this group began the speciation process in Tenerife-La Gomera, and this process occurred with gene flow between some related species.


Results
Phylogenetic analysis and divergence time estimation. We conducted our phylogenetic analysis based on concatenated sequences. A phylogenetic tree (Fig. S1, Supporting Information) was constructed based on the concatenated sequence of chloroplast tRNA-Leu, psbA-trnH, ndhF and nuclear ITS (Table S1, Supporting Information). The deeper node shows that the subfamily Euphorbioideae had separated from the other subfamilies of the Euphorbiaceae family 75.56 ± 6.97 (PP = 1.0) Ma ago in the Upper Cretaceous. The phylogenetic relationships of sect. Aphyllis (Fig. S2, Supporting Information) were partially resolved by including four more chloroplast genes (Table S1, Supporting Information). The 11 recognized species constituted subsect. Macaronesicae and had diverged from subsect. Africanae 9.81 ± 1.05 (PP = 1.0) Ma ago in the Miocene. The first divergence within the Macaronesicae group was 6.92 ± 1.55 (PP = 1.0) Ma ago, when E. tuckeyana separated from the rest of the group; after that, E. lamarckii diverged 1.93 ± 0.98 (PP = 1.0) Ma ago in the Pleistocene. The other species of the Macaronesicae group split into three subgroups, but they had very low support according to posterior probability.
Biogeographical history. The results of statistical dispersal-vicariance analysis (S-DIVA) suggested 31 dispersal, eight vicariance and two extinction events for all 14 species analysed, and 21 dispersal, six vicariance and two extinction events for subsect. Macaronesicae. The ancestral range of subsect. Macaronesicae at node 24 was Tenerife and Cape Verde Islands, where one vicariance event was inferred for the species E. tuckeyana (Fig. 1a,b). At node 23, Tenerife is the ancestral area for species inhabiting the Canary Islands (areas from A to G), Selvagens Islands (I), and Madeira (J). The other nodes showed very low support, which does not allow us to clearly define an ancestral area. Dispersal events (for all 14 species) mainly occurred at three points in time: 9.34 Ma ago (Point A in Fig. 1c), 3.2 Ma ago (Point B in Fig. 1c) and less than a million years ago (Point C in Fig. 1c). The global pattern was that four speciation events occurred in La Gomera and Tenerife (E. atropurpurea, E. berthelotii, E. bravoana, and E. bourgaeana), and one more speciation event occurred in South Africa (E. berotica-E. stolonifera).
Genetic differentiation and population structure. The chloroplast genetic diversity of E. aphylla, E. atropurpurea, E. berthelotii, E. lamarckii, and E. regis-jubae across the Canary Islands was investigated at the population-level, although the population sampling of E. aphylla and E. atropurpurea was limited. A total of 17 haplotypes were revealed in the five species (Fig. 2a). On the Canary Islands, eight populations were fixed by a single haplotype, and six populations were polymorphic; the Ela-T-Me population of E. lamarckii on Tenerife possessed the highest number of haplotypes. The relationships of these chloroplast haplotypes were revealed through reduced median-joining network analysis (Fig. 2b). Five haplogroups were detected and corresponded to five species very well. Haplotype H1 was found only in E. aphylla, H2 and H3 were found in E. atropurpurea, and H4, H5 and H6 were found in E. berthelotii. Haplotypes H7-H14 were found in E. lamarckii, and H15, H16 and H17 were found in E. regis-jubae. No common haplotypes were shared between any pairs of species. Clustering of haplotypes based on phylogeny (Fig. 3)   When using chloroplast data from E. lamarckii, E. regis-jubae, E. berthelotii and E. atropurpurea, the values of Tajima's D and Fu's F S were positive but did not differ significantly from zero, and the test of goodness-of-fit rejected the demographic expansion model (P SSD = 0.011; P Rag = 0.002) but not the spatial expansion model (P SSD = 0.283; P Rag = 0.210).
The nuclear genetic diversity and allelic richness in each population were 0.284-0.757 and 2.516-6.985 (Table 1), respectively. The Ela-T-Me population of E. lamarckii on Tenerife possessed the highest genetic diversity and allelic richness, whereas the Eat-T-Ma population of E. atropurpurea on Tenerife had the lowest genetic diversity and allelic richness. The number of alleles revealed per locus was 11-23, and the observed heterogeneities were 0.407-0.596 (Table 2). Except for locus E42, the inbreeding coefficient (F IS ) was significantly different from zero (P < 0.05). The genetic differentiation (F ST ) estimated after excluding null alleles was 0.160-0.314. There was no evidence that microsatellites were affected by selection ( Fig. S3 and Supplementary Information).
The population structure was described using discriminant analysis of principal components (DAPC); the first axis distinguished E. atropurpurea from the other three species, and the second axis separated E. berthelotii, E. lamarckii and E. regis-jubae (Fig. 4a). In a further analysis including only E. lamarckii and E. regis-jubae, a clearer differentiation between E. lamarckii and E. regis-jubae was shown along the PC1 axis (Fig. 4b), and the PC2 axis further revealed a strong structure in E. lamarckii that split its eastern populations on La Palma and EI Hierro from the rest of the populations on La Gomera and Tenerife.
No evidence of gene flow was observed between the E. lamarckii and E. berthelotii species; however, possible evidence of significant gene flow was detected between the species pairs of E. atropurpurea and E. lamarckii, and  Table 1; (b) A reduced median-joining network of chloroplast haplotypes showed their relationships. All of the maps were drawn with the free software DIVA-GIS v7.5 (http://www.diva-gis.org/download).

Discussion
In the present study, we were able to determine the area of origin for one of the most characteristic groups in the region of Macaronesia, namely the genus Euphorbia sect. Aphyllis subsect. Macaronesicae. Our results showed that Tenerife and the Cape Verde Islands played the most important role in the origin and speciation of subsect. Macaronesicae; in particular, Tenerife -La Gomera was the area of origin of these species in the archipelago of the Canary Islands. Tenerife initially existed as three separate islands (Roque del Conde, Teno, and Anaga) 20 . Palaeo-island Roque del Conde in the southwest of Tenerife is very likely to have received migrants from Cape Verde due to the very close distance. La Gomera and the three palaeo-islands of Tenerife have been geologically stable since the Pliocene 9 and thus could act as repositories and stepping stones for the Macaronesian plant group 21 . We further discovered that dispersal events (Fig. 1c) mainly occurred at three timepoints, at 9.34 Ma, 3.2 Ma and 1 Ma, in which the first and third periods coincided with the emergence of La Gomera and EI Hierro. The present study revealed that populations on Tenerife, Gran Canaria, and La Gomera generally harboured the highest level of nuclear variation and possessed diverse chloroplast haplotypes (Table 1 and Fig. 2a), highlighting the important role of the central Canary Islands as cradles of biodiversity and as sources of diversity for other islands and for the mainland 9,10,22,23 . The estimation of age for the disjunction between subsect. Macaronesicae and subsect. Africanae predated the formation of the Sahara (approximately 6 Ma) and thus agrees well with a climate-driven vicariance explanation, as has been reported in other groups such as the genera Canarina, Plocama, Campylanthus, Camptoloma, and Hypericum 4,22 .
Our results revealed a relatively rapid evolution of subsect. Macaronesicae on the Canary Islands (Figs 3 and S2), which was consistent with a previous study that showed that Euphorbia sect. Aphyllis is a group that has diverged much faster than expected 4 . The Bayesian tree of subsect. Macaronesicae produced in the present study was partially consistent with the results of previous studies 4, 18,19 ; one of the similarities was the formation of a clade that consisted of E. lamarckii and another composed of E. regis-jubae. However, the species that were grouped with these two species were not entirely similar to the species groupings that were reported in previous studies 4,18 . What has happened in some species and the patterns between related species are discussed later because inter-island colonization seems to have played a major role in the diversification of subsect. Macaronesicae.
The genetic divergence between the "eastern" islands and the "western" islands has been revealed in numerous groups such as Canarina canariensis 9 and Phoenix canariensis 24 . In the present study, the west-east split corresponded to species differentiation between E. regis-jubae and E. lamarckii, which could be interpreted as an inter-island colonization and diversification in the archipelago 10,25 . E. regis-jubae and E. lamarckii are xerophilous and thermophilous species with a marked invasive capacity and appear to live in similar ecological conditions because both are very common constituents of the coastal belt vegetation at low elevations in the arid and semi-arid infracanarian bioclimatic belt 15 . The two species possessed roughly equal genetic diversity within populations in the present study. The two species have subtle morphological differences such as the sizes of bracts, and the subordination of E. regis-jubae to E. lamarckii as a subspecies was once the accepted view 15 . Their morphological variation might simply be a consequence of divergence through gradual transformation after inter-island colonization and finally the accumulation and harbouring of considerable genetic diversity through mutation and recombination 26 .
The inter-island colonization routes of the two widespread species could be inferred based on the chloroplast haplotype relationships. For E. lamarckii, the ancient haplotype (H11) was revealed in the Ela-T-Me population on Tenerife, and the haplotypes from La Gomera, La Palma and EI Hierro were paraphlyletic to the haplotypes from Tenerife, indicating that this species dispersed from Tenerife to La Gomera, La Palma and EI Hierro, which was consistent with an earlier study 9 and was further supported by S-DIVA analysis that suggested that Tenerife was the ancestral area for species that inhabit the Canary Islands. Similar to the genetic diversity pattern reported in Phoenix canariensis 24 , a split between the populations of E. lamarckii from La Palma and EI Hierro and from La Gomera and Tenerife was revealed by nuclear microsatellites; this isolation was reasonable when taking into account the fact that La Gomera and Tenerife are much older than La Palma and EI Hierro. For E. regis-jubae, the ancient haplotype (H15) was found on Gran Canaria. Three individuals of this species in Morocco possessed the derived haplotypes H16 and H17, indicating a colonization of this species from Gran Canaria to northern Africa via Fuerteventura and Lanzarote. The colonization of the African continent suggested that the Canary  Table 2. Characteristics of the 11 nuclear microsatellites used in this study. *Departure from Hardy-Weinberg (P < 0.05); **Departure from Hardy-Weinberg (P < 0.01).
Islands may play a role as refugia and a supply source for continental biodiversity 5,23 . The fact that the common chloroplast haplotype H16 was shared across the central and eastern islands suggested seed dispersal among islands because the chloroplast genome is transmitted maternally in most angiosperms. Considering other evidence for inter-island gene flow between E. atropurpurea on Tenerife and E. berthelotii on La Gomera, inter-island colonization was proposed to play a prominent role in the diversification of the Euphorbia sect. Aphyllis subsect.
Macaronesicae on the Canary Islands. The present study revealed that no haplotypes were shared between any pairs of species and that the haplotype phylogenetic tree (Fig. 3) could mirror the species' history. However, possible evidence of significant gene flow was detected between species pairs of E. atropurpurea and E. lamarckii, and E. atropurpurea and E. berthelotii. Gene flow between species may be a cause for the incongruence between phylogeny trees constructed based on the species and haplotype levels and for the low supportiveness of some nodes in the former (Fig. 1a) because it combined chloroplast and nuclear sequences. When natural selection drives splitting divergence in the presence of migration 27 , the species genome may remain selectively porous to gene flow even long after speciation 28 . Although difficult to detect, hybridization and introgression are more prone to happen in island-related taxa and are proposed to play important roles in the evolution of Macaronesian plants 29 . Speciation via adaptive radiation that is driven by selection within markedly different ecological zones is common within islands 14 . High volcanoes with sharp elevation gradients in climate and the complex topographies and secondary eruptions in the central and western islands of Canarian archipelago may facilitate within-island diversification 2,25 .
A clear link between climatic conditions and adaptation in succulent Euphorbia was revealed in Madagascar 30 . The divergence between E. atropurpurea and E. lamarckii on Tenerife and E. berthelotii and E. lamarckii on La  Gomera might be cases of within-island radiation because they occurred at different elevation zones of climate. Unlike E. lamarckii occupying xerophytic habitats in the basal vegetation belts of the Canary Islands, E. atropurpurea and E. berthelotii preferred the mesophitic and sub-hygrophytic areas at moderate elevations on Tenerife and La Gomera 17,31 . The level of genetic variation within populations of E. atropurpurea was obviously lower than within populations of E. lamarckii (Table 1), which provided further evidence of adaptive divergence 26 ; however, caution must be exercised in interpreting this result because we sampled only one population of E. atropurpurea, and several important populations in the palaeo-island of Teno and Roque del Conde (Adeje) were not included in this study; these ancient areas could have acted as refugia in the central Canary islands and could be considered as separate units in the phylogeographical history of Tenerife 9 . Although no evidence of gene flow was revealed between E. berthelotii and E. lamarckii on La Gomera, gene flow was detected between species pairs of E. atropurpurea and E. lamarckii on Tenerife; thus, the present study may provide an example of adaptive divergence in the face of gene flow. Thirteen populations were genotyped with 10 polymorphic nuclear microsatellites developed in E. lamarckii 32 . E. aphylla was not genotyped with SSR due to the very small sample size. Forward primers were labelled at the 5′ -end with the fluorochromes TAMRA, HEX, 6-FAM and ROX, and polymerase chain reaction (PCR) products were separated and visualized on an ABI-377 fluorescence sequencer (Applied Biosystems, Carlsbad, CA, USA).

Data collection.
Four non-coding chloroplast regions including the trnL intron, trnL-trnF, trnS-trnG, and psbM-trnD were sequenced. The TrnL intron and trnL-trnF were amplified and sequenced with universal primer pairs of c and d, e and f as described by Taberlet et al. 33 . TrnS-trnG and psbM-trnD were amplified and sequenced by primer pairs of trnS GCU and 5′ trnG2S, psbMF and trnD GUC R 34 .

Statistical analyses. Phylogenetic analysis and divergence time estimation.
We carried out the phylogenetic reconstruction and calibrated the divergence time step by step. First, the node time of the subfamilies of Euphorbiaceae was estimated through phylogenetic analysis of a set of DNA sequences of 16 species within three subfamilies (Acalyphoideae, Crotonoideae, and Euphorbioideae). DNA sequences of three chloroplast genes (tRNA-Leu, psbA-trnH and ndhF) and nuclear ITS were retrieved from GenBank (see Table S1, Supporting Information for accessions). The incongruence length difference (ILD) test 35 was performed with PAUP 36 to evaluate the congruence of the partitioned datasets, and a P value of the ILD test was not lesser than 0.01, which suggested that combining the data improved or did not reduce the phylogenetic accuracy 37 . Homoplasy and substitution saturation were assessed with PAUP and DAMBE5 38 . Low homoplasy and little saturation were revealed for the concatenated sequences. Phylogenetic analyses were conducted with the multi-species coalescent approach implemented in BEAST v2.4.0 39 using a calibrated Yule speciation process with the unlink option for tree, site and clock models in the partition panel. The substitution model of TN93 for chloroplast concatenated genes and HKY for ITS were selected based on the Akaike Information Criterion (AIC) by using the program MODELTEST v3.7 40,41 . The uncorrelated lognormal relaxed clock was used for both partitions. The multi-species coalescent approach was run using a constant population function with an autosomal nuclear ploidy for ITS and Y or mitochondrial for chloroplast genes. We used a run of 2 × 10 7 generations for the MCMC search and sampled every 2000 th generation. It was considered to be a stationary state of the Markov chain when all of the parameters obtained an ESS (Effective Sampling Sizes) value of more than 200 as measured in the TRACER v1.6 program 42 . FIGTREE v.1.4.2 43 was used to generate and visualize the resulting maximum clade credibility (MCC) chronograms. The nodes were calibrated according to the divergence time of the subfamily Euphorbiaceae (74.25 ± 7.35 Ma) provided in Magallón et al. 44 .
Subsequently, four more chloroplast genes were included, and a total of eight chloroplast genes and nuclear ITS (see Table S1, Supporting Information for accession) of 14 species were used in phylogenetic reconstruction with the multi-species coalescent approach as in the previous step. The 14 species were from two sections of Euphorbia (sect. Aphyllis and sect. Paralias), including all 11 species of sect. Aphyllis subsect. Macaronesicae, two species of sect. Aphyllis subsect. Africanae (E. berotica and E. stolonifera), and E. paralias of sect. Paralias. A calibrated Yule prior was used with GTR substitution models for both partitions. Strict and uncorrelated lognormal clocks were used for the chloroplast and nuclear genes, respectively. This combination gave the highest value of ESS. The divergence time obtained in the former step (node for Africanae 3.06 ± 0.97 Ma; node for Macaronesicae 7.98 ± 1.22 Ma; and node for Aphyllis 9.05 ± 1.28 Ma) was used as prior to calibrate this group.
Finally, we made a phylogenetic analysis of the haplotypes revealed in this study using a multi-species coalescent approach with the same parameters defined in the previous two datasets. The haplotypes were determined with DNASP version 5.0 45 from DNA sequences (NCBI accession No. KR188519-KR188875) obtained in this study. The calibrated Yule prior, GTR substitution model, and uncorrelated lognormal relaxed clock were chosen. In the dating of haplotypes, a nested-dating partitioned method 9,23 was used to apply different tree priors for inter-and intra-specific relationships, and the date estimation (6.92 ± 1.55 Ma for subsect. Macaronesicae) obtained in the second step (Fig. S2, Supporting Information)  Biogeographical and phylogeographical analyses. The biogeographical history of subsect. Macaronesicae was investigated based on eight chloroplast genes and nuclear ITS using the S-DIVA approach implemented in RASP v.1.2 46 . Thirteen geographical areas were defined: A = El Hierro; B = La Palma; C = La Gomera; D = Tenerife; E = Gran Canaria; F = Fuerteventura; G = Lanzarote; H = Morocco and North of Africa; I = Selvagens Islands; J = Madeira; K = Cape Verde Islands; L = South of Iberian Peninsula; and M = South Africa (Fig. 1b). We used the following settings: maximum area at each node = 2; allow extinction (slow); allow reconstruction (slow); use ancestral ranges of condensed trees 46 . Additionally, we calculated time-event curves using four functions for the dispersion, extinction, and vicariance events and the total number of biogeographical events over time 46 .
A reduced median-joining network was constructed with NETWORK v4.5.1.6 47 to resolve the haplotype relationships. Neutrality tests were performed by calculating the significance of Tajima's D 48 and Fu's F S 49 . The distribution of the observed number of differences between pairs of haplotypes was examined to assessed demographic expansion with ARLEQUIN version 3.0 50 . Harpending's Raggedness index 51 and the sum of square deviations (SSD) between the observed and expected mismatch were used as test statistics for goodness of fit to the expansion model; a significant P value from 10,000 bootstrap replicates rejected the fit of the data to the expansion model.
Population structure and gene flow. The software MICRO-CHECKER 52 was used to identify and correct genotyping errors in the microsatellite data. No evidence of scoring error due to stuttering and large allele dropout was found. Population genetic parameters including the number of alleles detected (A), allelic richness (A R ) rarefied to the smallest sample size of 11 diploid individuals per population, observed heterozygosity (H O ), gene diversity within populations (H S ), gene diversity in the total population (H T ), inbreeding coefficient (F IS ) and genetic differentiation among populations (F ST ) were calculated at each locus using FSTAT version 2.9.3 53 . The frequencies of null alleles were evaluated with the program FREENA 54 using 10,000 replicate simulations, and a refined estimation of population differentiation (F ST ) was obtained after excluding null alleles. The Hardy-Weinberg equilibrium and genotypic disequilibrium were tested with Bonferroni corrections 55 . An F ST -outlier approach implemented in ARLEQUIN 3.5 50 was used to test whether microsatellites were affected by selection. Population structure was described by discriminant analysis of principal components (DAPC) with the ADEGENET package in R 56 . DAPC does not rely on a particular population genetics model and is thus free of assumptions about the Hardy-Weinberg equilibrium or linkage disequilibrium 56 . The genetic variation was partitioned by analysis of molecular variance (AMOVA) using the program ARLEQUIN 3.5.
For E. berthelotii, E. atropupurea, and E. lamarckii, gene flow between their populations on La Gomera and Tenerife was inferred using the program IMA2 27 . Nine microsatellites with perfect dinucleotide repeats and chloroplast sequences were used in this analysis. The mutation rates for these microsatellites and chloroplast sequences were obtained from the earlier estimates 57,58 . A geometric heating model was used with a first heating parameter of 0.95 and a second heating parameter of 0.91. The maximum population size (4 Nu) was set to 100, and the upper bound of the priors for migration and divergence time (maximum time of population splitting) were set to m = 0.99 and t = 10.0, respectively. A total of 100 independent chains with burn-in durations of 1 million steps were run for the MCMC procedure 27 .