Introduction

Speciation is responsible for the origin of biological diversity (Gavrilets 2003), and typically begins with a barrier to gene flow, which promotes further genetic and phenotypic divergence (Nosil and Feder 2012). A species is traditionally defined as a group of interbreeding populations that is reproductively isolated from other such groups (i.e., the biological species concept; Mayr 1963). However, more recently there has been a significant change in our understanding of the role of barriers to gene flow between diverging populations (Coyne and Orr 2004). It is now widely recognised that hybridisation between lineages can be common, leading to increased levels of genealogical discordance among loci within the genome (Lexer and Widmer 2008). In the genic view of speciation, species divergence occurs along a continuum of genetic differentiation (Wu 2001), with incipient species passing through a phase where they are only partly reproductively isolated (Kopp and Frank 2005). Evidence of partial speciation has been documented in many organisms, such as Drosophila (Legrand et al. 2011) and the plant genus Nuphar (Shiga and Kadono 2007). Lineages that are in the early stages of speciation offer important opportunities for understanding the mechanisms that drive species divergence (Schield et al. 2015). In this study, we examine patterns of diversification within Eucalyptus.

Eucalyptus is a large genus (comprising more than 700 species) in the family Myrtaceae that forms a significant component of the Australian biota (Smith et al. 2003). Although Eucalyptus is Gondwanan in origin (with fossils dated 51.9 Ma, Gandolfo et al. 2011), the present-day dominance of this genus across Australia is considered to be relatively recent (Crisp et al. 2004). Pollen evidence indicates that Eucalyptus became widespread in the Pleistocene (1.5–5 Ma; Martin 1982; Kershaw et al. 1994), and molecular studies suggest the occurrence of rapid radiations during the Quaternary (McKinnon et al. 2004). In south-eastern Australia, eucalypts are particularly diverse (in terms of both morphological and lineage diversity), especially in the sub-coastal area of central New South Wales including Sydney and the Blue Mountains (Wardell-Johnson et al. 1997). The latter region was declared a World Heritage Area partly due to its high eucalypt diversity (c. 100 species; Hager and Benson 2010). One diverse group of eucalypts from this area is the green ashes (subgenus Eucalyptus section Eucalyptus, sensu Brooker 2000), which includes tall trees on fertile soils, as well as smaller trees and mallees (multi-stemmed plants, usually less than 10 m tall) on shallow soils (Ladiges et al. 2010). While the larger green ashes can be fire sensitive (Nicolle 2006), mallees grow from an underground lignotuber from which they can resprout following fire (Mullette 1978). Many green ash species are morphologically similar (Fig. 1), have overlapping distributions, and cases of hybridisation have been reported (Johnson and Blaxell 1972). As such, the green ashes represent an appropriate group to study active evolutionary processes.

Fig. 1

Distribution of the study species: Eucalyptus cunninghamii, E. laophila, E. stricta, E. langleyi, E. obstans, and E. dendromorpha. Leaf morphology and variations in bud and capsule size of each species are also shown (Klaphake 2012, pp. 45–49). Locations sampled in the present study are shown. Details of species and populations sampled (and location codes) are presented in Table 1. Maps were generated using Australia’s Virtual Herbarium (2015).

Speciation has influenced the diversification of present-day species (Barraclough and Nee 2001) and a comprehensive understanding of population genetic differentiation along environmental gradients is therefore important when investigating speciation mechanisms of taxa (Gaudeul et al. 2012). Genetic variation within a species often has a geographic basis since the processes of adaptation, gene flow, and genetic drift act differentially across heterogeneous landscapes and may be strongly influenced by the demography and spatial distributions of populations (Eckert et al. 2008). Previous genetic studies have demonstrated how a range of speciation mechanisms have played an important role in the evolution of different groups of eucalypts. For example, McGowen et al. (2001) investigated genetic differentiation (using nuclear microsatellites) in closely related eucalypts along steep mountains in Tasmania, and found that Eucalyptus vernicosa at higher altitudes had evolved in allopatry from E. subcrenulata from lower altitudes. Alternatively, Foster et al. (2007) found, using both nuclear and chloroplast (cp) DNA markers, that morphologically distinct tree and dwarf forms of E. globulus were maintained despite being geographically close, indicating that ecotypes, as well as species, could evolve in parapatry. Hybridisation and incomplete lineage sorting are thought to be important evolutionary processes in the diversification of Eucalyptus. For example, cpDNA variation in eucalypts from varying elevations in Victoria was found to be more correlated with geography than species identity, suggesting past or current inter-specific hybridisation (Pollock et al. 2013). The poor morphological resolution of many present-day Eucalyptus species complexes suggests that they may be the result of recent and ongoing speciation in conjunction with past and current hybridisation (Griffin et al. 1988). 
The hypothesis of recent speciation in many eucalypt groups is supported by observations of inter-specific hybridisation in extant Eucalyptus species (e.g., Vaillancourt et al. 1994; McKinnon et al. 2001; Field et al. 2011).

The advent of next-generation sequencing (NGS) and associated technologies has enabled a much higher genomic resolution for the study of speciation mechanisms compared with traditional molecular methods (Keller et al. 2013). One technique that is increasingly being used in Eucalyptus is Diversity Arrays Technology sequencing (DArT, e.g., Steane et al. 2011; Rutherford et al. 2016). DArT is based on genome complexity reduction using restriction enzymes, followed by hybridisation to microarrays to simultaneously assay thousands of markers across the genome (Jaccoud et al. 2001). This method has recently been used in combination with NGS to develop DArTseq, which provides at least three times as many markers as the microarray DArT method, as well as an additional set of co-dominant single-nucleotide polymorphisms (SNPs, Sansaloni et al. 2011). With technological advances such as DArTseq, evolutionary processes at a genomic scale can now be investigated and genetic variation across landscapes can be examined in greater detail (Bragg et al. 2015).

The phylogeny of the green ashes was previously estimated using the DArT microarray method (Rutherford et al. 2016). It was found that while some of the lineage relationships were consistent with previous taxonomic classifications primarily based on morphology, other relationships were not. Recent speciation events may result in incongruence between gene trees, which can lead to difficulty in estimating relationships among taxa (Rieseberg and Brouillet 1994; Seehausen et al. 2014). Poor molecular resolution of taxa can also be due to hybridisation or incomplete lineage sorting (Jones et al. 2013). Reticulate evolution (hybridisation between divergent lineages) has been suggested to have played a role in the evolution of the green ashes (Hager and Benson 2010). However, “reticulate evolution” is poorly described by traditional evolutionary models (i.e., those based on the assumption that evolution is successfully captured by a bifurcating tree; Mindell 2013). Since a preliminary allozyme study by Prober et al. (1990) found low genetic differentiation between the green ashes, a population genomic study is likely to provide novel insights into the evolutionary history of these closely related lineages.

In the current study, we used genome-wide DArTseq markers to investigate the evolutionary origins and population genomics of six green ash species with overlapping distributions across altitudinal and latitudinal gradients in south-eastern Australia. Our objective was to improve our understanding of speciation mechanisms within the green ash group. In particular, based on previous phylogenetic analyses (Rutherford et al. 2016), we aimed to: (1) examine patterns of genetic differentiation among populations and taxa, (2) explore the role of inter-specific hybridisation, (3) investigate associations between genetic differentiation and geographic factors, and (4) better delineate species boundaries within the green ashes.

Materials and methods

Study system and sampling strategy

The green ashes have long been taxonomically challenging, with much disagreement over the number of recognised species. Many of the species recognised by Hill (2002) (e.g., E. laophila and E. obstans) were not recognised by Brooker (2000). To ensure that the species concepts of all major authorities were considered in our study, we followed the narrower species descriptions of Hill (2002). We selected six green ash species (Fig. 1), five of which were very closely related (i.e., in the same clade based on previous phylogenetic analyses; Rutherford et al. 2016). The sixth species, E. cunninghamii, is morphologically distinctive and in a previous phylogenetic study was found to be in a clade separate from the other study species (Rutherford et al. 2016). We chose to include E. cunninghamii in the present study so that we could compare taxa that are difficult to distinguish morphologically and genetically with a species that was morphologically and genetically differentiated, thereby providing a context for our findings. Sampling was designed to cover the distribution of each species across a range of latitudes and altitudes in Sydney, the South Coast, the Southern Highlands, and the Greater Blue Mountains World Heritage Area (GBMWHA) (Table 1). Locations of populations were obtained from the National Herbarium of New South Wales database, Benson and McDougall (1998), and Mills (2010). Although the area sampled covers the full distribution of the rare and restricted species, the more widespread species, E. stricta and E. dendromorpha, extend further south. The southern extremity of E. dendromorpha is in Monga National Park (c. 270 km south of Sydney), while populations of E. stricta extend to Batemans Bay (280 km south of Sydney).

Table 1 Location details of species and populations of green ash eucalypts sampled for genomic DNA

The study region is environmentally heterogeneous (0–1200 m a.s.l., 700–1800 mm annual rainfall). Eucalyptus stricta is widespread throughout this region, occurring in coastal, upland, and highland habitats. The other green ash species are rare, restricted, or localised. Both the mallee and tree forms of E. dendromorpha occur on sandstone substrates, generally in moist habitats (near waterfalls or creeks). Eucalyptus obstans occurs only in coastal habitats, primarily from Sydney to Jervis Bay (200 km south of Sydney). Eucalyptus laophila is confined to higher elevations on sandstone ridges and pagodas (650–1100 m a.s.l.), while E. cunninghamii is restricted to escarpments (700–1000 m a.s.l.). Many species are geographically proximate or have overlapping distributions. Other species are geographically disjunct and isolated. For example, E. langleyi is found only in the Nowra area (160 km south of Sydney).

Leaf material was collected from up to eight individuals (at least 10 m apart) per population. Mallees occur as stands (or clumps) that appear to grow from the same lignotuber, and as such, it can be difficult to discern a genetically distinct individual in the field (Rossetto et al. 1999). Therefore, a mallee stand was conservatively regarded as a genet and leaf material was collected from a single stem per stand. For some of the target populations (such as populations of E. cunninghamii), mallee stands were very close together and could not be sampled 10 m apart. Other populations (e.g., E. cunninghamii from Mount Banks) were small and it was not possible to sample eight individuals. In such cases, each mallee stand in the population was sampled. Eucalyptus laophila on the Glow Worm Tunnel Road (Wollemi National Park) was a scattered population, with some individuals occurring to the north and over the tunnel itself, while others were approximately 1.5 km south of the tunnel. For this population, eight individuals at the tunnel itself and five individuals south of the tunnel were sampled. When sampling a population of E. stricta from Stanwell Tops, we found an individual that was morphologically differentiated from the others. We took a sample and voucher of this individual (NSW908486) and tentatively identified it as E. obstans. The geographic position (including elevation) of each sample was recorded (GPS model: Garmin Rino 650, Garmin Australasia, Sydney, NSW, Australia). Vouchers for each population were lodged at the National Herbarium of New South Wales. Leaf samples were freeze-dried and stored at −20 °C prior to DNA extraction.

DNA isolation and sequencing

Total genomic DNA was extracted from leaf samples following the protocol outlined in Rutherford et al. (2016), a CTAB protocol modified from Doyle and Doyle (1990). DNA concentrations were measured using a Qubit 2.0 Fluorometer (Invitrogen, Melbourne, VIC, Australia) and, for each sample, between 400 and 1000 ng of DNA (at a concentration of 50 ng µL⁻¹) was sent to Diversity Arrays Technology Pty Ltd. Samples were genotyped by Diversity Arrays Technology Pty Ltd using the DArTseq platform (Sansaloni et al. 2011).

Genotype data were generated from total genomic DNA using proprietary analytical pipelines (DArT Pty Ltd). These pipelines first remove poor-quality sequences from Illumina fastq files, leaving approximately 2,500,000 sequences per sample for marker calling. Identical sequences were collapsed (creating “fastqcoll” files), which were analysed using a proprietary algorithm (DArT Pty Ltd). This algorithm corrects a low-quality base in a singleton tag using a collapsed tag that is supported by multiple sequences as a reference. The corrected fastqcoll files were then used in a proprietary (DArT Pty Ltd) variant calling and filtering algorithm (version DArTsoft14). High-quality variant calling was ensured by sequencing to high mean depth of coverage (>30 reads) at each locus. Finally, the quality of marker calls was assayed by including technical replicates of multiple samples and scoring the consistency of genotype calls between the replicates (leading to a “reproducibility” score for each locus).

Relationships of green ash species and populations

DArTseq analysis produced a large dataset with a high proportion of high-quality markers. Of the 54,303 markers, 77.6% had a reproducibility of 100% (with 91.3% of markers having a reproducibility of ≥98%) and 50.2% of markers had a call rate of ≥90% (that is, genotype calls missing in <10% of samples). The proportion of missing data for samples ranged from 13.2 to 43.9%.

We examined the full dataset in SplitsTree4 (Huson 1998; Huson and Bryant 2006). SplitsTree4 infers the genetic relationships among a set of samples as a network, based on information from multiple loci. The network can represent evolutionary histories with substantial reticulation, arising from incomplete lineage sorting and hybridisation (Huson and Bryant 2006). In the present study, a relationship network of the total dataset was generated in SplitsTree4 using the default settings of the software.

A reduced dataset of the higher-quality DArTseq markers was used for subsequent analyses. All SNPs with a reproducibility (proportion of replicate assay pairs for which the marker score is consistent) of less than 100%, or with more than 5% missing data, were excluded from this dataset. To exclude the potential influence of linkage, one SNP was randomly selected from each of the restriction-associated loci that contained more than one SNP. This reduced dataset comprised 11,739 SNPs. To identify chloroplast data within the DArTseq dataset, we downloaded complete chloroplast genome sequences of 22 species of Eucalyptus (Table S1) from the NCBI (National Center for Biotechnology Information) database (https://www.ncbi.nlm.nih.gov/nuccore/?term=Eucalyptus, accessed 23 August 2016). A BLAST search of the full DArTseq SNP dataset against the chloroplast genomes was performed in CLC Genomics Workbench using the default settings (version 8, www.clcbio.com). Eight DArTseq markers were identified as potential chloroplast sequences and checked to be homozygous before being submitted to a BLAST search of the whole genome of E. grandis on the NCBI database (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed 24 August 2016). Results of the BLAST search were used to confirm hits to the chloroplast genome and five DArTseq markers were confirmed to be of chloroplast origin. These five DArTseq markers were excluded from the dataset for all subsequent analyses.
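The marker-filtering steps described above can be illustrated with a short Python sketch. This is not the DArT pipeline or the script used in the study; the record fields (`locus_id`, `reproducibility`, `genotypes`) and the toy data are hypothetical, and the sketch assumes that the reproducibility and missing-data criteria each exclude a SNP on their own.

```python
import random
from collections import defaultdict

def missing_fraction(genotypes):
    """Fraction of samples with no genotype call (None marks a missing call)."""
    return sum(g is None for g in genotypes) / len(genotypes)

def filter_snps(snps, max_missing=0.05, seed=0):
    """Drop SNPs with reproducibility < 100% or > max_missing missing data,
    then keep one randomly chosen SNP per locus to limit linkage."""
    rng = random.Random(seed)
    by_locus = defaultdict(list)
    for snp in snps:
        if snp["reproducibility"] < 1.0:
            continue
        if missing_fraction(snp["genotypes"]) > max_missing:
            continue
        by_locus[snp["locus_id"]].append(snp)
    return [rng.choice(group) for group in by_locus.values()]

# Hypothetical toy data: genotypes are alternate-allele counts per sample.
snps = [
    {"locus_id": "L1", "reproducibility": 1.0, "genotypes": [0, 1, 2, 0]},
    {"locus_id": "L1", "reproducibility": 1.0, "genotypes": [0, 0, 1, 2]},
    {"locus_id": "L2", "reproducibility": 0.98, "genotypes": [0, 1, 1, 0]},
    {"locus_id": "L3", "reproducibility": 1.0, "genotypes": [0, None, 1, 0]},
    {"locus_id": "L4", "reproducibility": 1.0, "genotypes": [2, 2, 1, 0]},
]
kept = filter_snps(snps)
kept_loci = sorted(s["locus_id"] for s in kept)
print(kept_loci)
```

In this toy example, locus L2 fails the reproducibility threshold, L3 has too much missing data, and only one of the two SNPs at L1 is retained.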

We used the R package adegenet 2.0.1 (Jombart 2008) to perform a principal coordinate analysis (PCoA) to explore genetic structure between species. To further investigate relationships among populations, analyses were performed using STRUCTURE version 2.3.4 (Pritchard et al. 2000; Falush et al. 2003). STRUCTURE uses a Bayesian model-based approach to group individuals by multi-locus genotypes, but does not impose taxonomic, population or geographic groups a priori (Pritchard et al. 2000). STRUCTURE was run using a subset of the highest quality markers (to reduce the run-time of the analysis). This subset excluded all markers with a reproducibility of less than 100%, an average count of minor alleles less than 10, as well as all markers with any missing data (leaving 4783 markers). To determine whether this subset was representative of the larger dataset, a comparative PCoA was conducted based on Nei’s unbiased genetic distances in the PCoA module of GenAlEx v6.501 (Peakall and Smouse 2006, 2012). A STRUCTURE analysis was performed using default parameters, K-values ranging from 1 to 25 (with five iterations for each K), 100,000 MCMC steps, and a burn-in period of 100,000 steps. The optimal number of genetically distinct clusters (K) was calculated based on the maximal mean posterior probability across replicates as well as the second-order rate of change (ΔK) (Evanno et al. 2005). We used STRUCTURE HARVESTER v0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester, accessed 30 September 2016) to estimate the optimal value of K. Once the major sources of structure were identified, a hierarchical approach was used to examine the data further, with additional STRUCTURE analyses iterated for each subpopulation identified. Therefore, for each subpopulation identified we performed an analysis with K-values ranging from 1 to the total number of populations present and with the same parameters stated above (i.e., five iterations for each K, 100,000 MCMC steps, and a burn-in period of 100,000 steps).
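To make the ΔK criterion concrete, the following Python sketch computes ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)] from replicate log-likelihoods. It assumes the second-order difference is taken on the mean Ln Pr(X|K) across replicate runs at each K, and the log-likelihood values are hypothetical; it is an illustration of the statistic, not the STRUCTURE HARVESTER code.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """lnp_by_k maps K -> list of Ln Pr(X|K) values from replicate runs.
    Returns {K: delta K} for every K with both neighbours K-1 and K+1."""
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the tested range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # identical replicates would give a division by zero
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta

# Hypothetical replicate log-likelihoods for K = 1..4 (three runs each).
lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -802.0, -798.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -790.0, -780.0],
}
deltas = evanno_delta_k(lnp)
best_k = max(deltas, key=deltas.get)
print(deltas, best_k)
```

Because the statistic needs a value at K − 1, it cannot be evaluated at K = 1, which is one reason to inspect the barplots and likelihood curves as well.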

Selecting the optimum value of K using STRUCTURE can be problematic, as there is some uncertainty about which value of K best fits the data (Janes et al. 2017). The method of Evanno et al. (2005), which uses ΔK, is an ad hoc approximation that does not allow K to be 1 (Janes et al. 2017). Users of STRUCTURE are therefore advised not to rely exclusively on ΔK when selecting the optimum value of K (Janes et al. 2017). In addition to using ΔK, we followed the recommendations of Janes et al. (2017), which involved inspecting all barplots produced by each STRUCTURE analysis at different values of K (from multiple runs and iterations), as well as including the Ln Pr(X|K) plots and STRUCTURE barplots for multiple values of K in the Supplementary Material when reporting results. Once the optimum value of K was identified for each STRUCTURE analysis, we used the FullSearch algorithm in CLUMPP v 1.1.2 (Jakobsson and Rosenberg 2007) to derive a single output from the five independent runs at that value of K. We used Distruct v 1.1 (Rosenberg 2004) to visually display the probability of membership (Q) of individuals into the inferred groups.

Genetic diversity of species and populations

We calculated genetic diversity parameters from the highest quality markers (4783 SNPs). The expected heterozygosity (He), observed heterozygosity (Ho), number of observed alleles (Na), genetic differentiation between regions (FST), and degree of inbreeding (FIS) were determined using the Frequency Module of GenAlEx v6.501 (Peakall and Smouse 2006, 2012). Genetic diversity parameters were firstly calculated for the “taxonomic” species according to the species concepts of Hill (2002): E. cunninghamii, E. dendromorpha (including both the GBMWHA and Southern Highland populations), E. stricta, E. laophila, E. obstans, and E. langleyi. Genetic diversity parameters were also calculated for the groups found using the PCoA and STRUCTURE analyses described above, including E. cunninghamii, E. dendromorpha (GBMWHA populations only), E. dendromorpha (Southern Highland populations only), E. obstans, E. langleyi, as well as E. stricta and E. laophila combined.
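As an illustration of how these parameters relate to one another, the sketch below computes Ho, He, and the fixation index F = (He − Ho)/He for a single biallelic SNP. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, which is a hypothetical encoding; GenAlEx may aggregate across loci and populations differently, so this is not a reimplementation of its Frequency Module.

```python
def locus_stats(genotypes):
    """genotypes: alternate-allele counts (0, 1, 2) per individual; None = missing.
    Returns (Ho, He, F) for one biallelic locus, or None if nothing was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    n = len(called)
    ho = sum(g == 1 for g in called) / n       # observed heterozygosity
    p = sum(called) / (2 * n)                  # alternate-allele frequency
    he = 1 - (p ** 2 + (1 - p) ** 2)           # expected heterozygosity
    f = (he - ho) / he if he > 0 else float("nan")
    return ho, he, f

# Hypothetical genotypes for six individuals, one with a missing call.
ho, he, f = locus_stats([0, 1, 1, 1, 2, None])
print(ho, he, f)
```

A negative F, as in this toy example, indicates more heterozygotes than expected under random mating.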

We calculated pairwise FST values across all species and location combinations using the Frequency Module of GenAlEx v6.501. To better understand the contribution of species identity and geographical region to genetic differentiation, a hierarchical analysis of molecular variance (AMOVA) was used. AMOVA was performed using the PhiPT (Excoffier et al. 1992) analogue of FST in the AMOVA module of GenAlEx v6.501. The extent of genetic differentiation was determined between and within populations of each “taxonomic” species and for the groups found in the present study using the PCoA and STRUCTURE analyses. The extent of genetic differentiation was also determined between and within geographic regions (GBMWHA, Sydney, Southern Highlands and South Coast). Statistical significance was determined by comparison to 999 random permutations of the data.

Analysis of connectivity and between-species gene flow

Patterns of connectivity were examined using network analysis implemented in EDENetworks version 2.18 (Kivelä et al. 2015). Network analysis can illustrate gene evolution by taking into account uncertainties in mutational pathways or reticulate events such as recombination, lateral transfer, and hybridisation (Posada and Crandall 2001). This approach is free of many of the a priori assumptions that usually underlie other methods of interpreting population molecular datasets (e.g., geographic clustering) and provides a graphical way of viewing multidimensional data (Kivelä et al. 2015). In EDENetworks, networks are constructed from genetic distances that are calculated using the FST-based distance of Reynolds (Reynolds et al. 1983). In the present study, population-level and individual-centred networks were produced from the dataset comprising the highest quality markers (4783 SNPs). To investigate connectivity across the landscape, geographic locations of each sample were used to overlay the network onto a map of the study region.
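The idea of building a population network from FST-based distances can be sketched as follows. The code converts pairwise FST to the Reynolds et al. (1983) coancestry distance, D = −ln(1 − FST), and extracts a minimum-spanning tree with Kruskal's algorithm. The population labels and FST values are hypothetical, and the sketch is not the EDENetworks implementation; note that it rejects negative FST estimates, which would need separate handling in practice.

```python
import math

def reynolds_distance(fst):
    """Reynolds et al. (1983) coancestry distance, D = -ln(1 - FST)."""
    if not 0 <= fst < 1:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1 - fst)

def minimum_spanning_tree(nodes, dist):
    """Kruskal's algorithm; dist maps frozenset({a, b}) -> distance."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    edges = []
    for pair, d in sorted(dist.items(), key=lambda kv: kv[1]):
        a, b = sorted(pair)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:               # edge joins two separate components
            parent[root_a] = root_b
            edges.append((a, b, d))
    return edges

# Hypothetical pairwise FST values among three populations.
fst = {("A", "B"): 0.05, ("A", "C"): 0.20, ("B", "C"): 0.15}
dist = {frozenset(pair): reynolds_distance(v) for pair, v in fst.items()}
tree = minimum_spanning_tree(["A", "B", "C"], dist)
tree_links = {(a, b) for a, b, _ in tree}
print(tree_links)
```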

We applied TreeMix (version 1.13, Pickrell and Pritchard 2012) to test for and visualise gene flow and admixture between species using the reduced DArTseq dataset (11,739 SNPs) that was used to generate the PCoA (described above). TreeMix uses allele frequency data and a Gaussian approximation for genetic drift among taxa and populations to estimate a maximum likelihood tree (Pickrell and Pritchard 2012). Admixture between branches of the tree is determined in a stepwise likelihood procedure, where the tree is searched for the optimal placement of each migration event (Pickrell and Pritchard 2012). The proportion and directionality of gene flow are displayed on the tree (these are estimated from the asymmetries in the relationships inferred by the tree, Martin et al. 2015). For our analysis, we estimated a maximum likelihood tree with E. cunninghamii (all populations) selected as the outgroup taxon (based on the phylogenetic analysis of Rutherford et al. 2016). We inferred a topology without admixture, as well as topologies allowing for between one and six migration events.

Results

Relationships of green ash species and populations

A relationship network derived from the full dataset was generated in SplitsTree4 (Supplementary Fig. S1). In a relationship network, the parallel lines indicate splits in the data, with the longer lines suggesting more support for that particular split (Huson and Bryant 2006). Eucalyptus cunninghamii formed a distinct cluster, as did one population of E. dendromorpha (from Mount Banks). A sample from the Stanwell Tops site (NSW908486) was also on a longer branch and isolated from all other species. The SplitsTree4 analysis highlighted the complexity of the full DArTseq dataset.

The higher-level relationships of the green ashes were visualised in a PCoA based on the reduced dataset (11,739 SNPs, Fig. 2a), which identified four main groups: E. cunninghamii, E. dendromorpha from Mount Banks (referred to as E. sp. Mount Banks hereafter), a single sample from Stanwell Tops (NSW908486) and the remainder of the green ashes. A PCoA of this latter group allowed the resolution of further divisions among the remainder of the green ashes (Fig. 2b). Eucalyptus langleyi and E. obstans formed species-specific groups, although the population of E. obstans from the South Coast (Jervis Bay) was positioned between E. langleyi and the Sydney populations of E. obstans. The GBMWHA populations of E. dendromorpha and E. dendromorpha from Jersey Lookout (Southern Highlands) formed separate groups. However, when axis 1 was plotted against axis 3 in the PCoA (Fig. 2c), differentiation was detected between E. dendromorpha from the GBMWHA, E. dendromorpha from Jersey Lookout, and E. dendromorpha from Redhills Road (Southern Highlands). All samples of E. stricta and E. laophila clustered together.

Fig. 2

Principal coordinate analysis (PCoA) derived from the DArTseq SNP dataset showing: a all study taxa (based on 11,739 DArTseq markers), b the focal group based on 10,868 DArTseq markers and including all taxa except for Eucalyptus cunninghamii, NSW908486 from Stanwell Tops and E. sp. Mount Banks (axes 1 and 2 shown), and c the focal group (axes 1 and 3 shown). GBMWHA, Greater Blue Mountains World Heritage Area

A comparative PCoA of the high-quality dataset (4783 loci) used for STRUCTURE analyses displayed similar groupings of species and populations (Supplementary Fig. S2). In the STRUCTURE analyses, coefficient of membership values indicated that all populations of E. cunninghamii were assigned to one group (Q > 0.9), while the remainder of species, with the exception of NSW908486 from Stanwell Tops, were assigned to a second group (Q > 0.9, Supplementary Table S2). However, when all populations of E. cunninghamii were excluded from the dataset, only E. sp. Mount Banks registered a high coefficient of membership to one group (Q > 0.9, Supplementary Table S3). All populations of other species had lower coefficients of membership to a second group (Q < 0.8), with small coefficients of membership assigned to one or more of another five groups (Supplementary Table S3). STRUCTURE analyses identified distinct genetic clusters within many species (Fig. 3, Supplementary Fig. S4). In E. cunninghamii, the Pulpit Rock and Mount Banks populations formed separate groups from the other two populations, which were geographically closer together (Wentworth Falls and Kings Tableland). The mallee form (from the GBMWHA) and the tree form (from the Southern Highlands) of E. dendromorpha formed two groups (Fig. 3). STRUCTURE analyses of the mallee form of E. dendromorpha from the GBMWHA revealed that there was admixture between the populations. The two populations of the tree form of E. dendromorpha from the Southern Highlands (Redhills Road and Jersey Lookout) formed separate groups. Eucalyptus obstans from the South Coast (Jervis Bay) was in a separate cluster from the two Sydney populations (Beacon Hill and Royal National Park). We found ongoing admixture between E. stricta and E. laophila regardless of the geographic location of the populations. Although the results from STRUCTURE HARVESTER suggested that the value of K for the STRUCTURE analysis of E. stricta and E. laophila was 2, when we inspected the barplots from the analysis we found that there was no genetic differentiation between the two species (Fig. 3). Ongoing gene flow was detected between the two populations of E. langleyi.

Fig. 3

Population structure of the green ash eucalypts (based on 4783 DArTseq SNP markers). The optimum value of K was calculated using the method of Evanno et al. (2005) (the ∆K plots from STRUCTURE HARVESTER for each STRUCTURE analysis are presented in Supplementary Fig. S3). Each graph produced from STRUCTURE was also inspected to ensure the optimum value of K for each species was calculated. STRUCTURE barplots with different values of K are shown in Supplementary Fig. S4. Location codes correspond to those presented in Fig. 1

Genetic diversity of populations and species

Eucalyptus cunninghamii had the highest FST among and the lowest FIS within populations (0.127 and –0.101, respectively; Table 2). Eucalyptus laophila had the highest FIS (–0.027), followed by E. stricta and E. laophila together (–0.042). Eucalyptus langleyi had the lowest FST (0.054) and the second lowest FIS (–0.090). Eucalyptus langleyi had the highest Na, He, and Ho (1.767, 0.246, and 0.220), while E. stricta and E. laophila together had the lowest Na, He, and Ho (1.458, 0.133, and 0.125). Pairwise FST values between all populations revealed that there was relatively strong genetic differentiation between E. cunninghamii and the other study species (Supplementary Table S4). Similarly, high pairwise FST values were found between E. sp. Mount Banks and all the other study species. In contrast, the magnitude of inter-specific pairwise FST values was not greater than that of the intra-specific (between-population) FST for E. stricta, E. laophila, E. obstans, E. langleyi, and E. dendromorpha.

Table 2 Summary of genetic diversity parameters for the green ashes

AMOVA revealed that, for each taxonomic species that was also supported by the groupings in the PCoA and STRUCTURE analyses, more than 80% of the variation could be attributed to within-population variation (Supplementary Table S5). Eucalyptus obstans had the highest between-population variation (19%), followed by E. cunninghamii (18%). Eucalyptus stricta and E. laophila (which were consistently grouped together in all other analyses) had the lowest between-population variation, as did populations of E. dendromorpha only from the GBMWHA (8%). We found that variation was better explained by “species” identity than by geographic partitioning. The variation among species was more than four times greater than variation among geographic regions (28 and 6%, respectively).

Analysis of connectivity and between-species gene flow

In networks produced by EDENetworks, network nodes (or vertices) represent populations as defined by sampling sites, and links (or edges) represent their relationships and interactions (Kivelä et al. 2015). In the current study, the population network (minimum-spanning tree) grouped most populations according to species (Supplementary Fig. S5). The population network of species and sampling sites overlain on a topographic map of the study region revealed a high degree of connectivity between many species across latitudes and along altitudinal gradients (Supplementary Fig. S6). There was a relatively high degree of genetic similarity between the population of E. obstans from Royal National Park and the other two populations of E. obstans, a population of E. stricta from Stanwell Tops, E. langleyi and NSW908486. In contrast, there was relatively low genetic similarity between populations of E. dendromorpha and E. stricta from Blackheath in the GBMWHA and samples from Stanwell Tops.

To test whether the low levels of genetic differentiation between species found in our other analyses were due to introgression, we used TreeMix on the reduced dataset (11,739 SNPs) to construct a maximum likelihood tree featuring varying numbers of admixture events. In our TreeMix analyses, we found that with the addition of each migration event, the log-likelihood (LL) of the analysis increased by a value of 36–132 until the addition of the sixth migration event (where the LL decreased by a value of 6). However, we found that TreeMix analyses with greater than four admixture events only showed minor improvements in LL. Furthermore, the TreeMix analysis with four migration events was consistent with all our other analyses. Therefore, the outputs from the TreeMix analysis featuring four migration events are presented in Fig. 4.
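The log-likelihood comparison described above can be sketched as a simple stopping rule: keep adding migration events while each one improves the LL by at least some threshold. This is only an illustration. The LL values below are hypothetical (chosen so that the gains fall in the reported 36–132 range and the sixth event lowers the LL by 6), and the function name and the threshold of 50 are assumptions. The choice of four events in this study also rested on consistency with the other analyses, which this sketch does not capture.

```python
def choose_migration_events(log_likelihoods, min_gain):
    """Return the number of migration events m to retain.

    log_likelihoods[m] is the LL of the run with m migration events.
    Stops at the first m for which adding one more event improves the
    LL by less than min_gain (a hypothetical threshold).
    """
    for m in range(len(log_likelihoods) - 1):
        gain = log_likelihoods[m + 1] - log_likelihoods[m]
        if gain < min_gain:
            return m
    return len(log_likelihoods) - 1


# Hypothetical LLs for m = 0..6 (not the values from this study)
hypothetical_ll = [100.0, 232.0, 330.0, 410.0, 470.0, 506.0, 500.0]
chosen = choose_migration_events(hypothetical_ll, min_gain=50.0)
print(chosen)
```

A threshold rule of this kind is sensitive to the cut-off chosen, which is one reason to cross-check the selected tree against independent analyses such as the PCoA and STRUCTURE results.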

Fig. 4 TreeMix analysis of the green ash eucalypts (based on 11,739 SNPs) showing (a) the inferred maximum likelihood phylogeny with four migration events and (b) the residual fit plotted from the maximum likelihood tree in (a). In (a) the directionality of gene flow is indicated by arrows and coloured (yellow to red) according to their weight (0–50%). In (b), the colour bar to the right of the matrix indicates degree of relatedness between populations, with residuals above zero indicating populations that are more closely related to each other in the data than in the best-fit tree (i.e., bluer shades indicate population pairs that are candidates for admixture events). Sample codes correspond to those presented in Table 1. TreeMix analyses under varying numbers of admixture events (0–3 events) are shown in Supplementary Fig. S7.

Our TreeMix analysis recovered a maximum likelihood tree with similar groupings of taxa to those found in the PCoA (Fig. 4). All populations of E. dendromorpha formed a clade as did the three populations of E. obstans and the two populations of E. langleyi (Fig. 4a). All populations of E. stricta and E. laophila were grouped together, while E. sp. Mount Banks and NSW908486 each occupied isolated positions in the phylogeny. Weak admixture (<10%) was detected between E. cunninghamii from Wentworth Falls and a clade comprising the populations of E. dendromorpha from the Southern Highlands (Jersey Lookout and Redhills Road). Stronger admixture (>40%) was detected between E. stricta from Stanwell Tops (Sydney) and E. obstans (from Beacon Hill and Royal National Park, Sydney), between E. stricta and NSW908486 (both from Stanwell Tops), and between E. stricta and E. dendromorpha from the Southern Highlands (Sassafras and Redhills Road, respectively). The population of E. stricta from Stanwell Tops is 70 km from the population of E. obstans from Beacon Hill, while the population of E. stricta from Sassafras is 50 km from the Redhills Road population of E. dendromorpha. A number of populations were identified as likely candidates for recent admixture events (Fig. 4b), for example E. stricta from the Southern Highlands (Sassafras) and E. obstans from the South Coast (Jervis Bay), and populations of E. stricta and E. dendromorpha from Blackheath (GBMWHA).

Discussion

A set of SNP genotype markers provided useful insights into the evolutionary diversification of a group of closely related species within the green ash group. Patterns of gene flow across the study species and associations between genetic variation and geographic factors suggest that a range of speciation mechanisms are likely to have been operating within the one species complex. DArTseq markers also provided some insights regarding species boundaries within a group where there has been much uncertainty regarding the divergence and differentiation of taxa. Our findings have implications not only for speciation mechanisms in Eucalyptus, but also for other groups comprising taxa that have recently diverged.

Speciation by ecological isolation mechanisms

We found E. cunninghamii to be highly genetically differentiated compared to all the other study species. This was consistent with the phylogenetic analysis of Rutherford et al. (2016), where E. cunninghamii was in a clade with species found in northern New South Wales and southern Queensland (E. approximans, E. codonocarpa, and E. microcodon) that was sister to a clade that included the green ashes from the Sydney region and GBMWHA. Eucalyptus cunninghamii is morphologically distinct from all other green ash species (in that it is usually less than 1 m tall with thin, soft, silvery green leaves; Hill 2002) and occupies a narrow environmental niche on exposed cliff edges (Benson and McDougall 1998). While the TreeMix analysis suggested some introgression between one population of E. cunninghamii (Wentworth Falls) and E. dendromorpha from the Southern Highlands, this admixture was relatively weak (<10%, Fig. 4a). Furthermore, although E. cunninghamii is sympatric with E. stricta and E. dendromorpha from the GBMWHA, it has remained genetically differentiated (with high inter-specific pairwise FST values relative to intra-specific pairwise values, Supplementary Table S4) while maintaining high levels of outcrossing (its populations have the lowest FIS, Table 2). The comparatively high genetic differentiation of E. cunninghamii may be due to differences in phenology between species, as E. cunninghamii is thought to be an autumn-winter flowering species, whereas E. stricta and E. dendromorpha flower during the summer months (based on voucher specimens from the National Herbarium of New South Wales; Benson and McDougall 1998). Differences in flowering times may limit interbreeding opportunities among closely related sympatric species (Mohler 1990; Cavender-Bares and Pahlich 2009) and additional research on the phenology of green ash species would further clarify the role of temporal isolation in facilitating species co-occurrence without admixture.

Specimens of E. sp. Mount Banks in the National Herbarium of New South Wales were previously identified as E. dendromorpha, or in one case, as a possible hybrid between E. dendromorpha and E. cunninghamii. However, our results strongly suggest that this population is differentiated from other populations of E. dendromorpha, as well as all other green ash species in the study (Fig. 2a). In Rutherford et al. (2016), E. sp. Mount Banks (referred to as E. dendromorpha from Mount Banks) was in a clade with E. stricta and E. laophila. Eucalyptus sp. Mount Banks has morphologically distinct characters, in that its leaves have a bluish tinge and are shorter than those of E. dendromorpha and wider than those of E. stricta. While it is possible that E. sp. Mount Banks is a hybrid between a green ash and a non-green ash Eucalyptus species, the results of the STRUCTURE analysis suggest that there was very little admixture between this population and the other green ash species (in Supplementary Table S3 E. sp. Mount Banks was strongly assigned to Group 2 with a Q-value of 0.9982). It is therefore more likely that E. sp. Mount Banks is an undescribed species existing in sympatry with related species.

Between-species admixture and reticulate evolution

Relatively low genetic differentiation was detected among all other species. This was consistent with the findings of previous studies (Prober et al. 1990; Rutherford et al. 2016) and could be interpreted as a signature of recent radiation and reticulate evolution. Recent molecular dating of eucalypts places the radiation of subgenus Eucalyptus in the last 10 Ma (Crisp et al. 2011), while the earliest known fossils of subgenus Eucalyptus are from Late Miocene deposits (5–10 Ma; Blazey 1994). However, despite low between-species genetic differentiation many species in the present study maintain distinct morphological characters. For example, E. dendromorpha and E. stricta in the GBMWHA are often sympatric but can be distinguished by habit (E. dendromorpha is taller than E. stricta; Hill 2002) and leaf size (leaves are significantly narrower in E. stricta; Hill 2002). Low genetic differentiation between sympatric species that can be distinguished morphologically has been documented in previous studies of Eucalyptus (e.g., Shepherd and Raymond 2010; Pollock et al. 2013), as well as studies of other plant genera, such as Quercus (e.g., Petit et al. 1997) and the Australian and South American genus Lomatia (McIntosh et al. 2014). This phenomenon could be due to strong selection of morphological characters resulting in taxa that are phenotypically different despite ongoing gene flow (Latta 2004). Alternatively, taxa could still be in the early stages of speciation and have developed morphological differences even though they are not yet strongly differentiated (Wu 2001).

Between-species admixture and hybridisation in the current study were supported by our TreeMix analysis, which showed relatively strong admixture between many sympatric and geographically close populations of many species (Fig. 4a). This is concordant with the findings of inter-specific hybridisation in other eucalypt species (e.g., McKinnon et al. 2001; Nevill et al. 2014; Jones et al. 2016), as well as species of Lomatia (Milner et al. 2012), Quercus (e.g., Petit et al. 1997), and Pinus (e.g., Matos and Schaal 2000). Between-species gene flow found for many species in the present study is consistent with historic and/or present hybridisation, as well as incomplete lineage sorting. Natural inter-specific hybridisation and introgression are likely to have played a significant role in plant evolution and speciation (Mallet 2005). While gene flow has been considered to be a countervailing process to speciation, more recently hybridisation is thought to have resulted in the evolution of new and stable evolutionary lineages (Rieseberg 1997; Abbott et al. 2013). Hybridisation can be a source of adaptive variation, functional novelty, and new species (Seehausen 2004). Incomplete barriers to gene flow can result in the introgression of selectively favoured alleles from one population into another (Abbott et al. 2013). For example, in Darwin’s finches, it was estimated that the genetic variation introduced to populations through hybridisation was 2–3 times greater than that introduced by mutation (Grant and Grant 1994). Hybridisation can also act as a gene dispersal mechanism by extending a population’s gene pool (and hence ecological range) through introgression (Potts and Reid 1988). Gene flow via hybridisation and introgression could be very important in species with small, fragmented, and isolated populations (Pollock et al. 2015), such as many of the species within the green ashes, and could contribute to the high outcrossing rates measured for most populations (Table 2). We note that individuals of E. cunninghamii at the Mount Banks site had strong assignments to different ancestral populations, as did individual samples of E. langleyi from Parma Creek Firetrail (as indicated by the STRUCTURE groups, Fig. 3). Also, the TreeMix analysis identified populations of E. langleyi from Parma Creek Firetrail and E. cunninghamii from Mount Banks as likely candidates for admixture with other populations (Fig. 4b). In a previous study, evidence of hybridisation was found between E. langleyi and a co-occurring species, E. consideniana (which is also in subgenus Eucalyptus) (Rutherford et al. 2016). A detailed field study focusing on seed and pollen dispersal is needed to better understand hybridisation in the green ashes. Such a study could include progeny trials and genetic analysis of seeds from proximate and geographically isolated populations, as well as from stands of potential hybrid zones.

Speciation by geographic isolation

We found that patterns of genetic differentiation across some species could be associated with geographic factors. For example, genetic differentiation between E. langleyi and the other study species was higher than that among E. stricta, E. laophila, and E. obstans (as indicated by the membership coefficient values, Supplementary Table S3). Eucalyptus langleyi is one of the most morphologically distinct species within the green ashes (with angular buds; Klaphake 2012) and is geographically restricted and isolated (occurring within a 17 × 7 km area near Nowra; Mills 2010). The nearest green ash populations examined here were more than 20 km away (E. obstans from Jervis Bay and E. stricta from Sassafras). Consequently, drift and divergence in allopatry are likely mechanisms impacting the evolution of E. langleyi.

Our findings also indicated geographic factors to be potential drivers of population genetic structure within species (apart from E. stricta and E. laophila). For example, in E. cunninghamii, the Mount Banks and Pulpit Rock populations were genetically differentiated from the others (these populations are higher in altitude and are on the other side of the ridge from the Wentworth Falls and Kedumba Valley populations, Fig. 3). A similar pattern was found along a latitudinal gradient for E. obstans, where the South Coast population was genetically more distinct than the Sydney populations (Fig. 3). These findings suggest that there can be strong associations between genetic variation and geographic factors, which is consistent with previous studies (e.g., Eckert et al. 2008), and indicate that vicariance and/or local adaptation may have had a role in the genetic differentiation and broader evolution of the green ashes. While dispersal could offer an explanation for the patterns observed here, the majority of studies suggest that eucalypts have very limited seed dispersal capabilities (Booth 2017). Earlier studies found that the bulk of seeds in most eucalypt species fall within a distance approximately equal to the height of the tree (e.g., Cremer 1966). More recently, Ruthrof et al. (2003) suggested that seed dispersal rates in many eucalypt species are 1–2 m per year over a 70-year period. While pollen in many eucalypt species may travel as far as 1 km, most of the pollen has been found to be distributed within 200 m of the parent plant (Byrne et al. 2008; Broadhurst 2013).
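To give a sense of scale for the dispersal argument, the short sketch below converts the 1–2 m per year seed dispersal rate into the time needed to span the 20 km lower bound on the separation between E. langleyi and the nearest sampled populations. It assumes a constant rate extrapolated linearly well beyond the 70-year period over which the rate was estimated, and it ignores pollen movement and rare long-distance events, so it is a rough back-of-envelope illustration only. The function name is hypothetical.

```python
def years_to_spread(distance_m, rate_m_per_year):
    """Years needed to cover distance_m at a constant rate (a deliberate simplification)."""
    if rate_m_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance_m / rate_m_per_year


GAP_M = 20_000  # lower bound on the gap to the nearest sampled populations (20 km)
slow = years_to_spread(GAP_M, 1.0)  # at 1 m per year
fast = years_to_spread(GAP_M, 2.0)  # at 2 m per year
print(f"{fast:,.0f} to {slow:,.0f} years")
```

Under these assumptions, seed movement alone would need on the order of 10,000 to 20,000 years to bridge such a gap, which is in keeping with the view that seed dispersal is limited at this spatial scale.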

Implications for species boundaries within rapidly differentiating lineages

We found no genetic differentiation between E. stricta and E. laophila. These two taxa are difficult to distinguish on the basis of morphology alone (both are mallees with overlapping leaf, bud, and fruit dimensions) and are distinguished on the basis of geographic location (Hill 2002). Our results together with the current morphological evidence suggest that E. laophila has been incorrectly assigned specific rank and is, rather, likely to be an ecotype of E. stricta. We found that E. sp. Mount Banks is likely to be an undescribed species. However, morphological data will be required to investigate this hypothesis further. With the exception of E. cunninghamii and E. sp. Mount Banks, we found moderate levels of connectivity between all the studied green ash species. However, combining all taxa (excluding E. cunninghamii and E. sp. Mount Banks) into one species would ignore observed morphological differences and may underestimate the diversity of the green ashes. Our findings suggest that the green ashes fit into the genic view of speciation (Wu 2001), with taxa being at varying stages along the speciation continuum. Despite occurrences of inter-specific hybridisation, many of the species can be recognised based on morphological discontinuities (e.g., E. obstans, E. dendromorpha, E. stricta, and E. langleyi). Although there is some geographic and ecological differentiation, most taxa in the GBMWHA have overlapping ranges. It is notable that the geographically restricted E. langleyi is one of the most morphologically distinct species and this is reflected in the higher level of genetic differentiation between it and the GBMWHA taxa. Nevertheless, there is still a degree of gene flow between it and nearby populations of E. obstans. We found that populations currently recognised as E. dendromorpha did not all form one “group”. While most authorities, including Brooker (2000) and Hill (2002), regard all populations of E. dendromorpha to be the one species, Klaphake (2012) considers the tree form found in the Southern Highlands to be a taxon separate from the mallee form in the GBMWHA. Our findings supported genetic differentiation between the GBMWHA and Southern Highlands populations. Previous studies suggest that differences in growth habit in eucalypts and other species (such as Nothofagus) are likely to be due to environmentally induced variation, including soil, wind exposure, and fire regimes (e.g., Byrne et al. 2016; Barrera et al. 2000). Further investigation is required to determine whether this is the case for the mallee and tree forms of E. dendromorpha. Although E. laophila has been shown to be inadequately differentiated, presently accepted concepts of other species remain as hypotheses that can be tested further in future. A detailed morphometric analysis of the study species would complement this molecular dataset and could be used to resolve these taxonomic issues. A large-scale population genetic analysis (>10 populations) of species may provide further insights into species boundaries.

Finally, a sample from Stanwell Tops (NSW908486) should be investigated in greater detail. It was genetically distinct from the other species in all of our analyses and was morphologically distinct from the E. stricta population found at Stanwell Tops with which it co-occurred (in that it had broader leaves). However, genomic DNA from more individuals from this site will be needed to investigate this further.

Conclusions

Speciation in the presence of gene flow was historically considered to be problematic because gene flow constrains population differentiation, thereby preventing the evolution of reproductive isolation (Mayr 1963). However, increasing evidence of speciation in the presence of gene flow is emerging (e.g., Nosil 2008; Pinho and Hey 2010), and in such cases, concepts that recognise speciation with gene flow are regarded as more appropriate in understanding species divergence (Schield et al. 2015). The clade of green ash eucalypts contains lineages at a range of different points along a “continuum” of speciation. In aggregate, our findings indicate that many mechanisms (e.g., vicariance, ecological speciation, and reticulate evolution) can operate, or have been operating, in tandem within the one species complex. This interplay of mechanisms has implications for speciation in other recently radiated taxa.

Data archiving

Genotype data are available from the Dryad Digital Repository (https://doi.org/10.5061/dryad.gd8gp17).