Introduction

A key area of interest in evolutionary biology is understanding the consequences of selection for genetic diversity and the future ability of populations to adapt. One potential consequence of strong selection is a reduction in the raw material, the genetic variation, available for selection1. At the individual level, this reduction can be detected in persistent genomic signatures of selective sweeps such as those associated with human evolution (reviewed by2,3), animal (e.g.4,5) and crop domestication (e.g.6,7). The evolution of herbicide resistance (HR) in agricultural weeds results from strong selection pressure. The spread of these resistance alleles through populations provides a growing number of study systems for understanding the consequences of selective sweeps at the individual and population level8,9,10. For practical reasons, weed managers are interested in understanding whether or not the standing variation for other herbicide modes of action is likely to be lost from populations following this selection. Such a loss could delay the evolution of additional HR within the population, because of the time expected for new mutations conferring this resistance to arise9,11,12,13,14. However, even when selection pressure is strong, loss in genetic variation depends on multiple factors, including the genetic basis of the resistance, mating system, population size, spatial or temporal variation in selection pressure, and gene flow9.

Kochia (Bassia scoparia (L.) A.J. Scott syn. Kochia scoparia (L.) Schrad.) is native to Europe and Asia and introduced to Canada, the United States, Africa, and South America. It is an annual noted for early germination, tolerance of arid and saline conditions, and a tumbleweed habit. Kochia is wind pollinated and produces a large amount of pollen. However, it is also self-compatible, and as a result, the species is likely predominantly outcrossing, but with high levels of variability15. It has North American herbarium collections dating from the 1880s (reviewed by15). Kochia was found to be the fastest spreading alien species in the western USA from 1880 to 198016. It causes significant yield losses (30–60%) in crops such as winter wheat and sugar beet15, a problem exacerbated by the evolution of multiple herbicide resistance17.

The species has evolved resistance to four herbicide modes of action18. This includes resistance to photosystem II inhibitors18, acetolactate synthase (ALS) inhibitors19,20, synthetic auxins21 and glyphosate22,23,24. Individuals with multiple herbicide resistance to all four modes of action have been detected in Kansas17, while individuals combining ALS, glyphosate (GR) and synthetic auxin resistance were documented in Alberta, Canada in 201725. Kochia populations with ALS inhibitor resistance were first detected in Canada in 198826 and the point mutations conferring this resistance have become nearly ubiquitous throughout the Prairie provinces in less than 20 years27. Glyphosate resistance was first detected in Kansas in 200722 and was widespread in the USA’s Great Plains and confirmed in all three Canadian Prairie provinces by 201328,29,30. This suggests that glyphosate resistance arose de novo in Kansas after 33 years of glyphosate use31 and 11 years of intensified glyphosate use following the introduction of glyphosate resistant crops31,32. This resistance appears to have then spread through populations, rather than being a common variant in the species’ standing variation or emerging repeatedly de novo across the range. The glyphosate resistance (GR) mechanism described for kochia is increased copy number and expression of the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) enzyme, which is inhibited by glyphosate in susceptible plants33. EPSPS copy number correlates with glyphosate resistance, with four or more copies resulting in resistance17,24. Inheritance of increased copy number follows a single locus Mendelian pattern, as the gene copies are in a tandem array on a single chromosome34. At this time, no other mechanisms of glyphosate resistance for kochia have been described.

While the spread of herbicide resistance genes, via pollen and seed, depends on the interconnectivity of populations, the selection of novel herbicide resistance from standing variation in a population depends on variation being available. In addition to selection for multiple HRs, which have often resulted from changes at a single locus (ALS, EPSPS, and photosystem II inhibitors), kochia’s history as an introduced species may limit available variation, as only a subset of the variation available within a species is expected to be introduced to a new region35,36,37. Here we used double digested restriction enzyme associated markers and 26 populations from the Canadian Prairies to understand the current genetic variation in kochia populations and how selection for glyphosate resistance may have changed this variation. Specifically, we investigated three questions: (1) what is the level of gene flow and inbreeding among the kochia populations? (2) what is the current level of genetic diversity in these populations? and (3) do populations where high EPSPS copy number (EPSPSCN) has been introduced or individuals with high EPSPSCN show evidence of reduced diversity?

Results

No individual sampled from a susceptible population had increased EPSPSCN relative to ALS, while resistant populations were generally a mixture of individuals with and without increased EPSPSCN. In one population, identified as 10% resistant in the initial screening of 100 individuals29, none of the 12 individuals sampled had increased EPSPSCN (Table 1); as a result, this pair was excluded from comparisons of resistant and susceptible populations.

Table 1 Number of individuals with fewer (nS) or more (nR) than 4 copies of EPSPS relative to ALS included in estimates of: allelic richness (AR); observed heterozygosity (HO), expected heterozygosity (HE), bootstrapped estimate of inbreeding coefficient (FIS); the average proportion of polymorphic nucleotide sites within individuals by 10−3 (Pn), the average proportion of loci that showed variability within individuals (HL), population-specific estimates of genetic differentiation (FST ) (BayeScan), and geographic distance (km) between population pairs.

In total, after following the STACKS pipelines38 for SNP discovery and genotyping, 360 (94%) and 362 (95%) of individuals had sufficient coverage in the de novo and reference based pipelines, respectively. However, the sibling sets were excluded in the majority of analyses, resulting in the inclusion of 89 high EPSPS individuals and 206 low EPSPS individuals (Table 1). For the reference based pipeline, the dataset included 3248 variable (polymorphic) loci with 10.6% missing data, while the de novo pipeline had 3173 variable loci with 11.2% missing data. Overall, 1.29% of nucleotides examined were polymorphic across alleles and nucleotide diversity (π) was 0.0036. Most consensus reference loci (99.6% of 6041 polymorphic and fixed loci) and 83.9% (of the 5626 polymorphic and fixed) of the de novo loci mapped to the genome (Bowtie239). In total, 3440 loci were identified by both pipelines. Population analyses were run on all four data sets, but statistical results were very similar (Supplementary Table S2) and only the reference based pipeline’s results are presented. The minor allele frequency (MAF) in the reference-based set of loci averaged 0.19.

Population structure and gene flow

Overall population differentiation (FST) was very low at 0.01, and pair-wise FST values ranged from 0 to 0.07 (Fig. 1) and did not correlate with geographic distance (Mantel test, r2 = − 0.02 p = 0.57). AMOVAs indicated the majority of molecular variation (79.8%) was attributed to individuals, with less than 1% of the variation explained by differences among populations and no variation explained by population status (resistant or susceptible) (Table 2). A principal components analysis (PCA) explained 23.6% and 11.2% of the variance on the first and second axes, respectively, but showed no clustering by province, population or EPSPS type (Fig. 2). Different runs of find.clusters assigned the lowest Bayesian information criterion to different numbers of clusters. However, six clusters were selected with the lowest BIC for both the reference and the de novo loci sets. These groups did not correspond to population, province or EPSPS status (Supplementary Fig. S1), nor did they correspond to clusters or regions within the PCA. When the full sibling groups were included, the optimal number of groups ranged from eight to eleven, but full siblings were not assigned to the same group (Supplementary Fig. S2). The analysis produced by fineRADStructure indicated little population structure with a diffuse pattern of co-ancestry levels averaged by population. Groups of individuals with higher co-ancestry were mixed by province, population and EPSPS status (Supplementary Fig. S3). The overall FIS was calculated as 0.23, but ranged from 0 to 0.42 within populations (Table 1). The proportion of alleles shared by two individuals averaged 75% (range 64–98%), the proportion of polymorphic nucleotide sites (Pn) averaged 2.73 × 10–3 (range 1.02–4.58 × 10–3), and the average proportion of loci showing variation within individuals (HL) was 13% (range 5% to 23%).

Figure 1

Between-population heat map, with higher values more intensely coloured, showing Nei’s genetic distance (DST; ranged from 0.01 to 0.05) below the diagonal and FST values (ranged from 0 to 0.08) above the diagonal. Values that were not considered statistically different from 0 (bootstrap p-value > 0.05) are coded in black.

Table 2 Analysis of molecular variance: (A) among populations within population type (resistant or susceptible); and (B) among individuals with high or low EPSPS status within resistant populations.
Figure 2

Principal components analysis (PCA) using SNPs from 3248 variable loci, with the first and second axes accounting for 23.6% and 11.2% of the variation, respectively. There is no clustering by EPSPS:ALS ratio (size), population (colour), or province (shape), indicating little to no population structure in this species.

The unweighted pair group method with arithmetic mean (UPGMA) dendrograms included groupings mixed by province, population and EPSPS status (Fig. 3). While some clustering of individuals with increased EPSPSCN was apparent, for example, a small cluster of high EPSPSCN individuals from Alberta, Saskatchewan and Manitoba populations (e.g. Fig. 3 at 10 o’clock), others were scattered through the tree, likely reflecting the high rates of gene flow rather than multiple independent origins.

Figure 3

UPGMA tree based on Prevosti’s genetic distance (see scale at top of tree) for individuals from kochia populations in Alberta, Saskatchewan and Manitoba. Population of origin is represented by the coloured blocks at the tips, while EPSPS:ALS ratio is represented by the size of the bar. Red branches belong to individuals with an EPSPS:ALS ratio of 4 or greater, which are considered resistant to glyphosate, and blue dots indicate nodes with 80 or greater bootstrap support.

The number of migrants among the populations (Nm) was calculated as 16.4 for comparison to40.

Population statistics and genetic diversity by EPSPS status

AMOVAs indicated that neither population type (Table 2A) nor individual EPSPSCN status within resistant populations (Table 2B) explained genetic variation. Neither the proportion of polymorphic nucleotide sites (Pn) nor the proportion of variable loci (HL) differed by population type or individual EPSPS status (Table 3). Nucleotide diversity estimates were 0.0034 and 0.0035 for individuals with low and high EPSPSCN, respectively. BayeScan indicated that no loci showed evidence of selection when coded by individual or population EPSPS status.

Table 3 Population statistics comparing low and high EPSPSCN individuals and populations: average proportion of polymorphic nucleotide sites within individuals (10−3, Pn), average proportion of variable loci within individuals (HL), and bootstrapped inbreeding coefficients (FIS) with confidence interval and p-values.

Alignment to chloroplast

Thirty-six de novo consensus loci aligned to the chloroplast, but all were fixed across individuals. This may indicate that too few regions were used to detect variation, that markers fell in invariant regions, or that there is little variation in the chloroplast. Previous attempts in our laboratory to find variability in rbcL, matK, trnL-F, psbA-H2, psbH-psbB and atp-rbcL yielded only a pair of SNPs across these populations (unpublished data), which may indicate a lack of variation.

Discussion

Here, we determined that these 26 kochia populations, sampled from across the Canadian Prairie provinces, showed high levels of gene flow. This was indicated by: (1) the very low levels of genetic differentiation (Table 1, Fig. 1); (2) individuals harbouring the majority of genetic variation (Table 2); and (3) the absence of population structure (Figs. 1, 2, 3). This estimate of genetic differentiation for kochia (FST 0.01) is lower than the moderate level41 reported for 13 North Dakota and Minnesota populations based on 45 microsatellites (GST = 0.0940). While we note that calculating the number of migrants from FST has been criticized as underlying assumptions are likely to be violated42,43, our estimate would be six times higher than that of40, which suggests a higher level of connection between these populations. The level of genetic differentiation observed here for kochia is also lower than the average genetic differentiation reported for outcrossing wind-pollinated species (GST value 0.101) based on allozymes44. Other weedy species with similarly low FST include the wind-pollinated grasses Apera spica-venti L. (GST = 0.01, 0.024 for Canadian and European populations, respectively, from allozymes45) and Alopecurus myosuroides Huds. (FST = 0.023 from AFLPs46), as well as the outcrossing and self-incompatible Rosa rugosa Thunb. (FST = 0.045 from microsatellites47). Populations of GR plants with similar genetic differentiation include the outcrossing and self-incompatible Lolium perenne ssp. multiflorum (Lam.) Husnot (FST = 0.006–0.088 from microsatellites48); and some population pairs of the obligate outcrossing species Amaranthus palmeri S. Watson in the USA (e.g. FST = 0.052 for R-S pairs from Arizona from SNPs49).

We observed an overall inbreeding coefficient (FIS) of 0.23, indicating a 23% higher level of homozygosity than expected by random mating. As material used here was grown from openly pollinated seed, this could indicate that inbreeding is occurring in many populations. Given the low level of genetic differentiation between populations, these high inbreeding coefficients are unusual; plants are generally expected to have either low or high values for both FST and FIS, as self-pollination increases divergence while outcrossing reduces divergence. For example, for two species with low FST values mentioned above, R. rugosa and A. palmeri, the FIS values were estimated at 0.04347 and 0.01649, respectively. In contrast, central European populations of Amaranthus retroflexus L. had inbreeding coefficients similar to the higher values found here at 0.382, but FST was 0.27 indicating strong population differentiation (allozymes50). However, a similar relationship to that observed here was reported for L. perenne ssp. multiflorum, which had FIS estimates ranging from 0.396 to 0.517 despite low values for genetic differentiation48. The authors suggested that this could result from genetic bottlenecks caused by glyphosate selection, but noted that FIS values did not correlate with the frequency of GR plants. Similar processes may be contributing to the high FIS values estimated in this study, since both high and low EPSPSCN populations have been subject to selection pressure.
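The relationship between FIS and heterozygosity discussed above can be illustrated with a short Python sketch. It assumes the textbook definition FIS = 1 − HO/HE on biallelic SNPs coded as counts of the alternate allele (0, 1, 2, or None for missing data). The function name and the toy genotypes are hypothetical, and the sketch omits the sample-size corrections and bootstrapping applied by hierfstat in this study.

```python
def inbreeding_coefficient(loci):
    """Estimate FIS = 1 - mean(HO) / mean(HE) across biallelic loci.

    `loci` is a list of loci; each locus is a list of genotypes coded as
    alternate-allele counts (0, 1, 2) with None for missing calls.
    Simplified illustration only: no sample-size correction is applied.
    """
    ho_values, he_values = [], []
    for genotypes in loci:
        called = [g for g in genotypes if g is not None]
        if not called:
            continue  # skip loci with no called genotypes
        n = len(called)
        p = sum(called) / (2 * n)  # alternate allele frequency
        ho_values.append(sum(1 for g in called if g == 1) / n)  # observed
        he_values.append(2 * p * (1 - p))  # expected under random mating
    if not he_values:
        raise ValueError("no loci with called genotypes")
    mean_ho = sum(ho_values) / len(ho_values)
    mean_he = sum(he_values) / len(he_values)
    if mean_he == 0:
        raise ValueError("all loci are fixed; FIS is undefined")
    return 1 - mean_ho / mean_he


# Toy data: heterozygote deficit at a single locus (p = 0.5, HO = 0.25).
deficit = inbreeding_coefficient([[0, 0, 1, 2, 2, 0, 2, 1]])
# Toy data: genotype proportions match random mating (HO = HE = 0.5).
random_mating = inbreeding_coefficient([[0, 1, 1, 2]])
```

A positive value indicates fewer heterozygotes than expected under random mating, which is the sense in which an FIS of 0.23 is interpreted in the text.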

Currently, it is difficult to evaluate kochia’s level of genetic diversity compared to other species, as it is unclear what levels should be expected in weedy or outcrossing plants. This challenge results from the variety of molecular markers used over the last 50 years and the variety of information presented by SNP studies. Genetic diversity (HE) in these populations averaged 0.28, lower than the previous report for kochia of 0.35 (Nei’s gene diversity51 or h in40). While this may represent a reduction in genetic diversity, these populations are further north along kochia’s invasion path and may have had lower initial genetic diversity. Alternatively, this difference may be the result of using different genetic markers. Kochia’s genetic diversity is higher than the average of 0.16 for outcrossing wind-pollinated species from allozyme studies44. However, the percentage of variable loci (variable vs. variable and fixed loci) was 53%, similar to the 51% of variable allozyme loci in outcrossing species52. Kochia’s genetic diversity is lower than the 0.678–0.824 estimated for GR populations of L. perenne ssp. multiflorum48, but slightly higher than estimates for 42 GR populations of the highly self-pollinating Conyza canadensis (L.) Cronq. (microsatellites; 0.21 (0–0.45)53). Unfortunately, few other studies using approaches such as ddRADseq to identify SNPs report the number of loci that were identified but were invariant. Our estimate is higher than that for two species of bee-pollinated perennial Rhododendron in Japan, where 23% (144 loci of 675) of loci were variable; this was the only other study we located that reported this information54. Similarly, kochia’s nucleotide diversity (all sites) was 0.0036, which is lower than estimates of 0.0047 from AFLPs for A. myosuroides46, but higher than the nucleotide diversity estimated overall for core and invasive populations of Mercurialis annua L. (0.0021; SNPs47). With increasing numbers of GBS studies examining genetic diversity, we anticipate that sufficient context to evaluate the potential for a particular weed to adapt from standing variation will soon be available14.

Whether kochia populations are more or less diverse than expected, there were no differences associated with high EPSPSCN within population or individuals. Similarly, population genetics parameters (e.g. FST, Pn, HL) did not differ between high EPSPSCN populations or individuals and their low EPSPSCN counterparts. The bootstrapped confidence intervals of the inbreeding coefficient for low and high EPSPSCN individuals did not overlap (high EPSPSCN individuals FIS = 0.29, low FIS = 0.24), but random permutation tests indicated no statistical significance (p = 0.057). As a result, if kochia populations are depauperate or rich in comparison to expectations, this is the case whether or not the individuals or populations have high EPSPSCN, indicating that we have no evidence that selection for EPSPSCN has altered genetic diversity in these populations. This study sampled at an early stage in the spread of high EPSPSCN individuals across the Prairie Provinces (i.e. prior to fixation); sampling kochia populations after fixation of GR or after the spread of additional HR genes would provide additional insights.

The high gene flow among populations suggests that kochia on the Canadian Prairies, and perhaps beyond, could be considered a single population. Estimates from tracking tumbling kochia suggest that approximately 90% of seeds are dispersed over the first kilometer55, leaving 10%, potentially 3000 seeds15, to be dispersed over greater distances. This strong dispersal likely contributes to high connectivity, though human-mediated seed movement is also likely a factor. As a result, beneficial alleles, such as those for herbicide resistance, can be expected to spread rapidly through the species’ range. Further, any selection for a suitable combination of beneficial alleles and genetic background will have all genetic material available in the species to select from when introduced into one area56,57. This is congruent with the speed at which ALS mutations spread through the Prairies27 and suggests that high EPSPSCN and auxinic resistance are likely to spread as rapidly as ALS resistance. This prediction is supported by Canadian Prairie random surveys showing a rapid increase in incidence of GR (5 to 50% of populations) and auxinic-resistant kochia (0 to 18%) in a 5-year period25. With the refinement of our understanding of kochia’s genome and the generation of a chromosome level assembly for the species, it will become possible to use these data to look at the signatures of selection near the EPSPS gene, determine whether this event is associated with a hard or soft sweep, and better explore the origin of glyphosate resistance in this species and its consequences2,58.

A potential consequence of this high gene flow is that evolving locally adapted ecotypes would require extremely strong selection in kochia. However, the evolution of locally adapted ecotypes has been considered a key feature of successful invasions59,60. Based on this study, we expect the spread of GR will result in little change in kochia’s capacity to evolve additional herbicide resistance from standing variation. Swift, comprehensive, and ongoing action would be needed to curtail the spread of herbicide resistance genes from points of evolution in kochia populations. The species will need to be managed as a whole, as there are no smaller, individually controllable units; this will require coordination and cooperation among producers and levels of government36. Future work expanding the geographic coverage of sampling, and investigating the genetic variation of these populations as GR and auxinic resistance spread, would provide further insights.

Materials and methods

Plant material

Plant material was from bulk-collected seed from population pairs where high EPSPSCN individuals had been detected (resistant) and from where they had not (susceptible) in relatively close geographic proximity. These populations were sampled and identified during surveys to determine the extent of GR kochia in Alberta (2011, 2012), Saskatchewan (2013) and Manitoba (2013) (Fig. 4)28,29,30. In these surveys, populations were considered resistant if they had individuals not controlled by glyphosate at 900 g ae/ha in greenhouse screens28,29,30. We extracted DNA from twelve individuals from four, seven and two pairs from Alberta, Saskatchewan and Manitoba, respectively. We also used six groups of reciprocally related progeny, sibling plants resulting from reciprocal controlled crosses between high and low EPSPSCN individuals from within populations in Alberta and Saskatchewan61. In total, this included 312 individuals from 26 populations and 72 progeny. The maps of the locations of these populations were made in QGIS Desktop 2.18.1562 with layers available and downloaded from Natural Earth (https://www.naturalearthdata.com/downloads/).

Figure 4

Locations of the kochia populations sampled in Alberta, Saskatchewan and Manitoba, Canada, with north toward the top of the figure. Populations where glyphosate resistant individuals were detected in screens by Hall et al.28 and Beckie et al.29 are filled and contain an “R” in their label, while those with no resistance detected in these screens are shown as empty and include an “S” in their labels.

DNA sequencing and analysis

DNA extraction

Seeds were germinated and grown in the greenhouse at Agriculture and Agri-Food Canada’s Ottawa Research and Development Centre. Young leaves were collected and DNA was extracted using a FastDNA kit (MP BioMedicals, USA). All material from the greenhouses and residual debris following seed cleaning were autoclaved before disposal.

Quantitative PCR

Quantitative PCR was used to determine relative EPSPSCN compared to ALS following the method described in33. Specifically, two replicates were averaged to determine the EPSPS:ALS ratio. We measured the DNA concentrations of samples using NanoDrop ND-1000 and 8000 spectrophotometers (Thermo Scientific, Wilmington, DE, USA), corrected their concentration to 5 ng/μL and conducted quantitative PCR (qPCR) using an Eppendorf Mastercycler ep cycler. The specific primers for EPSPS (5′ GGCCAAAAGGGCAATCGTGGAG 3′ and 5′ CATTGCCGTTCCCGCGTTTCC 3′63) and ALS (ALS890F: 5′ AGCCTGTGTTGTATGTGGGA 3′ and ALS999R: 5′ AGCGCCCAAACCCATTAAAG 3′61) were used and produced products of 102 and 110 bp, respectively. BioRad strip wells containing 10 μL ABI Power Sybr Green MM (2X) (Life Technologies, Hercules, CA, USA), 0.5 μL of the appropriate forward and reverse primer (5 μM), 10 ng of gDNA, and 7 μL of dH2O were used for the qPCR reactions33. Cycle parameters were initial denaturing at 95 °C for 15 min, followed by 95 °C for 30 s and annealing and extension at 60 °C for 60 s, for a total of 40 cycles. The ALS reference gene was used to standardize the EPSPSCN using the equation $R = 2^{-(\Delta C_{T,\mathrm{sample}} - \Delta C_{T,\mathrm{calibrator}})}$ to produce the estimate of the ratio between EPSPS and ALS33.
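The ratio calculation described above can be sketched in Python. This is a minimal illustration that assumes the standard 2^-ΔΔCT form, with ΔCT taken as CT(EPSPS) minus CT(ALS), and assumes that the two replicates are averaged at the level of the ratio. The function names, the CT values and the calibrator sample are hypothetical and are not taken from the study.

```python
from statistics import mean

# A ratio of 4 or greater is treated as high EPSPSCN, following the text.
HIGH_COPY_THRESHOLD = 4.0


def epsps_als_ratio(ct_epsps, ct_als, ct_epsps_cal, ct_als_cal):
    """Relative EPSPS:ALS copy number using the assumed 2^-ddCT form."""
    delta_sample = ct_epsps - ct_als              # dCT for the sample
    delta_calibrator = ct_epsps_cal - ct_als_cal  # dCT for the calibrator
    return 2.0 ** (-(delta_sample - delta_calibrator))


def mean_ratio(replicates, calibrator):
    """Average the ratio over technical replicates (assumed averaging step)."""
    ct_epsps_cal, ct_als_cal = calibrator
    return mean(
        epsps_als_ratio(ct_epsps, ct_als, ct_epsps_cal, ct_als_cal)
        for ct_epsps, ct_als in replicates
    )


# Hypothetical CT values, given as (CT_EPSPS, CT_ALS) per replicate.
calibrator = (22.0, 22.0)                         # single-copy reference plant
sample_replicates = [(20.0, 22.0), (20.0, 22.0)]  # EPSPS amplifies 2 cycles early
ratio = mean_ratio(sample_replicates, calibrator)
is_high_copy = ratio >= HIGH_COPY_THRESHOLD
```

In this toy case EPSPS crosses the threshold two cycles earlier than ALS relative to the calibrator, which corresponds to a four-fold difference in template.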

Double digested restriction enzyme associated markers

Double digested restriction enzyme associated marker library preparation and sequencing were completed at the University of Georgia using a 3DRAD based protocol. The enzymes used to generate the markers were HindIII (A|AGCTT) and NdeI (C|ATATG), and the project was designed to result in 300 million paired end reads for each of four plates.

Population genetics

Distances between the populations were calculated from their GPS coordinates using the Geographic Distance Matrix Generator v1.2.364, and the population map was created in QGIS 2.18.2562 with data from Natural Earth (https://www.naturalearthdata.com).

Data were analyzed using STACKS v1.4438 and custom R (3.4.3 “Kite-Eating Tree”) scripts65. STACKS (process_radtags) was used to demultiplex and filter data. In total, 1.6 billion reads were received and 1 billion were retained for an average of 1.5 million reads per individual. Stacks parameters were determined using a subset of samples as recommended66 and both the de novo and reference pipelines were followed. The parameters used were (M = n =) 5, (m =) 3, the minimum minor allele frequency allowed was 0.05, the maximum observed heterozygosity allowed was 0.7, and a random SNP was used from each locus. For the reference based pipeline, tags were aligned to kochia’s draft genome67 using Bowtie 2 version 2.1.039. Average coverage was 20.2 × for the reference pipeline and 15.8 × for the de novo pipeline. Individuals with less than 10 × coverage were excluded. Additionally, we removed loci with more than four alleles identified within full sibling groups, loci with 3 or fewer individuals represented in a population, and individuals with fewer than 60% of the loci. To identify loci associated with the chloroplast, consensus loci sequences were aligned to an assembly of kochia’s chloroplast67.
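The locus and individual filters described above can be illustrated with a small Python sketch. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2, or None for missing data) and applies the thresholds stated in the text (minimum minor allele frequency 0.05, maximum observed heterozygosity 0.7, at least 60% of loci per individual). The function names and toy genotypes are hypothetical; in the study these filters were applied through STACKS and custom R scripts, not this code.

```python
def locus_stats(genotypes):
    """Return (minor allele frequency, observed heterozygosity) for one locus."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no called genotypes at this locus
    n = len(called)
    alt_freq = sum(called) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)
    obs_het = sum(1 for g in called if g == 1) / n
    return maf, obs_het


def keep_locus(genotypes, min_maf=0.05, max_obs_het=0.7):
    """Apply the MAF and observed-heterozygosity thresholds to one locus."""
    stats = locus_stats(genotypes)
    if stats is None:
        return False
    maf, obs_het = stats
    return maf >= min_maf and obs_het <= max_obs_het


def keep_individual(genotypes, min_fraction=0.6):
    """Keep an individual genotyped at min_fraction or more of the loci."""
    if not genotypes:
        return False
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes) >= min_fraction


# Toy loci (hypothetical data).
ok_locus = [0, 1, 2, 0, 1, 0, 0, 2, None, 0]  # MAF 0.33, HO 0.22
too_heterozygous = [1] * 8 + [0] * 2          # HO 0.8, possible paralog
too_rare = [0] * 19 + [1]                     # MAF 0.025
```

Excess heterozygosity is often used as a proxy for collapsed paralogous loci, which is one common motivation for an upper limit such as 0.7.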

Population genetics parameters were calculated in R65. Observed heterozygosity (HO), within-population gene diversity (HS), overall gene diversity (HT), bootstrapped estimates of the inbreeding coefficient (FIS) with confidence intervals, and levels of genetic differentiation among populations (FST) were all calculated by hierfstat68. Mantel tests were conducted with ade469. Bootstrapped values for FST, associated p-values and Nei’s genetic distance (DST) were calculated with StAMPP70. BayeScan71 (version 2.1), used with default parameters except an increased prior of 300, produced estimates of FST with upper and lower limits for each population. The R packages boa72 and coda73 were used to assess BayeScan results and model convergence. Allelic richness estimates were generated by PopGenReports74. Unweighted pair group method with arithmetic mean trees were calculated using poppr75, while AMOVAs were calculated with 1000 permutations by poppr.amova with the “ade4” method69. The proportion of shared alleles among groups and k-means clustering were estimated (testing k = 1 to 40) with adegenet76. The program fineRADStructure was used to further investigate clustering77. A custom R script processed STACKS’ haplotype files containing all SNPs for each locus in order to calculate the proportion of polymorphic nucleotide sites (Pn) and the percentage of heterozygous loci (HL). The function oneway_test from coin78 estimated the p-value for comparing low and high EPSPSCN individuals or populations using 100,000 permutations. Following40, the number of migrants was estimated with Nm = 0.25((1 − FST)/FST)79.
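As a worked illustration of the migrant formula, the Python sketch below evaluates Nm = 0.25((1 − FST)/FST) for two inputs. The function name is hypothetical. The value 0.015 is our assumption for an unrounded FST, chosen because it is consistent with both the reported FST of about 0.01 and the reported Nm of 16.4; the value 0.09 is the GST reported for the earlier microsatellite study.

```python
def migrants_per_generation(fst):
    """Estimate Nm from FST using Nm = 0.25 * ((1 - FST) / FST)."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return 0.25 * ((1 - fst) / fst)


# Assumed unrounded FST for this study (not stated in the text).
nm_this_study = migrants_per_generation(0.015)
# GST from the earlier microsatellite study, used here in place of FST.
nm_previous = migrants_per_generation(0.09)
fold_difference = nm_this_study / nm_previous
```

Because Nm depends on 1/FST, small changes in a low FST produce large changes in Nm, so the estimate is best read as a rough comparison between studies, not a precise count of migrants.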

The packages ape80, gdata81, pegas82, phytools83, reshape84, Hmisc85 and vcfR86 were used for data handling and manipulation, while ggplot87 and colorspace88 were used for plotting.