Cost-effective detection of genome-wide signatures for 2,4-D herbicide resistance adaptation in red clover

Herbicide resistance is a recurrent evolutionary event that has been reported across many species and for all major herbicide modes of action. The synthetic auxinic herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) has been widely used since the 1940s, however the genetic variation underlying naturally evolving resistance remains largely unknown. In this study, we used populations of the forage legume crop red clover (Trifolium pratense L.) that were recurrently selected for 2,4-D resistance to detect genome-wide signatures of adaptation. Four susceptible and six derived resistant populations were sequenced using a less costly approach by combining targeted sequencing (Capture-Seq) with pooled individuals (Pool-Seq). Genomic signatures of selection were identified using: (i) pairwise allele frequency differences; (ii) genome scan for overly differentiated loci; and (iii) genome‐wide association. Fifty significant SNPs were consistently detected, most located in a single chromosome, which can be useful for marker assisted selection. Additionally, we searched for candidate genes at these genomic regions to gain insights into potential molecular mechanisms underlying 2,4-D resistance. Among the predicted functions of candidate genes, we found some related to the auxin metabolism, response to oxidative stress, and detoxification, which are also promising for further functional validation studies.


Results
SNP calling and genetic relationship among samples. In this study, six resistant and four susceptible pools of individuals from synthetic red clover populations were sequenced using a capture-seq approach ( Fig. 1 and Table 1). A total of 6,407,589 SNPs were detected at first. After applying stringent filtering criteria, we selected 11,768 SNPs that were biallelic, present across all ten pools (no missing data), uniquely mapped (read mapping quality greater than 20), with minimum and maximum depth of coverage of 40 and 400, respectively. SNPs were also well distributed throughout the seven red clover chromosomes.
The estimated covariance matrix of allele frequencies (Ω) based on read counts of 11,768 polymorphic sites was used to quantify the genetic relationship among pools (Fig. 2). The correlation plot ( Fig. 2A) and PCA (Fig. 2B) clusters reflected the expected relationship between synthetic cultivars from the recorded pedigree information (Fig. 1). frequency among pools. The pairwise comparisons showed consistent outliers when resistant and susceptible (R-S) pools were compared, which were not detected in resistant-resistant (R-R) or susceptible-susceptible (S-S) contrasts (Figs. 3 and S1). Most SNPs showing high allele frequency differences between R-S were detected at chromosome 2.
To formally detect significant signatures of selection based on the allele frequency differences among the ten pools, we used two distinct and robust Bayesian frameworks, correcting for the relationship among pools and sampling noise. First, a genomic scan for overly differentiated SNPs was performed based on the XtX measure, which is analogous to F ST , but explicitly accounts for the relationship among populations and sampling noise in pooled samples. A pseudoobserved data set (POD) was simulated to estimate the posterior predictive distribution of the XtX statistics under neutrality, providing the threshold for detecting overly differentiated SNPs among populations. The estimate of Ω on the POD neutral simulation was close to the matrix estimated on the original data set (FMD = 0.39), indicating that the POD can be used to define the significance threshold on XtX analysis. In total, 107 SNPs were found as outliers at the 0.1% POD significance threshold. Most of the significant outliers (47) were found at chromosome 2 (Fig. 4A). Second, analyses of association were conducted using the resistance/susceptible phenotype as a categorical pool-specific covariable. In total, 88 SNPs were significant at 20-dB threshold, with 59 identified at chromosome 2 (Fig. 4B). Interestingly, some overly differentiated SNPs (high XtX value) were not associated with the herbicide resistant/susceptible phenotype (small Bayes Factor), indicating the presence of other selective pressures (Fig. 4C). Considering common outliers from both approaches, we detected 50 significant SNPs (Fig. 4C), providing consistent evidence for selection at these genomic regions. Moreover, we selected the most significant variant in each chromosome for individual Sanger sequencing and SNP validation. To this end, we used an independent set of resistant and susceptible individuals from 'FL24D' and 'Southern Belle' respectively. SNPs at chromosome 1, 2, and 3 were also significant at Fisher's exact test, providing further empirical support for their presence and association (Supplementary Table S2 and Fig. S2).
Candidate genes underlying significant SNPs. To gain insights into the potential functional significance of the outlier loci detected by two distinct approaches, we retrieved the annotation of the protein-coding genes flanking the 50 SNPs in the red clover genome. Most of the significant SNPs (37) were located at chromosome 2, followed by six SNPs at chromosome 3, three at chromosome 1, two at chromosome 4, and one each at   Table 2). Twenty SNPs were located in protein coding sequences, with six of them causing missense mutations. Among the remaining SNPs, 20 were located at introns, seven at untranslated regions (UTR), and three at intergenic regions (Supplementary Table S1). Based on sequence homology of candidate genes surrounding significant SNPs, we detected several candidate genes with putative orthologs known to be directly involved in auxin homeostasis, such as regulators of auxin response (cullin-associated NEDD8-dissociated protein 1; NEDD8-conjugating enzyme Ubc12; BTB/POZ domain-containing protein NPY4; protein SHI RELATED SEQUENCE 1; transcription factor MYB44 and MYB61; receptor-like kinase TMK4, protein PIN-LIKES 7; auxin-responsive protein IAA20; auxin-responsive protein SAUR32), transport (serine/threonine-protein kinase D6PKL1; VAN3-binding protein; protein WALLS ARE THIN 1; protein SHOOT GRAVITROPISM 5), conjugation (indole-3-acetic acid-amido synthetase GH3.1; IAA-amino acid hydrolase ILR1), and catabolism (auxin peroxidases). Besides auxin-related genes, we also found genes responsive to ABA (e.g., protein EARLY-RESPONSIVE TO DEHYDRATION 7; E3 ubiquitin-protein ligase AIRP2) and ethylene (e.g., ethylene-overproduction protein 1; senescence-associated protein DIN1), and genes involved in detoxification (e.g., protein DETOXIFICATION 40) and response to ROS (e.g., protein ACTIVITY OF BC1 COMPLEX KINASE 7; aconitate hydratase 1) (Fig. 5). The detailed annotation of the SNPs and candidate genes can be found in the Supplementary Table S1.

Discussion
In this study, we investigated genome-wide signatures of selection for 2,4-D herbicide resistance in red clover by contrasting Pool-Seq data from resistant and susceptible populations. Genomic studies to find regions associated with naturally-evolved resistance to 2,4-D have been largely unexplored. Elucidating the genetic and molecular basis of natural herbicide resistance is a central challenge for either developing resistant crops, improving herbicide targets, or predicting the potential of weeds to overcome herbicide mechanisms. Furthermore, we have  www.nature.com/scientificreports www.nature.com/scientificreports/ also shown the feasibility of utilizing Capture-Seq technique, which in conjunction with the Pool-Seq approach, allowed the cost-effective identification of genetic variants. Therefore, this approach is also promising for similar genomics studies in non-model species with less resources.
The selective pressure imposed by 2,4-D treatment over multiple breeding cycles to obtain resistant cultivars has left genomic footprints of selection. In a preliminary screening, we detected allele frequency differences among pools with contrasting phenotypes. As part of breeding programs, the red clover synthetic cultivars used herein are connected by the pedigree and shared genetic relationship, as demonstrated by the covariance matrix and the PCA based on the allele frequencies. Therefore, to detect significant signatures of herbicide adaptation, we considered two other analytic methods that accounted for relatedness and also for sampling noise of Pool-Seq data. Taking these confounding factors into account, we detected 107 overly differentiated variants using the XtX statistics. Nonetheless, XtX is a covariable-free statistic that is powerful to identify SNPs subjected to a broader kind of adaptive constraint 36 . Therefore, XtX outlier loci can also be responding to a distinct selection pressure other than the herbicide. To refine the list of outlier loci, we considered a third strategy based on genome-wide association analysis using the resistant/susceptible phenotype as a population-specific covariable. From these combined approaches, we detected 50 SNPs exhibiting both strong genetic differentiation and significant association with the phenotype.
The 50 significant SNPs were located at six chromosomes, indicating that several genomic regions are putatively involved in the herbicide resistance adaptation. Resistance to herbicides with complex modes of action, such as 2,4-D, is indeed likely to be affected by many genes with minor-effects, arising gradually in the population via recombination of standing genetic variants into the same genetic background over generations 1,23,37,38 . The quantitative genetic architecture of 2,4-D resistance in red clover is also in agreement with the breeding strategy employed to obtain resistant plants, where recurrent cycles of mass selection under 2,4-D application www.nature.com/scientificreports www.nature.com/scientificreports/ www.nature.com/scientificreports www.nature.com/scientificreports/ were carried out to increase the number of favorable alleles in the population 12,30,39 . However, monogenic and dominant patterns of inheritance were inferred through segregation studies of the 2,4-D resistance phenotype in some weed species, such as wild mustard (Brassica kaber L.), prickly lettuce (Lactuca serriola L.), oriental mustard (Sisymbrium orientale L.), and wild radish (Raphanus raphanistrum L.) [40][41][42][43][44] . A single dominant resistance allele was also shown to be the causal basis for dicamba/2,4-D resistance in kochia 22 . In this sense, the consistent and higher number of significant variants detected through all approaches at chromosome 2 in red clover led us to speculate that a quantitative trait locus with major effect might exist in this region. Further studies are needed to draw this conclusion, but this result is already promising for marker-assisted selection in the red clover breeding program.
Most of the significant SNPs were located nearby or within protein-coding genes. However, the majority did not have a clear functional effect and their detection as outliers probably resulted from hitchhiking rather than a causative variation. Therefore, at this point, we cannot identify the specific loci and molecular mechanisms that directly contribute to the 2,4-D resistance adaptation. However, many interesting candidate genes are present at the vicinity of significant SNPs, providing some insights into potential mechanisms for 2,4-D resistance in red clover.
Out of the 50 significant SNPs, six were predicted to cause non-synonymous amino acid changes. Among those, two variants affected a gene likely encoding an E3 ubiquitin-protein ligase HOS1. In Arabidopsis, HOS1 mediates the proteasomal degradation of ICE1, which is a transcription factor involved in chilling and freezing tolerance 45 . Interestingly, an ICE1-homolog was upregulated by 2,4-D in resistant but not in susceptible populations of wild radish 29 , suggesting that the regulation of ICE1 may also influence 2,4-D stress tolerance. Moreover, HOS1 is also required for photoperiodic control of flowering in Arabidopsis, with distinct hos1 loss-of-function mutants displaying an early flowering phenotype 46 . Although there was no selection for early flowering in the development of the first 2,4-D resistant cultivar, 'FL24D' grew earlier in spring than any other red clover cultivar at the University of Florida 12 . It seems plausible that the early flowering time in red clover resulted from a pleiotropic effect or genetic hitchhiking of hos1 or other regulators of flowering time along with 2,4-D resistance locus.
To explore the possibility that selection targeted untyped variants in the region flanking significant SNPs, we also annotated the genes within a ±100 kb window. Based on sequence homology, we detected several candidate genes with putative orthologs known to be directly involved in auxin homeostasis, such as regulators of auxin response, transport, conjugation, and catabolism. Besides auxin-related genes, we also found genes responsive to ABA and ethylene, and genes involved in detoxification and response to ROS that may constitute NTSR mechanisms. We also highlighted genes encoding cytochrome P450 family members as they have been identified as potential mediators of rapid detoxification mechanism for different classes of herbicides [47][48][49][50] , including auxinic herbicides, such as quinclorac 51 and potentially 2,4-D 27,28 . Although the aforementioned genes seem to have a plausible role in the auxin related pathways and stress responses, more studies and functional validation experiments are needed.
In summary, by using a cost-effective approach, we were able to identify genomic regions, mainly at chromosome 2, that likely contain the gene(s) responsible for 2,4-D resistance adaptation in red clover. We believe that our findings provided a promising starting point for marker-assisted selection implementation in the red clover breeding program and for guiding the discovery of novel auxinic herbicide resistance mechanisms.

Methods
Plant material. Red clover cultivars are synthetic populations generated by open-pollination of selected parents and propagated for a limited number of generations. In this study, six resistant and four susceptible pools of individuals from synthetic red clover populations were used (Fig. 1). The red clover synthetic cultivar, 'FL24D, ' was specifically bred for 2,4-D tolerance 30 . 'FL24D' was generated after six cycles (Gen6) of phenotypic recurrent selection using a source germplasm (Gen0) of three different commercial synthetic cultivars ('Kenstar' , 'Nolins Red' , and 'Cherokee') with 2,4-D treatment as selection factor at the University of Florida. A detailed description of how the 'FL24D' cultivar was generated can be found in Quesenberry et al. 30 . Briefly, seedlings derived from the intercross of the Gen0 population were sprayed with 1.1 kg a.i. ha-1 of 2,4-D dimethylamine salt formulation, and resprayed using similar rates three weeks later. Plants with superior survival and regrowth were intercrossed. This process was repeated throughout six cycles and the resistant synthetic cultivar 'FL24D' was obtained (Fig. 1). The resistance level of 'FL24D' was compared against a susceptible cultivar 'Southern Belle' in greenhouse and field experiments under three rates of 2,4-D (1/2x = 0.53 kg ha −1 , 1x = 1.06 kg ha −1 , and 2x = 2.12 kg ha −1 ). A damage rating scale of 1-to-9 was used, where 9 meant no visible symptoms and 1 meant severe leaf and stem curling and/or plant death. For example, 'FL24D' rated 7.0 whereas 'Southern Belle' rated 1.2 at the 1x concentration in greenhouse experiment 30 .
Individuals from the 'FL24D' cultivar were used as one of the resistant pools (Gen6). Individuals from the three cultivars ('Kenstar, ' 'Nolins Red, ' and 'Cherokee') that make up the foundational germplasm were used as  www.nature.com/scientificreports www.nature.com/scientificreports/ the initially herbicide-susceptible population in our experiment, in a pool sample named Gen0. Additionally, the cultivar 'Cherokee, ' which is one of the parents from the initial germplasm with earlier spring growth, and 'Southern Belle, ' a cultivar developed from 'Cherokee' for root-knot nematode resistance 52 , were also included as 2,4-D susceptible populations.
As 'FL24D' is a synthetic cultivar, genetic and phenotypic variability exist among the individuals from that cultivar. To increase the chances of including only highly resistant individuals, another 2,4-D application was performed on the 'FL24D' population, and individuals with minor damage were selected to compose the 'FL24DElite' pool ( Fig. 1). 'FL24D' individuals were also split into early flowering 'Early24D' and late flowering 'Late24D' pools. Furthermore, 'FL24D' was also introduced into the breeding program of northern adapted red clover at the University of Kentucky 39 . The 2,4-D resistant line, 'UK, ' was developed after eight recurrent selection cycles, using 'FL24D' and the susceptible cultivar 'Kenland' as parents. Similar to the way that 'FL24DElite' was generated, the pool 'UKElite' was created from 'UK. ' The susceptible parental cultivar, 'Kenland' , was also included in the analyses. More details on how cultivars were developed can be obtained at 30,39,52 . Total genomic DNA extraction. One hundred seeds from each population were germinated in petri-dishes. Out of those, 72 germinated seeds from each population were transplanted individually into 5-cm-square trays containing an equal mixture of local fine sand and potting mix. Seedlings were grown in a greenhouse. Young trifoliate leaves were collected 28 days after seed germination. Total genomic DNA was extracted from leaves of each individual plant using the DNeasy Plant Mini Kits (Qiagen, Valencia, CA, USA). DNA quality and purity were assessed on 1% agarose gel electrophoresis and by the A260/280 ratio, using a Nanodrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). DNA was accurately quantified using a Qubit Fluorometer (Invitrogen, Carlsbad, CA, USA) with the PicoGreen dsDNA Assay Kit (Molecular Probes, Eugene, OR, USA). DNA samples from the same population, with satisfactory quality and quantity, were mixed in an equimolar concentration to generate a DNA pool for sequencing (Table 1).
Probe design and genotyping of pooled samples. Genotyping of DNA pools using next generation sequencing (Pool-Seq) was carried out by RAPiD Genomics (Gainesville, Florida, USA) using a sequence capture approach (Capture-Seq). Briefly, 120-mers probes were designed based on publicly available expressed sequence tags (ESTs) and assembled transcripts from the closely related species white clover (Trifolium repens L.) 53 . EST sequences (15,260) and transcript sequences (71,545) were filtered to remove identical and low-quality sequences using SeqClean 54 . Filtered sequences were aligned to the Medicago truncatula L. genome 55 and to the 'Milvus B' red clover genome 56 , resulting in an average of 87.65% and 92.83% similarity, respectively. To synthesize biotinylated oligonucleotide probes for Capture-Seq genotyping, 15,885 sequences that aligned to the genome of both species were selected, avoiding mitochondrial and chloroplast DNA, enriching for exonic sequences, with GC content between 20-60%, and lacking homopolymers (less than eight nucleotides).
Ten sequencing libraries from pooled DNA samples were prepared according to Neiman et al. 57 . Sequencing was carried out in two batches. The first samples were sequenced using the Illumina NextSeq. 500 platform with 75 bp paired-end cycles. The second sequencing batch was performed using the Illumina HiSeq. 3000 with 100 bp paired-end cycles (Table 1). SNP calling and filtering. Raw reads were trimmed by quality using Trimmomatic v.0.36 58  Genetic relationship among the pooled samples. To assess the genetic relationship among each pair of pooled samples, we estimated the scaled covariance matrix of allele frequencies (Ω) using the software Baypass v.2.1 under the core model 36 . The Ω matrix was transformed into a correlation matrix, using the cov2cor() R function and a heatmap was generated using the corrplot() function from the R package corrplot. A principal component analysis (PCA) was carried out based on the Ω matrix with the dudi.pca() function of the R package ade4 62 .
Pairwise differences in allele frequency. As a preliminary screening for changes in allele frequency potentially related to the selection pressure toward herbicide resistant in red clover, we contrasted the differences in raw allele frequencies for each SNP among all ten pooled samples. Alternative and reference read counts were extracted from the variant calling file using VCFtools v.0.1.15 63 . Alternative allele frequency was estimated for each locus within the pool by dividing the alternative read count by the total read count (i.e., alternative plus reference read counts). The absolute pairwise differences in allele frequency among pools were plotted with a threshold value of 99.9 th percentile of the allele frequency difference distribution across all values.
Genome scan for adaptive differentiation. SNPs subjected to adaptive differentiation were formally inferred through the XtX differentiation measure 64  www.nature.com/scientificreports www.nature.com/scientificreports/ across populations and SNPs) 64 . The XtX genetic differentiation value for each SNP was estimated under the core model implemented in Baypass v.2.1 36 . To run the Baypass core model, we used the alternative and reference read count data for the ten pools and the haploid pool sizes as inputs, with the -d0yij option set at 8 for Pool-Seq mode and MCMC options as 25 short pilot runs (1,000 iterations each) to adjust the proposal distributions for each model parameter. Subsequently, an 100,000 burn-in period and an 100,000 updating steps were performed with a thinning interval of 40 steps. A pseudo-observed data set (POD) was simulated considering the same parameters as those estimated in the original data using the R function simulate.baypass() available in the BayPass software. The POD was further analyzed under the core model with the same parameters to estimate the posterior predictive distribution of the XtX statistics under neutrality. We compared the posterior estimates of Ω in the simulated data against the original data using FMD distance 65 to assess the precision and robustness of the simulated data. The 99.9 th percentile of this empirical distribution was used to calibrate the original XtX values, i.e., the POD analysis provided the 0.1% threshold XtX value as a decision criterion for discriminating between selection and neutrality, and detect overly differentiated SNPs 36 .
Association analysis with the 2,4-D resistance/susceptibility phenotype. In order to refine the list of outlier loci, we also performed a genome-wide association analysis using the herbicide susceptibility or resistance as a population-specific covariable (coded as a binary variable with values of −1 and 1, respectively). The read count data was analyzed under the auxiliary variable covariate (AUX) model, also implemented in Baypass v.2.1 36 . In the AUX model, for each SNP i and a given covariable k, a Bayesian (binary) auxiliary variable δ ik is attached to the regression coefficient β ik in a model that also accounts for relatedness using the Ω matrix and sampling noise. The binary auxiliary variable indicates whether a specific SNP can be regarded as associated with the covariable k (δ ik = 1) or not (δ ik = 0). Therefore, the posterior mean of δ ik can be interpreted as a posterior probability of association of the SNP i with the covariable k, from which a Bayes factor (BF) is derived, also taking multiple-testing issues into account 36 . The same MCMC parameters specified in the core model were also used for running the AUX model. The BF was further converted in deciban (dB) units using the transformation 10 log 10 (BF). Considering the Jeffreys' rule to quantify the strength of evidence 66 , we set dB = 20 as a stringent threshold for "decisive evidence. " Candidate gene mining. SNPs consistently detected across the latter two approaches, i.e., SNPs overly differentiated at XtX >1% POD significance threshold and association at the 20-dB threshold, were further considered as candidates for evaluation. The genomic position and functional effect of significant SNPs were annotated using snpEff v.4.3 67 , using the 'Milvus B' genome and gene predictions 56 . Predicted gene models were retrieved from the Ensembl Plants database 68 . To explore the possibility that selection targeted untyped variants in physical linkage with significant SNPs, we defined an ad hoc window of ±100 kb surrounding significant SNPs and annotated all genes within this interval. Gene annotations were performed using the Blast2GO tool with BLASTp searching against the non-redundant protein database 69 . Additional information about the potential role of candidate genes was recovered from SWISS-PROT curated annotations 70 . SNP validation. The most significant variant at each chromosome was selected for SNP validation in an independent set of individuals from the resistant cultivar 'FL24D' and the susceptible cultivar 'Southern Belle' . The two SNPs causing non-synonymous mutations at the putative hos1 gene at chromosome 2 were selected, plus one SNP at each chromosome 1, 3, 4, 6, and 7. Primers were designed in the region surrounding the SNPs (Supplementary Table S2). Genomic DNA from ten individuals of each cultivar were extracted and used as template in PCR amplifications. Amplifications were carried out using Kapa Hifi Hotstart DNA polymerase (Kapa Biosystems, Boston, MA, USA), with the following thermal cycling conditions: 95 °C for 3 min, 30 cycles of 98 °C (20 s), 58 °C (15 s), 72 °C (20 s), and a final extension of 72 °C for 1 min. Amplicons were visualized on 1% agarose gel prior to PCR clean-up and Sanger sequencing at Genewiz Corporation (South Plainfield, NJ, USA). Sequences were processed and aligned using CLC genomics workbench v12. Fisher's exact test was performed using the R software (http://www.r-project.org).

Data availability
Raw sequence data for each pooled sample were deposited in NCBI's sequence read archive (SRA) under accession numbers from SRR8157534 to SRR8157543. The corresponding Pool-Seq libraries are provided in Table 1.