Genomic regions associated with herbicide tolerance in a worldwide faba bean (Vicia faba L.) collection

Weeds represent one of the major constraints for faba bean crop. The identification of molecular markers associated with key genes imparting tolerance to herbicides can facilitate and fasten the efficient and effective development of herbicide tolerant cultivars. We phenotyped 140 faba bean genotypes in three open field experiments at two locations in Lebanon and Morocco against three herbicide treatments (T1 metribuzin 250 g ai/ha; T2 imazethapyr 75 g ai/ha; T3 untreated) and one in greenhouse where T1 and T3 were applied. The same set was genotyped using genotyping by sequencing (GBS) which yield 10,794 high quality single nucleotide polymorphisms (SNPs). ADMIXTURE software was used to infer the population structure which revealed two ancestral subpopulations. To identify SNPs associated with phenological and yield related traits under herbicide treatments, Single-trait (ST) and Multi-trait (MT) Genome Wide Association Studies (GWAS) were fitted using GEMMA software, showing 10 and 14 highly significant associations, respectively. Genomic sequences containing herbicide tolerance associated SNPs were aligned against the NCBI database using BLASTX tool using default parameters to annotate candidate genes underlying the causal variants. SNPs from acidic endochitinase, LRR receptor-like serine/threonine-protein kinase RCH1, probable serine/threonine-protein kinase NAK, malate dehydrogenase, photosystem I core protein PsaA and MYB-related protein P-like were significantly associated with herbicide tolerance traits.

www.nature.com/scientificreports/ Faba bean has a relatively large genome size of 13 Gb 14 . Thanks to the advances in the next generation sequencing technologies (NGS) that has enabled the generation of large volumes of sequences [15][16][17] and facilitated the discovery of single nucleotide polymorphisms (SNPs) that can be associated with key breeding traits either through biparental mapping or through genome wide association studies (GWAS) 15,18 . Unlike biparental mapping, GWAS utilizes natural populations and exploits linkage disequilibrium (LD) to detect SNP-trait associations with higher resolution 19 . However, the power of GWAS depends on the size and structure of the population used for the analysis 20 . While it is sometimes not feasible to phenotype large populations in a single field trial, multiple field trials, e.g. different treatments, locations or seasons, can be jointly analyzed in one model named as multi-variate or multi-trait GWAS which has shown to have higher power compared to the standard single-trait GWAS 21 . Such approach can assist conventional breeding by implementing marker assisted selections in early generations 18,22 . Although significant progress has been made in faba bean genomics and many genetic maps are available [23][24][25] , the marker density of most of them is still too low to enable accurate prediction of desired traits. SNPs correlated with traits of interest such as resistance to ascochyta and broomrape or vicine-convicine content [26][27][28][29] have been identified, however, no study was conducted to associate SNPs with herbicide tolerance in faba bean.
Weeds are among the difficult-to-control biotic stresses that affect faba bean 30 . When weeds are left uncontrolled, they cause severe loss on grain yield of up to 70% 31 . An integrated approach with many control measures has been recommended to provide protection against weeds [32][33][34] but with limited success. Many studies have acknowledged breeding for weed resistance by selecting for morphological characteristics that promote competition and allelopathy such as early seedling emergence, seedling growth, greater plant height, greater root volume [35][36][37] , but the resistance against most parasitic weeds is a difficult task because of its complex nature and low heritability 38,39 . Thus, recent studies have focused on developing herbicide tolerant faba bean lines 40,41 . Abou-Khater et al. 42 evaluated faba bean germplasm for traits associated with tolerance to metribuzin and imazethapyr, two herbicides commonly available that can control the majority of weeds threatening faba bean production. They found that crop phenology, plant architecture and grain yield related traits were greatly affected by the herbicide treatments. Although useful sources for herbicide tolerance were identified by the authors, such field techniques are very laborious and require multi-environmental data. Associating the herbicide tolerance related traits 42 with molecular markers to select for herbicide tolerance would facilitate the detection of useful markers that can be used to select herbicide tolerant lines in early generations. Keeping this in mind, the present study was undertaken to identify candidate loci significantly associated with tolerance to two post emergence herbicides, namely metribuzin and imazethapyr under different environments using GWAS and to identify associated SNP markers that can be used for introgressing such traits into desired agronomic background.

Results
Phenotyping. Multiple environmental models were fitted to obtain the best linear unbiased prediction (BLUP) values for each genotype and treatment across field trials. The genotypic effects for all studied traits and reduction indexes were significant across trials at a p-value < 0.001 except for the RI GCC (Table 1) indicating a wide range of genotypic variation in faba bean. Significant differences were observed among treatments for all studied traits and reduction indexes except for RI PLHT , RI GYPLT and RI NPPLT ; while significant Genotype × Treatment interactions were observed across trials for DFLR, DMAT, PLHT, GYPLT and NSPLT ( Table 1). The Genotype × Treatment × Environment interactions show that the effect of herbicide treatments on the traits and reduction indexes of the genotypes was not affected by the environment except for DFLR and NSPLT and their reduction indexes ( Table 1). As for the greenhouse experiment, the DFLR, PLHT and GCC varied significantly among genotypes and treatments and significant Genotype × Treatment interactions were observed. The reduction indexes for DFLR, PLHT and GCC varied significantly also among genotypes (Table 2). Our results showed that both herbicide treatments affected the faba bean phenology by delaying significantly the DFLR and DMAT (Tables 1, 2). In addition, the post emergence application of metribuzin and imazethapyr affected the architecture of the faba bean plants by reducing the PLHT and the GCC and increasing the NbrPLT (Tables 1, 2). Moreover, a significant reduction in the GYPLT, NPPLT and NSPLT of the genotypes treated with metribuzin or imazethapyr was observed across trials. The plant height recorded in the green house experiment at two different stages showed that the treated plants tend to recover from the herbicide effect (Tables 1, 2).
Our results also showed that the first (HDS1) and second (HDS2) herbicide damage scores per genotype varied from 1 to 5 across trials. Combined results of the herbicide damage scores (HDS1 and HDS2) showed that after one month of the herbicide application, 5 and 42% genotypes recovered from the damaged caused by metribuzin and imazethapyr treatments while damages in 56 and 10% genotypes exacerbated (Fig. 1). The herbicide damage on the remaining genotypes remained unchanged between the first and second recording dates. Figure 2 shows that the herbicide treatments affected differently the plant height and grain yield of the treated genotypes. The reduction in plant height and grain reduction varied between almost negligeable (< 10%) and high levels (> 40%).
Genotyping and population structure. The SNP calling analysis revealed 10,794 high-quality SNPs among the studied faba genotypes. The sequence variations of these SNPs were C/T (4251 SNPs), and A/G (4029 SNPs), followed by A/T (836 SNPs), G/T (761 SNPs), A/C (619 SNPs), and C/G (298 SNPs). The average CV values for the 100 replicates of the population structure started to increase directly after K = 2 indicating the presence of two ancestral subpopulations in the germplasm set used in the present study. However, we presented the results of K up to 4 because their 100 replicate runs resulted in comparable classification of genotypes into ancestral subpopulations with top > 20% of replicates having almost the exact log-likelihood values (Fig. 3).    Table S1). These associations were represented by 105 SNPs. Only one SNP (SNODE_27970_52) for DFLR was associated with three treatments (I, M and C) in TR16, while another five SNPs were associated with two scores for PLHT or DFLR (Supplementary Table S1). Of these, two SNPs (SNODE_168698_34 for PLHT with treatment I, and SNODE_23759_68 for DFLR with treatment M in TR16) were associated with a specific trait and its correspondence RI score. Another five SNPs showed association with two different traits of which SNODE_3696_16 and SNODE_77186_51 were associated with GYPPLT in TR16, treatments M and I respectively, while SNODE_22383_32 was associated with RI in TR16, treatment M, for the same traits. SNODE_239220_75 was associated with TR19_DMAT_I and TR16_NPPT_I_RI, while SNODE_7114_58 showed associations with five DFLR and DMAT across environments/treatments (Supplementary Table S1).
The MT-GWAS model for 20 traits across environments (including RI scores) resulted in 14 highly significant associations and 64 suggestive associations for all traits represented by 72 SNPs (Table 3). The largest number of associations (12) were detected of DMAT, while GYPPLT_RI, NPPT_RI and Score1 had the lowest number with only one association each. Most of the SNPs that showed associations with multiple traits/treatments in the ST-GWAS analysis were also detected in MT-GWAS analysis. Four SNPs showed associations with a specific trait with its reduction index which were SNODE_23759_68 for DFLR, SNODE_14558_21 for NSPP, SCON-TIG73439_18 for PLHT, and SNODE_103_72 for NBBR ( Table 3). The SNP SCONTIG127798_41 was associated with GCC and DFLR while the SNP SNODE_22383_32 was associated with GYPPLT (Table 3).
Gene annotation showed that SNP SCONTIG127798_41 associated with reduction index of GCC and DFLR is located within a gene annotated as acidic endo-chitinase annotation, SNODE_14298_44 associated with the reduction index of PLHT is located within a gene annotated as LRR receptor-like serine/threonine-protein kinase RCH1, SNODE_3696_16 associated with GYPLT is located within a gene annotated as Probable serine/ threonine-protein kinase NAK, SNODE_4187_38 associated with the reduction index of DFLR is located within a gene annotated as malate dehydrogenase, SNODE_559376_60 associated with the reduction index of NPPLT is located within a gene annotated as photosystem I core protein PsaA, while SNODE_7114_58 associated with DFLR is located within a gene annotated as MYB-related protein P-like (Supplementary Table S2).

Discussion
Weed menace is a serious threat to faba bean production, and the identification of herbicide-tolerant varieties is one of the most effective methods for weed control. The results obtained from the present field and greenhouse studies demonstrated how the post-emergence application of metribuzin or imazethapyr negatively affects faba bean plants. Herbicide application affected the crop phenology by delaying flowering and maturity. Although the delayed flowering helps plant escape the risk of frost in regions like Western Australia, there might be a potential yield penalty as the plants run out of moisture before it can fill its grain 43,44 . In addition, herbicide application also affected biological and grain yields of faba bean by reducing plant height, green canopy cover, and grain yield components and by increasing the number of branches. Many studies 30,42,[45][46][47][48] reported significant reduction in plant height, grain yield and yield components while studying the effect of post-emergence herbicide application on faba bean, lentil and chickpea. On the other hand, Wall 49 and Sajja et al. 50 , reported an increase in the number of branches of treated plants. The observed damage after metribuzin and imazethapyr treatments is the consequence of the growth inhibition caused by both herbicides. Metribuzin hampers photosynthesis activity by inhibiting the photosynthetic electron flow 51,52 and imazethapyr inhibits acetolactate synthase (ALS) 53 , the first common enzyme in the biosynthesis of the branched-chain amino acids 54 causing the death of meristematic cells. On the other hand, significant increase in the number of branches in herbicide treated plants could be caused by the plant recovery which occurs at the lateral meristem in dicots resulting in the development of new branches. The genotypic variation observed in the herbicide damage scores (HDS) highlights the difference in the reaction of each genotype toward post emergence herbicide application in faba bean. This observation was expected as the evaluated genotypes are genetically diverse 22 . The differences observed between the first (HDS1) and the second (HDS2) scores were due to the recovery or deterioration of the plants one month after herbicide treatment. The recovery might result from the metabolism of the herbicides into inactive compounds 55 . Therefore, Figure 3. Population structure constructed using the SNPs data for the individual ancestry estimated using the ADMIXTURE analysis. Individuals are represented in thin vertical lines separated into segments corresponding to the assumed membership in K = 2, 3 and 4 genetic groups as shown by colors.Each color represents one ancestral subpopulation. www.nature.com/scientificreports/ the observed differences in the genotype ability to recover might be due to differential rate of metabolic degradation for imazethapyr treatment 47 and to differential disruption of electron transfer for metribuzin treatment. Population structure analysis revealed two major ancestral populations for the germplasm which is compatible with the original germplasm of 995 genotypes genotyped with 20 microsatellite markers, from which this population was selected 22 22 . As expected, MT-GWAS analysis exposed higher detection power compared to ST-GWAS analysis due to the larger datapoint fitted in the model which is equivalent to increasing the population size 21 . This was revealed by the larger number of highly significant as well as suggestive association per trait detected (Table 3, Supplementary Table S1). Another advantage is the ability to detect QTL with stable effect across different environments or treatments which should have higher potential to improve the efficiency of marker assisted selection in diverse environments 56,57 .
To the best of our knowledge, the present study is the first GWAS for herbicide tolerance in faba bean and the first for all phenotyped traits under the control treatment with no herbicide application. Thus, most of the QTL detected in the present study seem novel and have not been reported before. Very limited studies used SNP data on biparental or multi-parental faba bean populations 15,26,58 but none aimed to dissect quantitative traits in natural diverse populations. Sallam and Martsch 58 associated 156 SNPs with frost tolerance in a population derived from 11 parental lines, while Ali et al. 59 used the same population to detect loci associated with freezing and drought tolerance using 175 SNPs and AFLP markers. The identification of QTL through GWAS in faba bean is complex due to the large undecoded genome and highly repetitive sequences. These issues have delayed the progress made towards the development of genomic resources and marker assisted selection in faba bean breeding programs 60 .
Identification of key genes, mechanisms and functional markers is essential to develop herbicide tolerant faba beans. The associations between some genes identified in this study and herbicide tolerance have been reported previously in different crops. Acidic endochitinase and malate dehydrogenase which were found to be associated with the reduction indexes of DFLR and GCC were among the proteins affected by the application of sulfonylurea herbicide in soybeans 61 . Sulfonylurea herbicides and imazethapyr have similar mode of action; both herbicides block the biosynthesis of the branched-chain amino acids 54,62 . The two protein kinase LRR receptor-like serine/ threonine-protein kinase RCH1 and probable serine/threonine-protein kinase NAK which were found to be associated with the reduction index of PLHT and GYPLT in the present study are generally considered key regulators of plant architecture and growth behavior, and the expansion of these proteins during plant evolution has also been correlated with the specific adaptations of the species in defense and stress responses 63 . Their direct involvement in abiotic stress resistance (drought, heat, cold, salinity) has also been demonstrated in different   67 concluded that herbicide stress is perceived similarly to other abiotic stresses and reported modification in the level of the protein kinase gene family in the multiple herbicide resistant Avena fatua. The MYB-related protein P-like which was found associated with DFLR is involved in herbicide tolerance belongs to the MYB gene family that comprises one of the richest groups of transcription factors in plants. Members of this family have a well-established role in abiotic stress responses 68,69 . Bhoite et al. 70 found also that the transcription factors MYB were significantly expressed under metribuzin stress. The photosystem I (PS I) core protein PsaA that is found in the present study to be associated with the reduction index of NPPLT is a subunit membrane protein complex involved in photosynthesis. PS I and PS II drive the light reaction of photosynthesis. The first stage of the light reaction occurs in PS II whereas the final stage of the light reaction occurs in PS I 71 .
The metribuzin applied to faba bean plants in this study inhibits PS II by disrupting electron transfer through binding to the D1 protein of the photosystem II complex in chloroplast thylakoid membranes 51 . This mode of action explains the involvement of the PS I in the reaction toward herbicide application especially that PS II comes first in the path of the electron flow followed by PS I.
The described mechanism of action of the annotated genes suggests that DFLR_RI and GCC_RI are associated with tolerance to imazethapyr while DFLR and NPPLT_RI are associated with tolerance to metribuzin, and GYPLT and PLHT_RI are associated with tolerance to both herbicides.

Conclusions
Weeds represent a major problem to faba bean crop which limits its expansion in many production regions. By excluding faba bean and other legume from the cropping system, cereal monoculture will continue to deplete the soil, lowering its quality and indirectly reducing yield and quality of the produce. Herbicide tolerant faba bean lines could be a game changer in the reintegration of faba bean in modern cropping systems as it contributes to the reduction of production cost by avoiding excessive use of manual weeding. Considering the many advantages of herbicide tolerance in faba bean, it is imperative to breed elite cultivars that features this trait. However, field selection is very costly and time consuming. The integration of genomic selection and marker assisted selection will improve selection accuracy, increase the selection intensity and shorten the breeding cycle when selecting at early generations. In the present study, we identified genomic regions associated with tolerance to imazethapyr and metribuzin herbicides as highly significant associations between SNPs markers and phenological and yield traits related to herbicide tolerance were detected using multi-trait association. These markers will be useful for improving the efficiency of faba bean programs and represent important steps towards the selections for herbicide tolerance.

Plant materials. A set of 134 faba bean genotypes comprising 118 landraces from 42 countries and 16
ICARDA breeding lines that were used to establish a reference set under the Generation Challenge Program (GCP) was used for phenotyping and genotyping in the present study. Previous assessment with Simple Sequence Repeat (SSR) markers showed that the set was genetically diverse and comprised 45 major, 17 minor, 63 equina and 9 paucijuga genotypes 22,42 . In addition to the test genotypes, a total of 6 faba bean cultivars (FLIP86-98, ILB1814, Ed-damer, Hudeiba-93, Shambat-75, SML) were included in the experiments. The seeds used in the current experiments are sourced from the reserve seeds that are multiplied each year under insect-proof cages in order to ensure purity of the evaluated accessions. www.nature.com/scientificreports/ in pots. Temperature inside the greenhouse was fixed at 24 to 28 °C the optimal day time temperature of faba bean. The herbicide treatments applied in all experiments are metribuzin (M, T1), imazethapyr (I, T2) and the control treatment (C, T3) in which no herbicide was applied. The doses of herbicides applied are the recommended doses as per the labels of metribuzin (Sencor: Bayer) and imazethapyr (Pursuit: BASF). Both herbicides were uniformly sprayed at the rate of 250 g ai ha −1 and 75 g ai ha −1 respectively at the inflorescence stage BBCH code 5072 72 for the field experiments and at the stem elongation stage BBCH code 30 72 for the greenhouse experiment using an electric sprayer with automated flow (375 L/ha). In the field, the herbicide was sprayed early in the morning to ensure a low wind speed. Details of traits scored in each trial can be found in Supplementary  Table S3. Traits were coded as the environment, followed by the trait, the treatment and "RI" if the score describe a reduction index. For the multi-trait GWAS analysis, the trait name does not have the name of the environment or the treatment.

Experiments.
Phenotyping for herbicide tolerance. Observations (Supplementary Table S3) were recorded on days to 50% flowering (DFLR) and maturity (DMAT) on plot basis for the untreated treatment, and plant height (PLHT) and grain yield per plant (GYPLT) on three plants selected randomly for all the three treatments at Marchouch 2014/2015. At Terbol station, the following additional traits were also recorded on three plants selected randomly for the three treatments: number of pods per plant (NPPLT), number of seeds per plant (NSPLT), number of branches per plant (NBrPLT) and green canopy cover (GCC). Green canopy cover expressed as the average percentage of green coverage of three plants was quantified using the Canopeo application developed by Oklahoma State University using Matlab. Under the greenhouse conditions where temperature was controlled at optimal conditions, PLHT was recorded at flowering PLHT_1 (BBCH code 60) and pod development PLHT_2 (BBCH code 70) stages 72 . The herbicide damage score (HDS) was recorded in all the four experiments using a 1-5 scale 42 (Supplementary Table S4) at flowering (HDS1) and pod development (HDS2) stages. The ratio of each quantitative trait was calculated for each plot using the following formula described by Abou-Khater et al. 42 : where RI%, the reduction index, represents the reduction or penalty in traits of herbicides treated plots compared to the control untreated plots, T́ is the average of plots treated with herbicide (metribuzin or imazethapyr); Ć is the mean of genotypes under untreated conditions. DNA extraction and genome by sequencing analysis. Genomic DNA was extracted from young leaf tissues for each tested genotype using the DNeasy 96 Plant Kit (QIAGEN, Valencia, CA, USA) Qiagen Plant DNA Preparation Kit. For the preparation of the GBS library, the two restriction enzymes, PstI and MspI, were used to generate fewer variation in the distribution of read depth and higher number of scorable SNPs. GBS libraries were prepared with 48 barcode adapters with 4-9 bp sequence 73 . The single read (100 base pairs) sequencing on an Illumina HiSeq 2500 produced approximately 4 million reads per genotype. Raw read sequences were processed using TASSEL-GBS 5.0 with the default parameters 74 . A faba bean sequence database was constructed using 223,801 genomic and transcriptomic faba bean sequences downloaded from NCBI and pulsedb databases (www. ncbi. nlm. nih. gov and https:// www. pulse db. org/ analy sis/ 136) and additional faba bean sequences constructed using the Trinity assembler from one run of the GBS files. These sequences were used as a reference to align GBS sequence tags and indexed using Bowtie2 version 2.2.4 75 Bowtie2 was used to align GBS tags to faba sequences using the-very-sensitive-local option. Resulting SNPs were filtered with 20x coverage, where SNPs with more than 15% missing data or less than 5% minor allele frequency (MAF) were removed. SNPs were named by contig base pair position.
Statistical analysis of phenotyping data. The spatial statistical model was applied for variance analysis for all quantitative data using the Automatic Spatial Analysis of Row-Column modules of Genstat 19 edition 76 . Significance of variation among genotypes and treatments was assessed in terms of P-values. The analysis of variance (ANOVA), means of genotypes, means of treatments and interactions between genotypes and treatments were estimated with standard errors using best linear unbiased prediction (BLUP) values using GenStat software. BLUPs were used to conduct all downstream analyses. Multi environment trials analysis (META) were conducted to evaluate variation among genotypes, treatments and the genotype × treatment interaction across trials for the traits recorded in more than one trial. Genotype and treatment were fitted as fixed parameters while environment (year-location) were fitted as random parameter.
Genome-wide association analysis. ADMIXTURE software 77 was used to infer population structure with the number of underlying subpopulations (K) ranges between 2 and 20. The analysis was run with 100 random replicates and 20 cross validations. The most probable K was determined at the point when the average cross validation (CV) values across the 100 replicates started to increase. Single-trait (ST) and Multi-trait (MT) GWAS was fitted using GEMMA software 21 by fitting each trait independently (for the ST analysis) or fitting all field or greenhouse records together (for the MT analysis) with the default parameters and by fitting the genomic relatedness matrix as a covariate to control for population stratification 78 . Bonferroni correction was used to determine the significant threshold at p < 0.05 but all SNPs with p < 1E−4 were presented as suggestive associations. Pairwise linkage disequilibrium (LD) between associated SNPs within each trait was estimated with the r2 statistics following Weir 79 to determine the SNPs that are associated with the same quantitative trait www.nature.com/scientificreports/ locus (QTL). Genomic sequences containing herbicide tolerance associated SNPs were aligned against the NCBI database using BLASTX tool using default parameters to annotate potential candidate genes underlying the causal variants.
Ethcial approval. The authors confirm that the study complies with local and national regulations. The seeds were collected from the genebank of the International Center for Agricultural Research in the Dry Areas (ICARDA) for research purposes according the International Treaty of Plant Genetic Resources for Food and Agriculture (ITPGRFA). For the collection of seeds, all relevant permits or permissions have been obtained.The seeds flow from ICARDA GenBank at Terbol to Morocco was made following the phytosanitary regulations of both countries and using the Standard Material Transfer Agreement (SMTA) governed by ITPGRFA. The experiments were conducted at ICARDA sites at Terbol and Marchouch in accordance to National and International regulations.

Data availability
The datasets generated and analyzed during the current study are available from the corresponding authors on request.