Fatty acid composition and genome-wide associations of a chickpea (Cicer arietinum L.) diversity panel for biofortification efforts

Chickpea is a nutritionally dense pulse crop with high levels of protein, carbohydrates, micronutrients and low levels of fats. Chickpea fatty acids are associated with a reduced risk of obesity, blood cholesterol, and cardiovascular diseases in humans. We measured four primary chickpea fatty acids; palmitic acid (PA), linoleic acid (LA), alpha-linolenic acid (ALA), and oleic acid (OA), which are crucial for human health and plant stress responses in a chickpea diversity panel with 256 accessions (Kabuli and desi types). A wide concentration range was found for PA (450.7–912.6 mg/100 g), LA (1605.7–3459.9 mg/100 g), ALA (416.4–864.5 mg/100 g), and OA (1035.5–1907.2 mg/100 g). The percent recommended daily allowances also varied for PA (3.3–6.8%), LA (21.4–46.1%), ALA (34.7–72%), and OA (4.3–7.9%). Weak correlations were found among fatty acids. Genome-wide association studies (GWAS) were conducted using genotyping-by-sequencing data. Five significant single nucleotide polymorphisms (SNPs) were identified for PA. Admixture population structure analysis revealed seven subpopulations based on ancestral diversity in this panel. This is the first reported study to characterize fatty acid profiles across a chickpea diversity panel and perform GWAS to detect associations between genetic markers and concentrations of selected fatty acids. These findings demonstrate biofortification of chickpea fatty acids is possible using conventional and genomic breeding techniques, to develop superior cultivars with better fatty acid profiles for improved human health and plant stress responses.

Fats provide significant calories and energy for human well-being and healthy living.Fatty acids have several important metabolic functions in the human body, including being stored as energy sources for use under reduced blood glucose levels, helping regulate gene expression, forming cell structures, promoting cell signaling, and helping control cholesterol levels and inflammatory responses 1 .Fatty acids are classified into saturated fatty acids (SFAs), monounsaturated fatty acids (MUFAs), and polyunsaturated fatty acids (PUFAs).Meat, seed oils, seafood, and legumes contain SFAs and MUFAs ranging from 1.5 to 52 g/100 g and 0.9 to 85 g/100 g, respectively 2 , while PUFAs vary from 0.027 to 7 g/100 g.Oleic acid is a MUFA that has anti-inflammatory properties 3 PUFAs are considered essential fatty acids (EFAs) and have two subfamilies, namely omega-6 (ω-6) and omega-3 (ω-3) fatty acids.Linoleic acid (LA; ω-6) and alpha-linolenic acid (ALA; ω-3) are important PUFAs for human health 1,4 .These essential fatty acids must be obtained through the daily diet 5 .
Pulse crops, including chickpea (Cicer arietinum L.), provide daily requirements of EFAs 6 .Chickpea originated in southeastern Turkey and is widely consumed across the world, especially in Southwest Asia 7 .Chickpea is comprised of 50-58% carbohydrates, 18-22% protein, 3.8-10% fat, and < 1% micronutrients 8 .In addition, chickpea is a rich source of prebiotic carbohydrates 9 .A human nutrition study reported that a diet rich in legumes, including chickpea, reduced weight, total cholesterol, low-density lipoprotein (LDL), and high-density originating in Asia and North America (77.7% and 21.5%, respectively); accessions from other continents only represent about 0.8% (Table 1).Concentrations of fatty acids, viz., PA, LA, ALA, and OA, were distributed normally in the chickpea germplasm panel (Fig. 1).The mean concentrations of PA, LA, ALA, and OA were 591.13 (range 450.7-912.6)mg/100 g, 2456.39 (range 1605.7-3459.9)mg/100 g, 650.99 (range 416.4-864.5)mg/100 g, and 1423.31(range 1035.5-1907.2) mg/g, respectively.The percent of the recommended dietary allowance (%RDA) provided by these accessions was higher for LA and ALA than for PA and OA (Table 2).Analysis of variance revealed significant genotypic effects only for PA and ALA at p-values < 0.01.Besides genotypic effects, all of the fatty acids varied in their replication, block, and experiment effects.The interaction effect of experiments with replication (Exp × Rep) was significant for all fatty acids.Significant interaction effects of genotype across the experiments (Exp × Geno) were only detected for PA (Table 3).Interaction effects of experiment with replication and block (Exp × Rep × Block) were significant for all fatty acids except OA.The residual mean sums of squares were high for all fatty acids.Only minor correlations were observed between concentrations of different fatty acids.PA showed a significant negative correlation with LA (r = − 0.2100; p-value < 0.001).OA showed a significant negative correlation with ALA (r = − 0.1278; p-value < 0.05).All other correlations were nonsignificant (Fig. 2).
Furthermore, prior reports for human nutrition suggest higher LA, ALA, and OA concentrations and lower PA concentrations positively impact human health 28 .In our study, the chickpea accession with the highest LA   concentration was Spanish white (3459.9mg/100 g), highest ALA concentration was W6 26,016 (864.5 mg/100 g), highest OA concentration was CA13900162C (1907.2mg/100 g), and lowest PA concentration was W6 25894 (450.7 mg/100 g).The broad range of fatty acid concentrations in the chickpea germplasm panel (Table 2) can assist in selecting genotypes with low PA concentrations and high LA, ALA, and OA concentrations.Principal component analysis (PCA) performed on the fatty acid data indicated principal components 1 (PC1) and 2 (PC2) explained 74.5% and 18.85% of the variance, respectively, or approximately 93% together.The PCA scatterplot showed indistinct clustering according to origin.Furthermore, the PCA biplot indicated PC1 primarily contained information for variation in LA, while PC2 primarily contained information for variation in OA (Supplementary Fig. 1).However, the first two PCs explained only minor variation in ALA and PA, which instead was explained by PC3 and PC4, respectively.
Population structure analysis.Population structure analysis determined seven subpopulations in the chickpea diversity panel.Subpopulation 4 (purple) had the least admixture, while subpopulation 7 (brown) had the greatest (Fig. 3a).The admixture subpopulation composition of each country of origin is shown in the map (Fig. 3b).The majority of the accessions in the chickpea germplasm panel were from India (n = 183).Consequently, most of the subpopulations were comprised either completely or partially of accessions originating from India.This is seen in subpopulations 2 (blue) and 3 (green), which exclusively include accessions of Indian origin (n = 77 combined).Similarly, of the 35 accessions in subpopulation 6 (yellow), 34 had Indian origin and one was from Iran.Sub-populations 4 (purple) and 5 (orange) comprised 23 accessions from India and seven accessions from Iran.The second highest number of accessions were from the United States (n = 52) and were found in two subpopulations: subpopulation 1 (62% of the total) and subpopulation 7 (38% of total).In subpopulation 1, almost all of the accessions originated from the United States (n = 32) except for one accession from Canada.Sub-population 7 was the most diverse ancestral group, with accessions from India (n = 49), United States (n = 20), Iran (n = 7), Peru (n = 1), and Pakistan (n = 1).Subpopulation 4 represented the smallest group, with only 14 accessions, while subpopulation 7 was the largest group, having 78 accessions.
In the genetic PCA, the first two principal components (PC1 and PC2) accounted for 15.4% and 6.69% of the total variance, respectively.The desi and kabuli types form distinct clusters (Fig. 4a).PCs also separated accessions based on their origin, especially from India and the United States (Fig. 4b).The admixture subpopulations 2, 3, 4, and 6 can be seen tightly clustered relative to other subpopulations (Fig. 4c).These admix subpopulations were comprised of accessions from India as indicated by the PCA scatterplot colored by origin (Fig. 4b).Accessions from subpopulations 1, 5, or 7 did not form distinct clusters (Fig. 4c), nor did accessions originating from Iran, Pakistan, or Peru (Fig. 4b).

Genome-wide association studies.
Of the four fatty acids examined, significantly associated SNPs were only detected for PA (Fig. 5).Five significant SNPs were found at a p-value < 0.05, having a minor allele frequency (MAF) ranging from 1.19 to 12.30% (Table 4).One significant SNP was identified on chromosome 1 with a MAF of 12.30%, while two were identified each on chromosome 2 (MAF: 3.37 and 4.76%) and chromosome 8 (MAF: 1.19 and 2.78%).Of these five significant SNPs, one SNP, SCM001765.1_7756123,on chromosome 2 was found within a gene (Supplementary Table 1).Also, one significant SNPs, SCM001756.1_7281701was found with a small positive estimate for effect sizes of 0.72.Twenty-five candidate genes were identified in linkage disequilibrium (LD) blocks for variants associated with PA (Supplementary Table 1).A significant SNP, SCM001765.1_7756123was found within a gene, cicar.CDCFrontier.Ca_18126, on chromosome 2.
In this panel, the percent mean concentrations for PA (9.79%), ALA (10.78%), and OA (23.57%) were higher than USDA estimates for PA (8.41%), ALA (1.68%), and OA (22.62%) in chickpea 42 .However, the LA mean estimate of 40.67% was lower than the USDA estimate (43.54%).Percent concentrations of LA and OA in this panel (Table 2) were 40.67 and 23.56%, respectively, which are lower than the concentrations (57.26 and 27.98%, respectively) reported in a previous study 38 .However, a comparable PA concentration (9.79%) and much higher ALA concentration (10.78%) were found in this study than previously reported (9.69% for PA; 1.59% for ALA) 38 .Another report observed higher concentrations of LA (41.25-57.25%)and OA (27.55-42.30%),and lower concentrations of PA (8.43-9.63%)and ALA (1.68-2.68%)compared to this study 40 .A recent report on conventionally grown chickpea found lower PA (9.25%), OA (24.77%,) and ALA (1.61%) concentrations, and a higher LA (59.19%) concentration than this study 39 .This wide variation in fatty acid concentrations across different studies suggests complex inheritance of these traits and the need for robust screening of these traits  www.nature.com/scientificreports/across multiple environments and years.This can help to better understand the relative magnitude of genetic, environment, and their interaction effects on fatty acid concentrations in chickpea.Chickpea fatty acid concentrations were poorly correlated, which could be attributed to a lack of genetic linkage among fatty acids or environmental influence over the genetic control of fatty acid.PA and LA were negatively correlated, which is inconsistent with a previous report in chickpea indicating a positive association 43 .However, the low correlation suggests the potential to select accessions with high overall fatty acid quality, i.e., high LA and low PA concentrations.Conversely, the significant negative correlation between OA and ALA (Fig. 2) suggests selecting a higher concentration of either fatty acid may lead to a reduced concentration of the other.This negative association between OA and ALA is consistent with prior findings 43 .A nonsignificant correlation was observed between LA and OA, and ALA, suggesting selection for higher LA concentrations will not impact OA and ALA concentrations.However, the nonsignificant correlation between LA and ALA contradicts the previous finding 44 .Correlations between fatty acids were inconsistent with the findings in soybean 45 and peanut 46,47 except for PA and LA, PA and ALA, ALA, and OA in soybean 45 .The poor correlations may be attributed to pleiotropic genetic control or loose genetic linkage among these fatty acids.The significant environmental effects can also substantially influence their genetic control and concentrations, indicating the need for further studies to confirm fatty acid correlations.In the phenotypic PCA, accessions did not form distinct clusters based on country of origin when the first two principal components were plotted (Supplementary Fig. 1).The first two components explained approximately 93% of the total variation.
Numerous GWAS have been conducted in chickpea to identify SNPs significantly associated with agronomic traits [48][49][50][51] , stress tolerance 52,53 , and nutritional traits 20,50,54,55 .However, no GWAS have been reported for fatty acids in chickpea to date.Development of genomic resources in chickpea has increased since the release of the reference genome by the International Crops Research Institute for the Semi-arid Tropics (ICRISAT), India 56 .Recently, breeding programs have begun focusing on biofortification due to the increasing demand for quality products in the markets 57 .The GWAS analysis of chickpea fatty acids found five significant SNPs associated with PA concentration (Table 4, Fig. 5).
PA, LA, ALA, and OA levels in plants control membrane changes due to biotic and abiotic stresses 11 .In response to cold stress, high concentrations of LA increase the cold tolerance of potato (Solanum tuberosum) 58 , while high ALA and OA impart tolerance to drought, salinity, and cold stress in chickpea 15 .These fatty acids ultimately regulate plant health under stress conditions by membrane structural and functional maintenance, regulating integral protein functioning and preventing ion leakage 11 .The gene cicar.CDCFrontier.Ca_20107 was associated with significant SNP SCM001771.1_13092034on chromosome 8 (Supplementary Table 1).This gene codes for a chloramphenicol acetyltransferase-like domain, which is linked to the synthesis pathways of palmitate (esters of palmitic acid) in eukaryotes 59 .This gene is also associated with mitochondrial fatty acid biosynthesis initiation in Arabidopsis thaliana 60 and Pisum sativum 61 and lipoxygenase/hydroperoxidase lyase pathways in Oryza sativa 62 .Some genes associated with significant SNPs were linked to lipid metabolism in plants.For example, the gene cicar.CDCFrontier.Ca_19124, associated with SNP SCM001769.1_7281701,was identified on chromosome 2 (Supplementary Table 1).This gene codes for a glycosyl hydrolase superfamily responsible for synthesizing sphingolipids 63 in Arabidopsis thaliana.Similarly, the gene cicar.CDCFrontier.Ca_02207, on chromosome 8, associated with SNP SCM001768.1_3705203.This gene codes for peroxidase superfamily proteins that support peroxidase activity and response to oxidative stress in Arabidopsis thaliana 64 , Pisum sativum 65 and Nicotiana tabacum 66 .Although this study only identified genomic associations for PA, future studies with increased statistical power are likely to associate additional fatty acids with genomic markers.This study demonstrates the possibility of targeting specific fatty acids for chickpea improvement using molecular markers.
Admixture analysis revealed seven ancestral subpopulations within this chickpea germplasm panel (Fig. 3a), which follows previous studies in chickpea 67 .However, some studies reported two or three subpopulations in chickpea 20,50,54,55 .These seven subpopulations represent the diverse ancestral background of chickpea.There were also completely admixed accessions indicating the blending of alleles across multiple subpopulations.Most subpopulations had higher candidacy of accessions from the same country of origin, while a few were composed of accessions from different countries (Fig. 3b).Genotypic PCA distinguished both the chickpea types, Desi and Kabuli (Fig. 4a) and also showed a close relationship of a particular type to the country of origin (Fig. 4b).This suggests the independent evolution of desi and kabuli chickpeas based on their regions of domestication as well as the geographical isolation of both types 68 .Genotypic PCA also indicated the relationships between countries of origins and admixture subpopulations (Fig. 4b and 4c).Further genomic studies targeting and dissecting the potential of chickpea for fatty acid improvement directed toward human health and plant stress response are required.

Conclusions
Chickpea fatty acids are crucial for regulating human health, including controlling blood cholesterols, obesity, cardiovascular diseases, etc.Additionally, these fatty acids help plants to cope with adverse environmental conditions including cold and drought stress.Chickpea fatty acids are important breeding targets for increasing plant stress tolerance.Therefore, exploring the genetic variation in fatty acid concentrations followed by utilization of promising accessions from the chickpea germplasm panel could benefit populations with increased risk of lifestyle diseases by providing ideal concentrations of fatty acids to improve their health.These findings may also assist breeders to increase abiotic stress resilience and develop new chickpea cultivars that are adapted to new environments and changing climates.

Methods
Experimental material.The experimental material included a chickpea germplasm panel of 256 accessions containing both Kabuli and desi types as designated by the United States Department of Agriculture, Washington, DC, United States of America (USA).The accessions in the panel were collected from different continents and countries of origin.Most accessions were from Asia (199) followed by North America (55), Europe (1), and South America (1) (Table 1).

Fatty acid extraction.
Each ground chickpea seed sample (1 g) was weighed into a glass bottle.Then 20 mL of hexane were added to each bottle to form a mixture that was sonicated at 50 °C for 30 min in a thermostatic water bath.After sonication, the supernatants were filtered under vacuum to collect hexane extracts into glass tubes.The hexane extract obtained was mixed with 10 mL of 2 M methanolic KOH and vortexed for 30 s.The samples were rested for 1 min to allow phase separation (hexane and methanol phases).The content of the hexane phase (upper phase) was diluted 100X with hexane before analysis.
The samples were analyzed using Agilent's 8860 gas chromatography system with a 5977 B GC-mass detector (MSD).An Agilent HPS-MS UI:US0347823H column with dimensions 30 m × 250 µm × 0.25 µm was used in this study.The oven, initially at constant temperature of 40 °C, was programmed to ramp up at 10 °C per min until it reached 320 °C, which was maintained for 5 min.The inlet temperature was maintained at 300 °C.The overall run time of the analysis was 34 min.Fatty acids observed in the chromatogram were identified through the NIST (National Institute of Standards and Technology) library built on Agilent's Mass hunter Workstation-Qualitative Analysis for GCMS and LCMS (version 10.1).Identified peaks of each fatty acid were quantified using Agilent's Mass hunter Workstation-Quantitative Analysis for gas chromatography/mass spectrometry (GCMS) and liquid chromatography/mass spectrometry (LCMS) (version 10.1) under automatic integration.Phenotypic data analysis.JMP Pro 16.2.0 70was used to estimate fatty acid means, ranges, and correlation coefficients; generate frequency distributions; assess distribution normality; and perform PCA.Phenotypic data for all fatty acids were evaluated for outliers using the 1.5 × Inter Quartile Range rule.Outliers were removed (n = 255) before further analysis.Means and ranges were calculated by averaging across replicates and then experiments.Fatty acid distributions were visualized using JMP software to generate box plots and frequency histograms with standard error bars (Fig. 1).Histograms were fit with normal density curves.Normality was assessed using the Shapiro-Wilk test.Pearson's correlation coefficients were estimated between fatty acids using the restricted maximum likelihood (REML) approach and were visualized on scatterplots (Fig. 2) using JMP software.Scatterplots were fit with 95% density ellipses and regression lines with 95% confidence intervals.Percent recommended dietary allowance (% RDA) was calculated for each fatty acid using average recommended dietary allowance of 13.5 g/d for PA, 7.5 g/d for LA, 1.2 g/d for ALA, and 24 g/d for OA for infants and adults (males and females) according to Centers for Disease Control and Prevention survey reports 71 with the following formula-% RDA = (x/average RDA for a particular fatty acid by Centers for Disease Control and Prevention survey reports) × 100 Here, x = low and high concentration of a particular fatty acids indicated by their range in Table 2. Effects of genotype, replication, experiment, and their interaction were evaluated by conducting analysis of variance (ANOVA) for alpha-lattice design using agricolae package 72 in R version 4.0.5.Best linear unbiased predictors (BLUPs) were estimated for each fatty acid using rstanarm package version 2.21.Genome wide association study.Each entry was grown as single plants in a greenhouse and leaf tissue from healthy seedlings were harvested and used for SNP detection.Nucleic acid extraction from leaf tissue and detection of SNPs through the Genotyping by Sequencing (GBS) approach were performed by LGC, Biosearch Technologies (https:// www.biose archt ech.com/), as previously described 51 .Processing of paired end reads (150 bp) and quality filtering was performed as previously described 51 .Filtered raw reads were aligned to the chickpea reference genome of variety CDC Frontier (Cicer arietinum v1.0) 74 using the Burrow-Wheeler aligner 75 (Li and Durbin, 2010).The Genome Analysis Toolkit 76 (GATK; https:// gatk.broad insti tute.org/) facilitated SNP calling.SNP filtering (< 20% missing data and > 5% minor allele frequency) was performed using VCFtools 77 for quality control, resulting in 15,927 high quality SNPs.After filtering, missing genotypes were imputed using Beagle version 5.4 78 .Finally, the VCF file was converted to HapMap format in Tassel software version 5.0 79 .The fatty acid GWAS were conducted using Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) and Mixed Linear Model (MLM) in the Genome Association and Prediction Integrated Tool (GAPIT) version 3 80 package in R. Manhattan plots and QQ-plots were generated during the GAPIT analysis (Figs. 5 and 6 respectively).Linkage disequilibrium (LD) was estimated using PLINK 81 software to determine the size of LD blocks containing significant SNPs (Supplementary Table 1).Jbrowse 82 was used to identify candidate genes in local LD with significant SNPs.
Population structure analysis.The filtered and imputed VCF file was used to conduct both admixture analysis and PCA to evaluate population structure.Admixture in the chickpea germplasm panel was determined using ADMIXTURE version 1.3.0. 83.The K-value with the lowest five-fold cross validation error (K = 7) was determined to be the optimal number of ancestral populations.The analysis generated a Q-matrix, which indicates the ancestral coefficients for each accession.An admixture plot was generated using the ggplot2 84 package version 3.3.5 in R (Fig. 3a).Accessions were classified into ancestral subpopulations according to their highest ancestry coefficient (> 0.5).The rworldmap package 85 in R was used to generate pie charts depicting the admixture composition of accessions from the same country of origin (Fig. 3b).The circumferences of the pie charts are proportional to the number of accessions sharing the country of origin.PCA was conducted using GAPIT, and the first two PCs were graphed using ggplot2 84 .PCA scatterplot points correspond to accessions and are colored according to their types (Fig. 4a), country of origin (Fig. 4b) and subpopulation (Fig. 4c).

Figure 1 .
Figure 1.Histograms of chickpea accessions with mean values for chickpea fatty acids (mg/100 g of seeds).In the boxplots, the position of mean values is indicated by a rhombus (◊), the small line in the box represents the median, and possible outlier chickpea accessions are shown as points.The normal distribution of fatty acids data was tested using the Shapiro-Wilk test of normality and indicated by green normal curves fitted based on the mean values for chickpea fatty acids in accessions.Standard error bars are also shown.

Figure 3 .
Figure 3. (a) Population admixture analysis of chickpea germplasm panel (n = 252).The individual accessions in the panel are shown along the x-axis in different colors corresponding to their ancestral estimates (y-axis) for each distinct sub-population (K = 7), and (b) admixture contributions of accessions from the same country of origin are indicated in pie charts whose circumference is proportional to the number of accessions.This figure was created using the R package 'rworldmap' (Version 1.3-6).

Figure 4 .
Figure 4. (a) Principal components 1 and 2 indicating accessions as points colored according to the chickpea types, (b) principal components 1 and 2 indicating accessions as points colored according to their country of origin, and (c) principal components 1 and 2 indicating accessions as points colored according to the ancestral subpopulation.

Figure 6 .
Figure 6.QQ plots of Palmitic acid, Linoleic acid, Alpha-linoleic acid, and Oleic acid using MLM and Blink models for genome-wide association studies from GAPIT.

Table 1 .
Origin of chickpea accessions comprising the panel for fatty acid studies.The number of accessions from each origin are indicated in parentheses.

Table 3 .
Analysis of variance (ANOVA) in alpha-lattice design for chickpea fatty acids.