Genome-Wide Association Mapping of Dark Green Color Index using a Diverse Panel of Soybean Accessions

Kaler, Avjinder S.; Abdel-Haleem, Hussein; Fritschi, Felix B.; Gillman, Jason D.; Ray, Jeffery D.; Smith, James R.; Purcell, Larry C.

doi:10.1038/s41598-020-62034-7

Article
Open access
Published: 20 March 2020

Scientific Reports volume 10, Article number: 5166 (2020)

Abstract

Nitrogen (N) plays a key role in plants because it is a major component of RuBisCO and chlorophyll. Hence, N is central to both the dark and light reactions of photosynthesis. Genotypic variation in canopy greenness provides insights into the variation of N and chlorophyll concentration, photosynthesis rates, and N₂ fixation in legumes. The objective of this study was to identify significant loci associated with the intensity of greenness of the soybean [Glycine max (L.) Merr.] canopy as determined by the Dark Green Color Index (DGCI). A panel of 200 maturity group IV accessions was phenotyped for canopy greenness using DGCI in three environments. Association mapping identified 45 SNPs that were significantly (P ≤ 0.0003) associated with DGCI in three environments, and 16 significant SNPs associated with DGCI averaged across all environments. These SNPs likely tagged 43 putative loci. Out of these 45 SNPs, eight were present in more than one environment. Among the identified loci, 21 were located in regions previously reported for N traits and ureide concentration. Putative loci that were coincident with previously reported genomic regions may be important resources for pyramiding favorable alleles for improved N and chlorophyll concentrations, photosynthesis rates, and N₂ fixation in soybean.

Introduction

Soybean [Glycine max (L.) Merr.] is one of the most widely grown crops in the world, and the economic value is primarily derived from the high oil and protein concentrations of the seed. With a protein concentration of around 40%, soybean plants must acquire a large amount of nitrogen (N)^1,2. In the absence of inorganic N in the soil, symbiotic N₂ fixation provides N to soybean. Nitrogen fixation reduces N₂ into biologically useful ammonia (NH₃) and is carried out by Bradyrhizobium japonicum bacteria that live symbiotically in root nodules.

Nitrogen plays a key role in leaf physiology and metabolism because it is a major component of RuBisCO, Photosystems I and II, and chlorophyll; hence, N is central to both the dark and light reactions of photosynthesis³. A large amount of N is allocated to the chloroplast (approx. 75%) for synthesis of the photosynthetic apparatus⁴. Leaf N and chlorophyll concentrations are positively correlated across a large range of plant species including maize (Zea mays L.)⁵, rice (Oryza sativa L.)⁶, soybean^7,8, cotton (Gossypium hirsutum L.)⁹, and wheat (Triticum aestivum L.)¹⁰. Likewise, there are clear positive relationships between leaf N concentration and photosynthetic rate^{7,11,12,13,14,15}. On one hand, a positive correlation between leaf photosynthetic rate and chlorophyll and N concentrations indicates that greener plants would be expected to have higher photosynthetic rates¹². On the other hand, reduced chlorophyll concentration can be positively associated with canopy photosynthetic rates¹⁶ and leaf photosynthetic rates¹⁷. Recently, Walker et al.¹⁸ used a modelling approach to simulate canopy photosynthesis of genotypes with a range of chlorophyll concentrations, including a chlorophyll-deficient mutant, and found that while canopy photosynthesis may not increase when chlorophyll concentration is reduced, reducing chlorophyll concentration and thus leaf N should be possible while maintaining canopy photosynthetic rates. Variation in canopy greenness among genotypes may provide indirect information on the variation in chlorophyll and N concentrations, leaf photosynthetic rates, and, in legumes, N₂ fixation. Thus, it may be useful to explore genotypic variation in canopy greenness and associated genetic markers to improve canopy photosynthesis and/or N₂ fixation.

A portable chlorophyll meter (such as a SPAD-502, Minolta Corp., Ramsey, NJ) is commonly used to determine leaf greenness and indirectly infer leaf chlorophyll concentration. An alternative method evaluates digital images. In previous research, red, green, and blue (RGB) color components have been used to infer N status of crop plants^19,20; however, Karcher and Richardson²¹ found that the intensity of red and blue may alter how green an image appears overall. As such, use of a Dark Green Color Index (DGCI) (which is derived from digital values of hue, saturation, and brightness (HSB)) avoids problems from using RGB-derived indices. The DGCI-based measurements of aerial digital images are inexpensive, need little technical expertise, are higher throughput, and allow data acquisition over a much larger area than the small sensor of a SPAD meter.

Understanding the genetic basis of canopy greenness using DGCI could be important for developing cultivars with high N concentration and N₂ fixation capability, and could allow breeders to increase the frequency of favorable alleles at quantitative trait loci (QTLs) for DGCI. Favorable QTLs can be identified using either genome-wide association mapping or linkage mapping (LM) methods. Major advantages of association mapping over LM include increased mapping resolution, reduced research time, and greater allele number²². Advancements in nucleotide sequencing and high-throughput genotyping technologies have facilitated the development of dense molecular-marker datasets, which are almost exclusively composed of single nucleotide polymorphisms (SNPs)²³. Genotyping diverse lines at thousands of SNPs across the genome is now routine, and permits fine-level genetic mapping through exploiting ancient recombination events²⁴. In soybean, 20,087 entries from the USDA germplasm collection (out of 22,500 active accessions, https://npgsweb.ars-grin.gov/gringlobal/taxonomydetail.aspx?id=17711; accessed 12-17-19) have been genotyped using the SoySNP50K iSelect Beadchip (accessible at https://soybase.org/snps/index.php; accessed 12-17-19). This unique soybean genetic resource is proving invaluable for assessing soybean genetic diversity and has opened the door for application of powerful genome-wide association mapping methods²⁵.

To our knowledge, there has been no report of mapping canopy greenness via DGCI with either bi-parental populations through linkage mapping or association mapping in soybean. However, there are mapping studies of greenness or DGCI in other crop species (including rice⁶ and maize²⁶). In soybean, other QTL studies have mapped chlorophyll²⁷, N²⁸, and ureide concentrations (related to N₂-fixation)²⁹. Our objectives were to use genome wide association mapping to characterize variation of canopy greenness using DGCI in a panel of 200 diverse maturity group (MG) IV accessions, to explore the genetic architecture associated with DGCI, and to predict genotypes with extreme values of DGCI within each MG in the USDA soybean germplasm collection based on the presence of favorable QTLs discovered in the present research.

Materials and Methods

Field experiments

The panel of 200 MG IV soybean accessions used for this study included 100 accessions representing the most genetically diverse accessions (out of 373 accessions) used for previous mapping studies by Kaler et al.^30,31,32. An additional 100 MG IV accessions were selected from the USDA Soybean Germplasm Collection, based on the estimated breeding values for phenotypes determined from previous association mapping studies^30,31,32. These diverse accessions originated from 10 different nations including South Korea, China, Japan, North Korea, Georgia, Russia, Taiwan, India, Mexico, and Romania (Supplementary Table S1). Accessions were evaluated in three environments: the Main Arkansas Agricultural Research Center in Fayetteville, AR (36.15°N, −94.28°) (denoted as “FY”) on a Captina silt loam (Fine-silty, siliceous, active, mesic Typic Fragiudults), the Pine Tree Research Station in Colt, AR (35.12°N, −90.92°) (denoted as “PT”) on a Calloway silt loam (Fine-silty, mixed, active, thermic Aquic Fraglossudalfs), and the Rohwer Research Station in Rohwer, AR (33.80°N, −91.28°) (denoted as “RH”) on a Sharkey silty clay (Very-fine, smectitic, thermic Chromic Epiaquerts). Sowing dates were 7 June 2018 (FY and PT) and 31 May 2018 (RH). Seeds were sown at a density of 37 m⁻² at a 2.5-cm depth. At FY, plots were 4.57 m long and two rows wide with 0.76 m row spacing. At PT and RH, seeds were sown with a drill (19 cm row spacing), and plots were 1.52 m wide and 4.57 m long. At PT and RH, the experiment was conducted as an augmented incomplete experimental design with six replications. The FY experiment was conducted with one replication.

Dark green color index (DGCI) determination

Aerial images were captured using the factory-installed camera (2.54 cm, 20-megapixel CMOS sensor) of the DJI Phantom 4 Pro (www.dji.com/phantom-4-pro) unmanned aerial system (UAS), which was flown approximately 30.5 m above the ground. The UAS was programmed to collect images with an 80% overlap on the front and sides using Ground Station Pro software from DJI (Shenzhen, China) operating in the ‘3D Map’ mode. The shutter speed was set to ‘auto’ and the camera was programmed to take images at equal time intervals (2 s) in the nadir position. Image resolution with these settings was approximately 0.8 cm pixel⁻¹. Measurements were made 54 (RH), 48 (PT), and 55 (FY) days after sowing when plants were in full bloom and canopies were completely closed. Flights were made between 1100 and 1400 h on days with clear skies. Images were stitched together to form an orthomosaic using Agisoft PhotoScan Professional (www.agisoft.com). Also included in the image were boards painted with dark green or yellow circles measuring 1 m in diameter. The painted boards had known DGCI values of 0.5722 (green) and 0.0733 (yellow) and served as internal standards for DGCI determination^5,8. Orthomosaic images were analyzed using FieldAnalyzer software (https://www.turfanalyzer.com/field-analyzer), which was used to extract DGCI values for each plot. The software used the hue (H), saturation (S), and brightness (B) values from a digital image to determine the DGCI value²¹ as shown in the equation below:

$$\mathrm{DGCI\ value} = \frac{(H - 60)/60 + (1 - S) + (1 - B)}{3}$$

DGCI is a composite number on a scale from 0 to 1 with higher values related to a darker green color and lower values corresponding to a yellow color.
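As an illustration of the equation above, the following minimal sketch computes DGCI for a single pixel. It assumes that H is expressed in degrees (0 to 360) and that S and B are fractions between 0 and 1; the function name `dgci_from_rgb` and the example colors are hypothetical and are not part of FieldAnalyzer. The conversion uses Python's standard `colorsys` module, whose HSV "value" channel plays the role of brightness here.

```python
import colorsys


def dgci_from_rgb(r, g, b):
    """Return the DGCI of one pixel given 8-bit RGB values (0-255).

    Assumption: hue in degrees, saturation and brightness in [0, 1].
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0  # colorsys returns hue as a fraction of a full turn
    return ((hue_deg - 60.0) / 60.0 + (1.0 - s) + (1.0 - v)) / 3.0


# Hypothetical example pixels, not the painted calibration boards.
dark_green = dgci_from_rgb(0, 128, 0)   # hue 120 degrees, about 0.499
yellow = dgci_from_rgb(255, 255, 0)     # hue 60 degrees, about 0.0
print(round(dark_green, 3), round(yellow, 3))
```

A plot-level value would then be an average over the canopy pixels of the plot. Note that the index stays within 0 to 1 only for hues between 60° (yellow) and 120° (green), which is the range expected for a closed crop canopy.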

Statistical analysis of DGCI phenotypes

The PROC UNIVARIATE and PROC CORR procedures (α = 0.05) of SAS version 9.4 (SAS Institute, 2013) were used for descriptive statistics and Pearson correlation analysis, respectively. We used the PROC MIXED procedure (α = 0.05) of SAS 9.4 for analysis of variance (ANOVA) using a model suggested by Bondari³³, $y_{ijk} = \mu + G_{i} + E_{j} + (GE)_{ij} + B_{k(ij)} + \varepsilon_{ijk}$, where $\mu$ is the total mean, $G_{i}$ is the genotypic effect of the $i^{th}$ genotype, $E_{j}$ is the effect of the $j^{th}$ environment, $(GE)_{ij}$ is the interaction effect between the $i^{th}$ genotype and the $j^{th}$ environment, $B_{k(ij)}$ is the effect of replication within the $j^{th}$ environment, and $\varepsilon_{ijk}$ is a random error following $N(0, \sigma_{e}^{2})$.

Broad sense heritability on an entry-mean basis was estimated using PROC VARCOMP of SAS 9.4 and the Restricted Maximum Likelihood Estimation method. For RH and PT, and across all environments, the Best Linear Unbiased Prediction (BLUP) values were estimated using the PROC MIXED procedure, and BLUP values were used in association mapping analysis. Marker-based narrow sense heritability (h²) was estimated to understand the variation and trend of predictive ability across traits³⁴ using the GAPIT³⁵ R package.
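The estimator itself is not written out in the text. A commonly used entry-mean form, given here as an assumption rather than as the exact computation performed in SAS, combines the variance components of the ANOVA model as

$$H^{2} = \frac{\sigma_{G}^{2}}{\sigma_{G}^{2} + \sigma_{GE}^{2}/e + \sigma_{\varepsilon}^{2}/(re)}$$

where $\sigma_{G}^{2}$, $\sigma_{GE}^{2}$, and $\sigma_{\varepsilon}^{2}$ are the genotype, genotype-by-environment, and residual variance components, $e$ is the number of environments, and $r$ is the number of replications per environment (symbols introduced here for illustration). Within a single environment the interaction term drops out and the expression reduces to $H^{2} = \sigma_{G}^{2}/(\sigma_{G}^{2} + \sigma_{\varepsilon}^{2}/r)$.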

Genotyping and linkage disequilibrium

Single nucleotide polymorphism markers for all 200 accessions were obtained from SoyBase (www.soybase.org), providing 42,509 SNPs^25,36. Genotypic data were cleaned to remove monomorphic markers and markers with minor allele frequency (MAF) < 5%. Markers with a genotype missing rate >10% were also removed, and the remaining missing genotypes were imputed using the LD-kNNi method, which is based on a k-nearest-neighbor-genotype method³⁷. A total of 34,680 SNPs were left for association mapping. Linkage disequilibrium (LD) between these markers was measured based on squared correlation coefficients (r²) of alleles in the TASSEL 5.0 software³⁸. LD was calculated separately for euchromatic and heterochromatic regions. The LD decay with distance was estimated using nonlinear regression, as described by Hill and Weir³⁹. The decay rate of LD was determined as the physical distance between markers at which the average r² dropped to 0.25.
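The marker-cleaning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genotypes are assumed to be coded as counts of one allele (0, 1, or 2) with `None` for a missing call, and the function name `filter_snps` is hypothetical. Imputation with LD-kNNi is not shown.

```python
def filter_snps(genotypes, maf_min=0.05, max_missing=0.10):
    """Return one keep/drop flag per SNP.

    genotypes: list of rows (one per accession); each row is a list of
    allele counts 0/1/2, with None for a missing call (assumed coding).
    """
    n_accessions = len(genotypes)
    n_snps = len(genotypes[0])
    keep = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        missing_rate = 1.0 - len(calls) / n_accessions
        if not calls or missing_rate > max_missing:
            keep.append(False)  # too many missing genotypes
            continue
        freq = sum(calls) / (2.0 * len(calls))  # frequency of the counted allele
        maf = min(freq, 1.0 - freq)             # monomorphic markers have MAF 0
        keep.append(maf >= maf_min)
    return keep
```

Applying both thresholds in one pass means monomorphic markers need no separate test, because their MAF is zero and they fail the 5% cutoff.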

Genome-wide association analysis

Several statistical models are used for genome wide association mapping. A key consideration for selecting a model is how effectively it controls false positives that arise from population structure and family relatedness. The Mixed Linear Model (MLM) has often been considered the most popular approach as it considers population structure and family relatedness^22,40. Since the first publication of MLM for genome wide association mapping²², many other MLM-based methods have been developed⁴⁰. These models fail to match the true genetic model of complex traits, which are controlled by many loci simultaneously. Because all of the MLM methods are single-locus and test one marker at a time, they are likely to increase the number of false negatives⁴¹. To overcome this problem, multi-locus models, such as FASTmrEMMA and FASTmrMLM⁴¹, ISIS EM-BLASSO⁴², pLARmEB⁴³, pKWmEB⁴⁴, LASSO⁴⁵, and FarmCPU⁴⁶, have been developed. FarmCPU⁴⁶ uses a multi-locus, linear mixed model and iteratively uses fixed and random models with the most significant markers as covariates. This process helps avoid overfitting, reduces the number of reported significant markers, and effectively controls for both false positives and false negatives. FarmCPU uses these built-in routines for controlling population structure and family relatedness and has been used successfully in previous soybean association mapping studies^30,31,32. In this study, two models, MLM and FarmCPU, were used to compare the DGCI association-mapping results averaged across all environments and to determine which model was more effective in controlling false positives and negatives. Recent research has demonstrated that Bonferroni and other correction methods are too conservative and lead to false negatives when using multi-locus mapping methods^47,48,49. Depending upon marker-based heritability⁵⁰, P-values of 0.0001⁴⁸, 0.0002⁴⁹, and 0.0003⁴⁷ have been used as appropriate cutoffs in multi-locus association mapping.
To consider a SNP significantly associated with DGCI, a threshold value of −Log10 P ≥ 3.5 (equivalent to a P-value ≤ 0.0003) was used, as in previous studies^30,31,32,51 and based on the formula developed by Kaler and Purcell⁵⁰. To identify the common significant SNPs present in more than one environment, a threshold value of P ≤ 0.05 was allowed but only if the representative SNP had an association of P ≤ 0.0003 in at least one additional environment. Using the GAPIT package, we estimated marker based narrow sense heritability using an MLM model as described previously⁵⁰.
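To make the threshold arithmetic concrete, the short sketch below converts the −Log10 P cutoff to a P-value and compares it with a Bonferroni cutoff for the 34,680 SNPs used in this study. The Bonferroni family-wise α of 0.05 is an assumption made for illustration; the text does not state which α would have applied.

```python
import math

n_snps = 34680               # SNPs retained for association mapping
threshold_neglog10 = 3.5     # cutoff used in this study

# P-value equivalent of the cutoff: 10^-3.5, about 3.16e-4 (reported as 0.0003).
p_cutoff = 10 ** (-threshold_neglog10)

# Bonferroni cutoff under an assumed family-wise alpha of 0.05.
bonferroni_p = 0.05 / n_snps                     # about 1.44e-6
bonferroni_neglog10 = -math.log10(bonferroni_p)  # about 5.84

print(f"{p_cutoff:.2e} {bonferroni_p:.2e} {bonferroni_neglog10:.2f}")
```

Under this assumption the Bonferroni cutoff is roughly 200 times stricter than the cutoff adopted here, which illustrates why it is regarded as too conservative for multi-locus models.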

Candidate gene identification and true breeding value determination

Significant SNPs were used to identify candidate genes for DGCI. Genes located within the same LD block that were near SNPs associated with DGCI were considered as potential causative candidate genes. The gene ontologies (GO) associated with candidate genes in the G. max genome assembly version Glyma.Wm82.a1.v1.1 and with NCBI RefSeq gene models were obtained from SoyBase (www.soybase.org), and three major GO categories (biological process, cellular component, and molecular function) were assessed. Genes were further classified as being associated with photosynthesis, N metabolic processes, and leaf development including aging.

Allelic effect, favorable alleles, and breeding values estimation

We extrapolated DGCI breeding values for the entire soybean germplasm collection based on true breeding values calculated as described by Kaler et al.^30,32, using the allelic effects and favorable alleles estimated from our association mapping results. The difference in mean DGCI between genotypes with the major allele and those with the minor allele was taken as the allelic effect. Alleles were considered favorable if they were associated with an increase in DGCI, regardless of whether they were drawn from the major or minor allelic class. SNP effects were expressed as a positive value if the allelic effect increased DGCI and as a negative value if it decreased DGCI. All positive and negative allelic values were summed to estimate the true breeding value of each accession. Based on true breeding values, extreme genotypes were identified from the entire genotyped USDA Soybean Germplasm Collection as having predicted very high or low DGCI values within each MG. The presence of multiple favorable QTLs is associated with a high true breeding value, whereas the presence of multiple unfavorable QTLs would be associated with a low true breeding value.
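The summation described above can be sketched in a few lines. This is one reading of the procedure, stated as an assumption: an accession gains the magnitude of the allelic effect at each significant SNP where it carries the favorable allele and loses it where it carries the unfavorable allele. The SNP names, alleles, and the function `true_breeding_value` are hypothetical; only the effect magnitudes are borrowed from values reported in this article.

```python
def true_breeding_value(alleles, snp_effects):
    """Sum signed allelic effects over significant SNPs for one accession.

    alleles: dict mapping SNP name -> allele carried (None if missing).
    snp_effects: dict mapping SNP name -> (favorable allele, effect magnitude).
    """
    total = 0.0
    for snp, (favorable_allele, magnitude) in snp_effects.items():
        allele = alleles.get(snp)
        if allele is None:
            continue  # a missing call contributes nothing (assumed handling)
        total += magnitude if allele == favorable_allele else -magnitude
    return total


# Hypothetical SNPs and alleles for illustration.
snp_effects = {"snp_a": ("A", 0.109), "snp_b": ("T", 0.045), "snp_c": ("G", 0.061)}
all_favorable = {"snp_a": "A", "snp_b": "T", "snp_c": "G"}
mixed = {"snp_a": "A", "snp_b": "C", "snp_c": "G"}

print(round(true_breeding_value(all_favorable, snp_effects), 3))  # 0.215
print(round(true_breeding_value(mixed, snp_effects), 3))          # 0.125
```

Ranking accessions by this sum within each maturity group would then give the predicted extremes.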

Results

Phenotype descriptions

We observed a broad range of DGCI values within a single environment and when averaged across all environments (Table 1). Visually, there were large differences in the intensity of greenness among accessions (Fig. 1). DGCI had a range of 0.41 (PT), 0.28 (FY), 0.31 (RH), and 0.26 (AVG) (Table 1). The Shapiro–Wilk test of normality was performed, which indicated that DGCI data were normally distributed within each environment and when averaged across all environments (P > 0.01, data not shown); skewness and kurtosis also indicated a normal distribution (Table 1). Analysis of variance of DGCI indicated that there were significant effects for genotype, environment, and genotype by environment interactions (P < 0.05). There were significant positive correlations (P < 0.001) for DGCI between all environments ranging from r = 0.46 between PT and FY to r = 0.59 between RH and FY (data not shown).

Table 1 Broad sense heritability (H), marker-based narrow sense heritability (h²), and descriptive statistics of the dark green color index (DGCI) over 200 MG IV Plant Introductions from experiments conducted at Fayetteville, AR (FY), Pine Tree, AR (PT), Rohwer, AR (RH), and averaged across all environments.

Broad sense heritability indicates the proportion of phenotypic variation that is explained by genetic effects as a combination of additive effects, dominant/recessive effects, and epistasis. In contrast, marker based narrow sense heritability indicates the proportion of phenotypic variation that is explained by additive genetic effects and is therefore important in plant breeding, because the response to selection depends on additive genetic variance. Broad sense heritability for DGCI was moderate to high, ranging from 57% (RH) to 59% (PT) (Table 1). Averaged across all environments, broad sense heritability was 75%. Marker based narrow sense heritability was 44% (PT), 54% (FY), 17% (RH), and 37% when averaged across all environments.

Genotype data and linkage disequilibrium estimation

A total of 34,680 SNP markers were used for association mapping. These SNPs were more dense in euchromatic regions (an average of 78% of all markers) than heterochromatic regions (an average of 22% of all markers). The SNP distribution in the euchromatic region ranged from 45 SNPs per Mb (Gm19) to 68 SNPs per Mb (Gm09). In the heterochromatic region, SNP distribution ranged from 5 SNPs per Mb (Gm20) to 38 SNPs per Mb (Gm18). LD decayed to r² = 0.25 averaged across all chromosomes at 175 kb in the euchromatic region as compared to 5,100 kb in the heterochromatic region. These results were consistent with previous LD decay rates reported for soybean^{28,30,52,53,54}.

Genome-wide association analysis

Average DGCI values across all environments were used to compare the FarmCPU and MLM models (Fig. 2). In the FarmCPU model, the Q-Q plot resulted in a sharp deviation from the expected P-value distribution in the tail area, indicating that false positives and negatives were adequately controlled⁵⁰. In contrast, the Q-Q plot for the MLM model did not show a sharp deviation from the expected P-value distribution in the tail area (Fig. 2). These results are in agreement with previous results⁵⁰, which collectively demonstrate that the FarmCPU provides better control of type I and type II errors than the MLM model. Therefore, for subsequent association mapping, we only report results for FarmCPU.

Association mapping for DGCI identified 45 significant SNPs in at least one of three environments at a significance level of −Log10 (P) ≥ 3.5; P ≤ 0.0003 (Fig. 3, Supplementary Fig. S1, and Table 2). Eight out of the 45 SNPs were present in more than one environment. Association mapping identified 16 significant SNPs associated with an averaged DGCI across all environments at a significance level of −Log10 (P) ≥ 3.5; P ≤ 0.0003 (Fig. 3 and Table 2). Significant SNPs, which were closely spaced and present within the same LD block, were considered as one locus, and out of the 45 significant SNPs from three environments and 16 significant SNPs from the averaged DGCI across all environments, there were 43 putative loci (Table 2, Fig. 3).

Table 2 List of significant SNPs associated with dark green color index (DGCI) in three environments, Pine Tree (PT), Rohwer (RH), and Fayetteville (FY), and averaged across all environments (AVG) using the FarmCPU model with the threshold P value of (−Log10 (P) ≥ 3.5; P ≤ 0.0003).

The allelic effect for the 45 significant loci from three environments and 16 significant loci for an average DGCI across all environments ranged from −0.045 to 0.109 and from −0.037 to 0.061, respectively (Table 2). Eight out of the 45 SNPs, which were present in more than one environment, had allelic effect in the same direction. The percentage change in DGCI value due to the allelic effect was calculated by dividing the absolute value of the allelic effect with the phenotypic range and then multiplying by 100. The percentage change in DGCI associated with a specific allelic effect ranged from 0.2% to 26.6% for three environments and from 0.4% to 23.5% for the average DGCI across all environments. There were 27 SNPs from three environments and 11 SNPs based on the average DGCI across all environments that had a 5% or greater change due to allelic effect.
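The percentage-change calculation can be checked against the reported extremes. In the sketch below, the phenotypic ranges are those given in the phenotype descriptions (0.41 for PT and 0.26 for the average across environments). Pairing the largest single-environment effect (0.109) with the PT range is an assumption; it is consistent with the reported maximum of 26.6%. The function name `percent_change` is introduced for illustration.

```python
def percent_change(allelic_effect, phenotypic_range):
    """Absolute allelic effect as a percentage of the phenotypic range."""
    return abs(allelic_effect) / phenotypic_range * 100.0


# Largest single-environment effect, assuming it was observed at PT.
largest_env = percent_change(0.109, 0.41)   # about 26.6

# Largest effect for DGCI averaged across environments.
largest_avg = percent_change(0.061, 0.26)   # about 23.5

print(round(largest_env, 1), round(largest_avg, 1))
```

Taking the absolute value means that favorable and unfavorable alleles of the same magnitude give the same percentage change.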

Allelic effects of all significant loci were used to calculate the true breeding values for DGCI of the entire USDA soybean germplasm collection. Table 3 lists the two accessions from each MG that have the highest and lowest true breeding values for DGCI. These likely represent new genetic sources for improving canopy photosynthesis by optimizing canopy-level light interception in association with leaf N distribution within the canopy. To potentially improve DGCI and N status, a breeding strategy could utilize the information on the favorable alleles with the largest allelic effects (Table 2) with SNP data for specific accessions (https://soybase.org/snps/index.php) to introgress those favorable alleles into elite backgrounds.

Table 3 The top two accessions for dark green color index (DGCI) within each maturity group (MG) that have the highest and lowest true breeding values (TBVs), which were the sum of all positive and negative allelic values present in the accession.

Candidate gene identification

Genes were considered as potential candidates when they were present within ±175 kb of a significant SNP in euchromatic regions or within ±5,100 kb in heterochromatic regions. These distances represent the distance at which LD decayed to an r² = 0.25 in the euchromatic and heterochromatic regions. There were 58 candidate genes associated with DGCI, and these genes are annotated for their gene ontologies (biological process, molecular function, and cellular components) in Supplementary Table S2⁵⁴. Among the interesting annotated biological functions associated with DGCI, there were eight genes annotated for nitrate transport, six genes annotated for chlorophyll, six genes annotated for photosynthesis, six genes annotated for purine transport, six genes annotated for leaf aging and development, three genes annotated for N metabolic processes, and three genes annotated for ammonium metabolism (Supplementary Fig. S2).
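The window rule above can be expressed as a simple interval search. The following is a minimal sketch, assuming gene coordinates are available as (name, start, end) tuples in base pairs on the same chromosome as the SNP; the gene names, coordinates, and the function `candidate_genes` are hypothetical, and a gene is counted if any part of it overlaps the window.

```python
EUCHROMATIC_WINDOW_BP = 175_000        # LD decay distance, euchromatin
HETEROCHROMATIC_WINDOW_BP = 5_100_000  # LD decay distance, heterochromatin


def candidate_genes(snp_pos, in_heterochromatin, genes):
    """Return names of genes overlapping the LD window around one SNP.

    genes: iterable of (name, start_bp, end_bp) on the SNP's chromosome.
    """
    window = HETEROCHROMATIC_WINDOW_BP if in_heterochromatin else EUCHROMATIC_WINDOW_BP
    lower, upper = snp_pos - window, snp_pos + window
    return [name for name, start, end in genes if start <= upper and end >= lower]


# Hypothetical gene models for illustration.
genes = [
    ("gene_a", 1_000_000, 1_004_000),
    ("gene_b", 1_300_000, 1_302_000),
    ("gene_c", 5_500_000, 5_503_000),
]
print(candidate_genes(1_100_000, False, genes))  # ['gene_a']
print(candidate_genes(1_100_000, True, genes))   # ['gene_a', 'gene_b', 'gene_c']
```

The much wider heterochromatic window returns many more genes per SNP, so candidates from those regions are less specific.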

Discussion

The phenotypic variation of canopy greenness using aerial DGCI measurements was determined in a panel of 200 MG IV soybean accessions in three environments. The DGCI varied widely among genotypes, which is important for successful association mapping^24,55. Significant positive correlations for DGCI between environments and a moderate to high broad sense heritability indicated that DGCI was a relatively stable trait across environments. Marker based narrow sense heritability estimates were moderate to low, which would be expected for a trait, such as DGCI, that is controlled by multiple genes (as indicated in this study) and affected by environment. Low narrow sense heritability estimates indicate that selection for phenotypes in traditional breeding programs would be optimally carried out on pure-lined material and with testing in multiple replications and environments. However, the putative markers identified in this study for DGCI may allow for more rapid progress in breeding than would be expected from traditional approaches.

Similar to the previous studies by Kaler et al.^30,31, the distribution of SNP markers for these 200 accessions varied across genomic regions having fewer gaps in euchromatic regions than in heterochromatic regions. The extent of LD decay in euchromatic and heterochromatic regions was used in this study for gene identification, as was used previously⁵⁶ whereby genes within the same LD block as a QTL were considered as potential candidate genes.

Of the 45 SNPs significantly associated with DGCI in three environments (Fig. 3 and Table 2), 30 major alleles were linked with an increase in DGCI value (Table 2). One locus on Gm15 that had the largest positive allelic effect (0.109) was close to Glyma15g40911, which encodes a protein for 2-oxoglutarate and Fe (II)-dependent oxygenase that has a biological function associated with nitrate transport (Supplementary Table S2). Another locus on Gm05 that had the second largest positive allelic effect (0.071) was present close to a gene, Glyma05g27840, which codes for a urease annotated as involved with N compound metabolic processes (Supplementary Table S2). A total of 15 minor allele loci identified were associated with an increase in DGCI (Table 2). Of those, one locus on Gm20, with the largest negative allelic effect (−0.045), was present within the coding region of Glyma20g29850, which codes an oxalate-CoA ligase annotated as involved with nitrate transport (Supplementary Table S2).

Of the 16 SNPs significantly associated with DGCI averaged across all environments, 12 major alleles and four minor alleles were associated with increased DGCI. A major allele on Gm07 that had the largest positive allelic effect (0.061) was located close to Glyma07g32010, which codes for a MAC/Perforin domain-containing protein with a biological function involved with ammonium transport (Supplementary Table S2). A minor allele on Gm20 that had the largest negative allelic effect (−0.037) was located close to the gene Glyma20g23750, which codes for a transmembrane transporter annotated as involved in purine nucleobase transport (Supplementary Table S2). Based on the biological functions of these genes, these identified genomic regions and genes are likely determinants of canopy greenness in soybean, and the associated accessions identified in this study with high DGCI may be important resources for incorporating these favorable alleles into new soybean cultivars.

This is the first study identifying QTLs for canopy greenness or DGCI in soybean and complements association mapping studies of chlorophyll traits⁵⁷, N traits²⁸, and ureide concentration²⁹ in soybean. Loci identified as associated with DGCI in this study were compared with previously reported genomic regions associated with N traits and ureide concentration. We found 21 chromosomal regions that coincide with previously reported genomic regions on Gm01 (1), Gm02 (1), Gm03 (1), Gm05 (1), Gm07 (2), Gm09 (1), Gm10 (2), Gm11 (1), Gm12 (2), Gm13 (1), Gm14 (2), Gm15 (1), Gm16 (1), Gm18 (1), Gm19 (2), and Gm20 (1) (Fig. 3). Interestingly, locus 33 on Gm15 (Table 2), which had the largest allelic effect (0.109) and percent change in DGCI value (26.6%) due to allelic effect, also was associated with chlorophyll a/b ratio⁵⁷ and was coincident with genomic regions identified for N traits²⁸ and ureide concentration²⁹. These genomic regions had genes with annotated biological functions associated with nitrate (loci 1, 3, 10, 24, 27, 34, 36, 43) or ammonium transport (locus 11), photosystems (loci 9, 12, 21, 37, 38, 40) or response to light (loci 6, 13, 22, 28, 35, 36), leaf senescence (loci 5, 10, 20, 23), chlorophyll biosynthetic processes (loci 27, 30, 33, 36, 39), stomatal complex morphogenesis (loci 32, 41), and purine transport (loci 17, 21, 28, 30, 42) (Supplementary Table S2). These coincident genomic regions for DGCI, ureide concentrations, and N traits may indicate the stability and importance of these loci for canopy chlorophyll and N characteristics. These regions of the genome warrant further investigation, particularly as related to optimizing canopy-level light interception and leaf N distribution to enhance canopy photosynthesis and N use efficiency.

All of our aerial DGCI measurements were collected at full bloom. We have not made comparative measurements of DGCI among genotypes in earlier vegetative stages, but this could potentially provide important information regarding early-season nitrogen acquisition through either nitrogen fixation (on soils with low organic matter and mineralized N) or nitrogen fixation (in soils with low amounts of available N). During seedfill, aerial DGCI measurements in soybean decline⁸. The decrease in DGCI values is accelerated in response to drought. Utilization of aerial DGCI measurements may provide a high throughput method of identifying soybean maturity and of characterizing a shortening of the seed fill period in response to drought⁸.

Conclusions

This was the first study to map soybean canopy greenness using aerial DGCI measurements. Moderate to high broad sense heritability indicated that DGCI was a relatively stable trait across environments and can be used in soybean breeding programs. We found 45 significant SNPs associated with DGCI in three environments and 16 significant SNPs associated with DGCI averaged across environments. These SNPs likely tagged 43 putative loci. We confirmed 21 chromosomal regions associated with DGCI that were coincident with previously reported genomic regions for chlorophyll a/b ratio, N traits, and ureide concentration. We found 58 candidate genes and 38 of these genes had biological functions associated with nitrate transport, chlorophyll, photosynthesis, purine transport, leaf aging and development, N metabolic process, and ammonium transport. Significant loci that were coincident with previously reported genomic regions, and significant loci that were present in more than one environment, may be an important resource for pyramiding favorable alleles to improve N concentration, leaf and/or canopy photosynthesis rates, and N₂ fixation ability in soybean breeding programs.

References

Sinclair, T. R. & De Wit, C. T. Analysis of carbon and nitrogen limitations to soybean yield. Agron J. 68, 319–324 (1976).
Mastrodomenico, A. & Purcell, L. C. Soybean nitrogen fixation and nitrogen remobilization during reproductive development. Crop Sci. 52, 1281–1289 (2012).
Tracy, P. W., Hefner, S. G., Wood, C. W. & Edmisten, K. L. Theory behind the use of instantaneous leaf chlorophyll measurements for determining mid-season cotton nitrogen recommendations. In: Herber, D. J. and Richter, D. A. (ed.) Proc Beltwide Cotton Conf, National Cotton Council of America, Memphis, TN. 1099–1100 (1992).
Hák, R., Rinderle-Zimmer, U., Lichtenthaler, H. K. & Nátr, L. Chlorophyll a fluorescence signatures of nitrogen-deficient barley leaves. Photosynthetica. 28, 151–159 (1993).
Rorie, R. L. et al. Association of “Greenness” in corn with yield and leaf nitrogen concentration. Agron J. 103(2), 529–535 (2011).
Bing, Y., Xue, W. Y., Luo, L. J. & Xing, Y. Z. QTL analysis for flag leaf characteristics and their relationships with yield and yield traits in rice. Acta Genetica Sinica. 33(9), 824–32 (2006).
Lugg, D. G. & Sinclair, T. R. Seasonal changes in photosynthesis of field-grown soybean leaflets 2 Relation to nitrogen content. Photosynthetica. 15, 138–144 (1981).
Bai, H. & Purcell, L. C. Evaluation of soybean greenness from ground and aerial platforms in response to drought. Crop Sci. https://doi.org/10.2135/cropsci2019.03.0159 (2019).
Fridgen, J. L. & Varco, J. J. Dependency of cotton leaf nitrogen, chlorophyll, and reflectance on nitrogen and potassium availability. Agron. J. 96, 63–69 (2004).
Reeves, D. W., Mask, P. L., Wood, C. W. & Delaney, D. P. Determination of wheat nitrogen status with a hand‐held chlorophyll meter: Influence of management practices. J Plant Nutr. 16(5), 781–796 (1993).
Boote, K. J., Gallaher, R. N., Robertson, W. K., Hinson, K. & Hammond, L. C. Effect of foliar fertilization on photosynthesis, leaf nutrition, and yield of soybean. Agron J. 70, 787–791 (1978).
Hesketh, J. D., Ogren, W. L., Hageman, E. M. & Peters, D. B. Correlations among leaf CO₂-exchange rates, areas and enzyme activities among soybean cultivars. Photosynth Res. 2(1), 21–30 (1981).
Boon-Long, P., Egli, D. B. & Leggett, J. E. Leaf N and photosynthesis during reproductive growth in soybeans. Crop Sci. 23, 617–620 (1983).
Buttery, B. R. & Buzzell, R. I. Soybean leaf nitrogen in relation to photosynthetic rate and yield. Can J Plant Sci. 68, 793–795 (1988).
Evans, J. R. Photosynthesis and nitrogen relationships in leaves of C3 plants. Oecologia. 78, 9–19 (1989).
Pettigrew, W. T., Hesketh, J. D., Peters, D. B. & Woolley, J. T. Characterization of canopy photosynthesis of chlorophyll-deficient soybean isolines. Crop Sci. 29, 1025–1029 (1989).
Slattery, R. A., VanLoocke, A., Bernacchi, C. J., Zhu, X. G. & Ort, D. R. Photosynthesis, light use efficiency, and yield of reduced-chlorophyll soybean mutants in field conditions. Front Plant Sci. 8, 549 (2017).
Walker, B. J. et al. Chlorophyll can be reduced in crop canopies with little penalty to photosynthesis. Plant Phys. 176, 1215–1232 (2018).
Kawashima, S. & Nakatani, M. An algorithm for estimating chlorophyll content in leaves using a video camera. Ann Bot. 81, 49–54 (1998).
Pagola, M. et al. New method to assess barley nitrogen nutrition status based on image color analysis, comparison with SPAD-502. Comput Electron Agric. 65, 213–218 (2009).
Karcher, D. E. & Richardson, M. D. Quantifying turfgrass color using digital image analysis. Crop Sci. 43, 943–951 (2003).
Zhang, Y. et al. Mapping quantitative trait loci using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics 169, 2267–2275 (2005).
Syvänen, A. C. Toward genome-wide SNP genotyping. Nat Genet. 37, S5–10 (2005).
Zhu, C., Gore, M. A., Buckler, E. S. & Yu, J. Status and prospects of association mapping in plants. Plant Genome. 1, 5–20 (2008).
Song, Q. et al. Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS ONE. 8(1), e54985 (2013).
Messmer, R., Fracheboud, Y., Bänziger, M., Stamp, P. & Ribaut, J. M. Drought stress and tropical maize: QTL for leaf greenness, plant senescence, and root capacitance. Field Crop Res. 124, 93–103 (2011).
Li, G., Li, H., Cheng, L. & Zhang, Y. QTL analysis for dynamic expression of chlorophyll content in soybean. Acta Ag Sin 2010. 36(2), 242–248 (2010).
Dhanapal, A. P. et al. Genome-wide association analysis of diverse soybean genotypes reveals novel markers for nitrogen traits. Plant Genome. 8(3), https://doi.org/10.3835/plantgenome2014.11.0086 (2015).
Ray, J. D. et al. Genome-wide association study of ureide concentration in diverse maturity group IV soybean [Glycine max (L) Merr] accessions. G3. 5(11), 2391–2403 (2015).
Kaler, A. S., Ray, J. D., King, C. A., Schapaugh, W. T. & Purcell, L. C. Genome-wide association mapping of canopy wilting in diverse soybean genotypes. Theor Appl Genet. 130, 2203–221 (2017).
Kaler, A. S. et al. Genome-wide association mapping of carbon isotope and oxygen isotope ratios in diverse soybean genotypes. Crop Sci. 57, 3085–3100 (2017).
Kaler, A. S. et al. Association mapping identifies loci for canopy temperature under drought in diverse soybean genotypes. Euphytica. 214, 135 (2018).
Bondari, K. Statistical analysis of genotype × environment interaction in agricultural research. In: Paper SD15, SESUG: The Proceedings of the SouthEast SAS Users Group, St Pete Beach (2003).
Kruijer, W. et al. Marker-based estimation of heritability in immortal populations. Genetics. 199, 379–398 (2015).
Lipka, A. E. et al. GAPIT: genome association and prediction integrated tool. Bioinformatics. 28, 2397–2399 (2012).
Song, Q. et al. Fingerprinting soybean germplasm and its utility in genomic research. G3. 5(10), 1999–2006 (2015).
Money, D. et al. LinkImpute: Fast and accurate genotype imputation for non-model organisms. G3. 5(11), 2383–2390 (2015).
Bradbury, P. J. et al. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics. 23, 2633–2635 (2007).
Hill, W. G. & Weir, B. S. Variances and covariance of squared linkage disequilibria in finite populations. Theor Popul Biol. 33, 54–78 (1988).
Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42, 355–360 (2010).
Wen, Y. J. et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 19, 700–712 (2018).
Tamba, C. L., Ni, Y. L. & Zhang, Y. M. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput Biol. 13, e1005357 (2017).
Zhang, J. et al. pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity. 118, 517–524 (2017).
Ren, W. L., Wen, Y. J., Dunwell, J. M. & Zhang, Y. M. pKWmEB: integration of Kruskal-Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity. 120, 208–218 (2018).
Xu, Y., Xu, C. & Xu, S. Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity. 119, 174–184 (2017).
Liu, X., Huang, M., Fan, B., Buckler, E. S. & Zhang, Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 12(2), e1005767 (2016).
Kaler, A. S., Gillman, J. D., Beissinger, T. & Purcell, L. C. Statistical models and multiple testing corrections for association mapping in soybean and maize. Front. Plant Sci., https://doi.org/10.3389/fpls.2019.01794 (2020).
Steketee, C. J., Sinclair, T. R., Mandeep, K. R., Schapaugh, W. T. & Li, Z. Unraveling the genetic architecture for carbon and nitrogen related traits and leaf hydraulic conductance in soybean using genome-wide association analyses. BMC Genomics. 20, 211, https://doi.org/10.1186/s12864-019-6170-7 (2019).
Zhang, Y. M., Jia, Z. & Dunwell, J. M. Editorial: the applications of new multi-locus GWAS methodologies in the genetic dissection of complex traits. Front. Plant Sci. 10, 100, https://doi.org/10.3389/fpls.2019.00100 (2019).
Kaler, A. S. & Purcell, L. C. Estimation of a significance threshold for genome-wide association studies. BMC Genomics. 20, 618, https://doi.org/10.1186/s12864-019-5992-7 (2019).
Kaler, A. S. et al. Genome-wide association mapping of canopy coverage in diverse soybean genotypes. Mol Breed. 38, 50, https://doi.org/10.1007/s11032-018-0810-5 (2018).
Hwang, E. et al. A genome-wide association study of seed protein and oil content in soybean. BMC Genomics. 15, 1 (2014).
Hyten, D. L. et al. Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics. 175, 1937–1944 (2007).
Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature. 463, 178–183 (2010).
McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 9(5), 356–369 (2008).
Kim, T. H. et al. A high-resolution map of active promoters in the human genome. Nature. 436, 876–880 (2005).
Dhanapal, A. P. et al. Genome-wide association mapping of soybean chlorophyll traits based on canopy spectral reflectance and leaf extracts. BMC Plant Biol. 16(1), 174 (2016).

Acknowledgements

The authors gratefully acknowledge partial funding of this research from the United Soybean Board. This research was supported in part by the U.S. Department of Agriculture, Agricultural Research Service. The USDA-ARS is an equal opportunity, affirmative action employer and all agency services are available without discrimination. Mention of a trademark, vendor, or proprietary product does not constitute a guarantee or warranty of the product by the USDA and does not imply its approval to the exclusion of other products or vendors that may also be suitable. Appreciation is extended to Andy King, Marilynn Davies, Jody Hedge, and Scott Hayes for excellent technical support and to Christina Jamieson for secretarial support.

Author information

Authors and Affiliations

Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, 72704, USA
Avjinder S. Kaler & Larry C. Purcell
USDA-ARS, U.S. Arid Land Agricultural Research Center, 21881 North Cardon Lane, Maricopa, AZ, 85138, USA
Hussein Abdel-Haleem
Division of Plant Sciences, Univ. of Missouri, Columbia, MO, 65211, USA
Felix B. Fritschi
Plant Genetic Research Unit, USDA-ARS, University of Missouri, Columbia, MO, 65211, USA
Jason D. Gillman
Crop Genetics Research Unit, USDA-ARS, 141 Experimental Station Road, Stoneville, MS, 38776, USA
Jeffery D. Ray & James R. Smith

Contributions

A.K., H.A., F.F., J.G., J.R., J.S. and L.P. conceived of the idea. A.K. and L.P. collected the field data, and A.K. analyzed the data. A.K. and L.P. wrote the initial manuscript draft, and H.A., F.F., J.G., J.R. and J.S. provided meaningful insights and valuable edits. All authors read and approved the final version.

Corresponding author

Correspondence to Larry C. Purcell.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kaler, A.S., Abdel-Haleem, H., Fritschi, F.B. et al. Genome-Wide Association Mapping of Dark Green Color Index using a Diverse Panel of Soybean Accessions. Sci Rep 10, 5166 (2020). https://doi.org/10.1038/s41598-020-62034-7

Received: 15 September 2019
Accepted: 06 March 2020
Published: 20 March 2020
DOI: https://doi.org/10.1038/s41598-020-62034-7
