Spatial Characterization of Soybean Yield and Quality (Amino Acids, Oil, and Protein) for United States

Continued economic relevancy of soybean is a function of seed quality. The objectives of this study were to: (i) assess the spatial association between soybean yield and quality across major US soybean producing regions, (ii) investigate the relationship of protein, oil, and yield with amino acid (AA) composition, and (iii) study the interrelationship among essential AAs in soybean seed. Data from soybean testing programs conducted across 14 US states from 2012 to 2016 (n = 35,101 data points) were analyzed. Results indicate that for each Mg ha−1 increase in yield, protein yield increased by 0.35 Mg protein ha−1 and oil yield by 0.20 Mg oil ha−1. Essential AA concentrations exhibited spatial autocorrelation, and the concentrations of AAs, protein, and oil were each negatively related to latitude. All AAs were positively intercorrelated, with varying strength: the correlation between isoleucine and valine was the strongest (r = 0.93), followed by the correlations among arginine, leucine, lysine, and threonine (0.71 < r < 0.88). We concluded that the variability in genotype (G) × management (M) × environment (E) across latitudes that influenced yield also affected soybean quality (AA, protein, and oil content) in a similar manner.

A negative relationship between yield and protein or AA concentration has been documented in the scientific literature 30-32. However, concentration measures the abundance of a quality trait relative to other competing seed components. Thus, it is not clear whether the negative relationship between protein and yield is due to a decrease in protein concentration per se or to an increase in other seed components as yield increases. Even though protein concentration and yield are negatively related, protein yield and seed yield correlate positively 30. Mourtzinis et al. 33 also reported a positive correlation of seed yield with protein and oil content. Variability in US soybean seed composition has already been characterized by several researchers 3,34-36 and, more recently, for protein and oil from farmer-provided seed samples 14. For AAs, a regional characterization for the main US producing regions is still lacking. Furthermore, the interrelationship among AAs and their individual association with oil and yield is largely unknown. Thus, the objectives of this study were to: (i) assess the spatial association in concentration of AAs for the US soybean producing regions (Fig. 1), (ii) investigate the relationship of oil and yield with protein and AA composition, and (iii) study the interrelationship among essential and conditional essential AAs in soybean.

Results and Discussion
Total essential amino acid, protein, oil, and protein + oil concentration. The sum of 11 essential and conditional essential AAs ranged from 115 to 174 g kg−1, with similar mean and median values of 145 g kg−1 (Fig. 2a). The majority of the data (~70%) was concentrated within a 13 g kg−1 range, implying a narrow variation for this seed trait. Protein concentration ranged from 266 to 405 g kg−1, with both mean and median at 345 g kg−1. As for AAs, the majority of the protein data (~70%) was concentrated within a narrow range, in this case 24 g kg−1. Oil concentration ranged from 157 to 232 g kg−1, the mean and median were 189 g kg−1, and the standard deviation was 8.4 g kg−1 (Fig. 2a). The sum of protein + oil concentration ranged from 471 to 587 g kg−1. Concentrations of total AA, protein, and oil varied over time, with AA and protein concentrations peaking in 2013 and the lowest average oil concentration occurring in 2014 (Fig. 2b-e).
There was a significant spatial autocorrelation and trend across latitudes for AAs, protein, oil, and oil + protein (Tables 1, 2; Fig. 2f-i). All seed quality traits decreased as latitude increased; however, the negative correlation between latitude and oil + protein concentration was stronger because both protein and oil concentration, individually, decreased with latitude (Table 3).
Overall distribution of individual essential AA concentration. The concentration of each individual essential AA was below 35 g kg−1 (3.5% of seed weight), with minimum, maximum, and average values varying by AA type (Fig. 3). The overall mean concentrations of leucine, arginine, and lysine (26.8, 24.8, and 22.9 g kg−1, respectively) were greater than those of valine, isoleucine, and threonine (17.3, 16.4, and 13.4 g kg−1, respectively), which in turn were greater than those of cysteine, methionine, and tryptophan (5.1, 4.8, and 3.7 g kg−1, respectively). The median value of each AA did not differ significantly from the mean, with AA distributions close to normal. The narrow ranges between the maximum and minimum values (Fig. 3) and the small standard deviation (<1.5 g kg−1) of each essential AA indicate that environmental, management, and genetic variation had only a minor effect on individual AA concentrations in soybean. The mean concentrations of essential AAs reported in this study were slightly lower than the mean AA values described for the different US regions 10,15,36. Mean values reported by Karr-Lilienthal et al. 12 for different US soybean collections were greater, in most cases at least 1% higher than the mean values reported in this study. Similarly, our mean values are lower than those reported for Iowa 37 and Brazil 8 but greater than those reported by Goldflus et al. 11 for Brazil. Similar to our results, the aforementioned published literature depicted a narrow range of variation in AAs among soybean plants, depending on crop performance.

Environment (Location × Year) impact on essential amino acid concentration. A significant correlation was observed between the concentration of essential AAs in soybean and location (Fig. 4).
When a univariate correlation analysis was conducted between AAs and latitude, all essential AAs showed a negative correlation, with Pearson correlation coefficients ranging from −0.05 for valine to −0.21 for tryptophan. All essential AAs, except valine, showed a positive correlation with longitude, with Pearson correlation coefficients ranging from 0.01 for cysteine to 0.18 for tryptophan. When a spatial autocorrelation analysis was conducted using Moran's I, a significant spatial autocorrelation was obtained for all AAs except cysteine and isoleucine (Table 1; Fig. 4). These results suggest a significant impact of geographical location, with relatively lower AA concentration in the North-West than in the South-East of the US Corn Belt region. A significant annual variation was also observed in the concentration of essential AAs (Fig. 5). In most cases, AA concentration was lowest in 2012 compared to the other evaluated years and reached its maximum in 2013 and 2016. Over the years, the concentrations of arginine, leucine, lysine, methionine, threonine, and tryptophan tended to decrease, whereas those of cysteine, isoleucine, and valine tended to increase.
Environment is a major factor influencing AA concentration 8. Variation in location or year results in variation in climatic and soil variables, such as temperature, radiation, moisture, and soil nutrients, that affect soybean growth and lead to different seed composition 11. In addition, as latitude increases, maturity group gradually changes, becoming an additional factor. Attempts have been made to study the AA concentration of soybeans across different states within the US and Brazil 8,10-12 or among countries (US, Brazil, and China) 15, but efforts to examine spatial autocorrelation are scarce in the scientific literature. After a regional comparison of essential and nonessential AA concentrations in US soybean meals, Karr-Lilienthal et al. 12 concluded that essential AA concentrations were lowest for meals sourced from northern regions. The current spatial analysis agrees with the literature, but adds that the degree of spatial correlation varies among the AAs considered. The correlation between AAs and latitude is confounded with the effect of the environment (weather × soil and length of the growing season) and with the soybean maturity group grown at the different US latitudes. Therefore, both environment and genetics play a key role in the spatial trend and autocorrelation reported.

Maturity impact on essential amino acid in soybean.
Since maturity group declines as latitude increases, the analysis between maturity group and AAs was conducted within groups of regions with similar maturity groups. Within regions 1-4, where maturity group ranged from 2.0 to 4.9, a weak negative correlation (Pearson correlation coefficient −0.02 to −0.05) was documented between maturity group and all AAs except tryptophan and cysteine (Fig. 6a). There was no correlation between maturity group and tryptophan, but a weak positive correlation (r = 0.05) between maturity group and cysteine (Fig. 6a). Within regions 5-8, where maturity group ranged from 0.5 to 3.0, a weak positive correlation (Pearson correlation coefficient 0.04 to 0.11) was obtained between maturity group and all AAs without exception (Fig. 6b). Within regions 9-12, where maturity group ranged from 0.0 to 2.2, a similar positive but stronger correlation (r = 0.04 to 0.21) than that observed for regions 5-8 was detected between AA concentration and maturity group (Fig. 6c), except for cysteine. In regions 9-12, a relatively stronger positive correlation was documented between maturity group and lysine (r = 0.21) and methionine (r = 0.21). Despite the correlations detected between AAs and maturity group within each regional group, comparison among regions revealed only small differences in AA concentration (Fig. 6d). For all AAs, there was no difference across regions, only a relative trend toward greater AA concentration in regions at lower rather than higher latitude.
Relationship among essential amino acid in soybean. A significant positive correlation was evident among all essential AAs (Table 3; Fig. 7a). The strongest correlation was between isoleucine and valine (r = 0.93). The correlation among arginine, leucine, lysine, tryptophan, and threonine was the next strongest (0.71 < r < 0.88); followed by the correlation between arginine, lysine, and methionine (0.66 < r < 0.70); and

Relationship between essential amino acid, oil concentration, and soybean yield. There was a significant negative correlation between oil and all essential AAs (Fig. 7b). Arginine, leucine, and methionine presented a relatively strong negative correlation (−0.44 > r > −0.56) with oil concentration relative to the other essential AAs. Similarly, isoleucine, lysine, and valine (−0.32 > r > −0.34), and cysteine, threonine, and tryptophan (−0.14 > r > −0.25) significantly decreased with increasing oil concentration. A similar negative correlation between AAs and oil was recently reported by Mourtzinis et al. 13. For seed yield, there was a weak and mixed relationship between this trait and AA concentration (Table 2, Fig. 7c). Amino acids such as arginine, cysteine, leucine, lysine, and threonine were weakly (−0.01 > r > −0.17) and negatively correlated with yield. Amino acids such as isoleucine, methionine, tryptophan, and valine were weakly (−0.01 > r > −0.17) but positively correlated with yield. Mourtzinis et al. 13 also found a weak positive correlation (r = 0.11) between essential AA concentration and yield.

Soybean yield showed a significant relationship with environment and maturity group (Fig. 8). Yield was greater in the latitude range 41-43°N than at <41 or >43°N (Fig. 8a). No significant yield differences were found along the east-west line (longitude), but yield tended to decrease in the extreme west (93 to 100°W) relative to the rest of the region (Fig. 8a).
Unlike AA concentration, which varied from year to year, average yield increased significantly over the years (Fig. 8b) and with maturity group (Fig. 8c), to a greater degree than the correlation between AAs and maturity group (Fig. 6a). The similar spatial trend for yield, protein, and oil concentrations across latitude suggests that a yield increase does not necessarily decrease actual protein or AA content.

Grain Quality: Concentration versus Content (Yield). Unlike the weak negative correlation of most AAs and protein concentration with yield, the correlation of AAs and protein expressed per unit area (kg ha−1) with soybean seed yield was strong and positive (Table 4). The correlation between AA yield (per unit area) and oil yield was also positive and strong.
A regression analysis of oil and protein yields on seed yield indicated that for a 1 Mg ha−1 increase in seed yield, protein yield increased by 0.35 Mg ha−1 and oil yield by 0.20 Mg ha−1 (Fig. 9a). The obtained slope (0.35 Mg protein Mg−1 seed yield) reflects the average US soybean seed protein concentration. A future challenge for agronomic programs will be to identify combinations of practices that increase this slope (efficiency), seed yield, or both. The spatial trends of protein and oil yield closely resembled the spatial pattern of seed yield (Fig. 8a-c).
The strong correlation among seed yield and quality factors expressed on a per-unit-area basis agrees with previous findings 22,32,38. Rotundo et al. 38 and Ray et al. 22 reported an increase in protein and oil yield with increased fertilizer application but a decline in protein concentration. Since the concentration of a particular seed quality component depends on the other components, it does not provide the actual amount of that component produced per seed, per area, or per unit of yield. A correlation between yield and protein concentration, therefore, might mislead by suggesting that yield and quality are inversely related when that is not the case. In fact, a past review on this topic conveyed concern that the negative correlation between protein concentration and yield might hamper cultivar development 38. Thus, future studies exploring the yield-quality relationship should focus on both content and concentration.
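As a rough consistency check on these slopes, the following sketch assumes that concentration is approximately independent of seed yield and that yield and concentration are expressed on the same moisture basis. The symbols Y and C are notation introduced here for illustration, and the mean concentrations (345 and 189 g kg−1) are those reported above.

```latex
\begin{align*}
Y_{\text{protein}} &= Y_{\text{seed}} \times \frac{C_{\text{protein}}}{1000}
  && (Y \text{ in Mg ha}^{-1},\ C \text{ in g kg}^{-1}) \\
\frac{\Delta Y_{\text{protein}}}{\Delta Y_{\text{seed}}}
  &\approx \frac{\bar{C}_{\text{protein}}}{1000}
   = \frac{345}{1000} \approx 0.35\ \text{Mg protein Mg}^{-1}\ \text{seed} \\
\frac{\Delta Y_{\text{oil}}}{\Delta Y_{\text{seed}}}
  &\approx \frac{\bar{C}_{\text{oil}}}{1000}
   = \frac{189}{1000} \approx 0.19\ \text{Mg oil Mg}^{-1}\ \text{seed}
\end{align*}
```

Under this assumption the protein value matches the fitted slope of 0.35, and the oil value is close to the fitted slope of 0.20.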

Conclusion
Analysis of a multi-site-year dataset (n = 35,101) with seed yield and quality data indicated a significant spatial autocorrelation for soybean yield and quality parameters. Variability in quality traits across regions was related to genetic, environmental, and management (G × E × M) factors. Despite a weak negative relationship between

Material and Methods
Data from soybean testing programs conducted across 14 US states (Fig. 1) from 2012 to 2016 (n = 35,101 data points) were used for our analysis. Up to twelve soybean testing regions were predefined (http://www.firstseedtests.com/map-soybean.shtml) based on location and soybean maturity group. Within a region, soybean varieties were tested in four locations, selected to represent the diversity in the region. Within a location, soybean varieties were planted on farm in either four rows at 76 cm spacing or seven rows at 34 cm spacing, with a row length of 13.7 m. Seed companies entered their soybean varieties within the specified maturity group for the region every year. Varieties were randomized and replicated at least three times. Soybean was planted and managed following region-specific recommendations. Planting date varied by location and year but ranged from early May to late June (Table 1). Plant stand, yield, seed moisture, and oil, protein, and AA concentrations [all determined by near infrared (NIR) spectroscopy] were among the variables measured. The essential and conditional essential AAs measured include arginine, cysteine, isoleucine, leucine, lysine, methionine, threonine, tryptophan, and valine. Harvest date also varied by location and year, from late September to mid-November. Yield and concentrations of AAs, oil, and protein were adjusted to 130 g kg−1 seed moisture content.
In addition to yield and seed quality composition (AA, protein, and oil concentration), the current study analyzed derived variables: the sum of total essential AAs (total AA), the sum of the concentrations of the sulfur-containing essential AAs cysteine and methionine (TSAA), and the sum of the concentrations of oil and protein (oil + protein). The sum of total essential AAs refers to the sum of the concentrations of arginine, cysteine, isoleucine, leucine, lysine, methionine, threonine, tryptophan, and valine.
As a first step, a general descriptive analysis of total AA, protein concentration, and oil + protein was conducted. Similarly, for the concentration of each individual AA, the data distribution and the mean, minimum, maximum, and median values were calculated for the entire data set to highlight the variation across environment and management factors.
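The text does not state how the moisture adjustment was computed. A minimal sketch, assuming the common dry-matter-conservation convention and assuming concentrations are expressed per unit of seed mass at the stated moisture, is given below. The function names are hypothetical and moisture is given as a mass fraction (130 g kg−1 = 0.130).

```python
STANDARD_MOISTURE = 0.130  # 130 g kg-1 expressed as a mass fraction


def adjust_yield(yield_obs, moisture_obs, moisture_std=STANDARD_MOISTURE):
    """Rescale seed yield to the standard moisture, conserving dry matter.

    Assumption: dry matter = yield * (1 - moisture) is the same before
    and after the adjustment.
    """
    return yield_obs * (1.0 - moisture_obs) / (1.0 - moisture_std)


def adjust_concentration(conc_obs, moisture_obs, moisture_std=STANDARD_MOISTURE):
    """Rescale a concentration (e.g. g kg-1 protein) to the standard moisture.

    Assumption: the component mass is fixed, so its concentration scales
    with the dry-matter fraction of the seed.
    """
    return conc_obs * (1.0 - moisture_std) / (1.0 - moisture_obs)


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    y = adjust_yield(4.0, 0.10)             # Mg ha-1 measured at 10% moisture
    c = adjust_concentration(357.0, 0.10)   # g kg-1 measured at 10% moisture
    print(round(y, 3), round(c, 1))
```

With this convention the product of yield and concentration, and therefore the component yield per hectare, is unchanged by the adjustment.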
As a second step, a spatial autocorrelation analysis was conducted using Moran's I in R 39. Spatial classification of the average values of each AA was conducted in ArcMap, and maps are presented for visual analysis of the spatial trend across the study area. Correlation analysis of AA concentration with latitude, longitude, and year was conducted using the CORR procedure (PROC CORR) of SAS 40.
As a third step, the relationship between AAs and soybean maturity group was analyzed within groups of regions with significant similarity in maturity group. Regions 1 to 4, 5 to 8, and 9 to 12 were identified as groups of regions with significant similarity in the maturity groups tested (Fig. 1). A comparison of AA concentration among the three regional groups was also conducted using the MIXED procedure (PROC MIXED) of SAS.
As a fourth step, the interrelationship among the essential AAs was analyzed to understand how a change in one AA relates to changes in the others. A similar correlation analysis was conducted for each of the essential AAs with the rest of the seed quality traits and yield. To determine the actual relationship between yield and AAs, the impact of location, year, and maturity group was also analyzed.
Lastly, quality factors per unit area (AA yield, protein yield, and oil yield per ha) were calculated by multiplying seed yield by the percentage of each quality factor. A correlation analysis was conducted on a per-unit-area basis among the yields of AAs, protein, oil, and seed, and of each with location and year. Regression analysis was also conducted to determine the change in protein and oil yield per ha per unit change in seed yield. A summary was prepared from the overall analysis of the impact of location, year, and maturity group on AAs; the interrelationship among AAs and their relationship with yield; and the impact of the aforementioned variables (e.g., location, year, maturity group) on yield.
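For readers unfamiliar with the statistic, Moran's I can be sketched in a few lines. The study computed it in R, and the weighting scheme used there is not described, so the binary distance-band weights below (with distances taken directly in coordinate units) are an assumption made for illustration; both function names are hypothetical.

```python
import numpy as np


def distance_band_weights(coords, threshold):
    """Binary spatial weights: 1 if two sites lie within `threshold`, else 0.

    Assumption: plain Euclidean distance in the units of `coords`; a real
    analysis on latitude/longitude would likely use geodesic distance.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    weights = (dist <= threshold).astype(float)
    np.fill_diagonal(weights, 0.0)  # a site is not its own neighbor
    return weights


def morans_i(values, weights):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), z = values - mean."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    s0 = w.sum()  # sum of all spatial weights
    return (x.size / s0) * (z @ w @ z) / (z @ z)


if __name__ == "__main__":
    # Four hypothetical sites on a line with a smooth gradient in the trait.
    sites = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    w = distance_band_weights(sites, threshold=1.5)
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # positive: similar neighbors
    print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # negative: alternating pattern
```

Values near zero indicate no spatial structure. Significance is usually judged against a permutation or normal-approximation reference distribution, which this sketch omits.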
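The per-unit-area calculation and the regression described in this last step can be sketched as follows. This is an illustration rather than the authors' SAS code: the function names are hypothetical, the example values are made up, and concentration is assumed to be in g kg−1 with seed yield in Mg ha−1.

```python
import numpy as np


def component_yield(seed_yield_mg_ha, concentration_g_kg):
    """Component yield (Mg ha-1) = seed yield (Mg ha-1) * concentration (g kg-1) / 1000.

    Multiply the result by 1000 to express it in kg ha-1.
    """
    seed = np.asarray(seed_yield_mg_ha, dtype=float)
    conc = np.asarray(concentration_g_kg, dtype=float)
    return seed * conc / 1000.0


def fit_line(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    return np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]


if __name__ == "__main__":
    # Hypothetical plots: yield rises while protein concentration falls slightly.
    seed_yield = [3.0, 3.5, 4.0, 4.5, 5.0]          # Mg ha-1
    protein_conc = [352.0, 349.0, 346.0, 343.0, 340.0]  # g kg-1
    protein_yield = component_yield(seed_yield, protein_conc)

    slope, intercept = fit_line(seed_yield, protein_yield)
    print("slope (Mg protein per Mg seed):", slope)
    print("r, concentration vs yield:", pearson_r(seed_yield, protein_conc))
    print("r, protein yield vs yield:", pearson_r(seed_yield, protein_yield))
```

In this made-up example concentration correlates negatively with seed yield while protein yield correlates positively, which is the concentration-versus-content distinction the analysis was designed to separate.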