Seed yield can be explained by altered yield components in field-grown western wheatgrass (Pascopyrum smithii Rydb.)

Western wheatgrass (Pascopyrum smithii Rydb.) is an important cool-season forage and turfgrass. However, due to seed dormancy and poor seedling vigor, it is difficult to develop high seed yield production systems, and assessing these components in response to seed yield. Based on multifactor orthogonally designed field experimental plots under various field management regimes, the effects of numbers of fertile tillers m−2 (Y1), spikelets/fertile tiller (Y2), florets/spikelet (Y3), seed numbers/spikelet (Y4), and seed weight (Y5) on seed yield (Z) were determined over three successive years. Correlation analysis indicated that fertile tillers (Y1) was the most important seed yield component. And the biggest contribution of those five yield component is fertile tillers (Y1), followed by seed numbers/spikelet (Y4), spikelets/fertile tiller (Y2), florets/spikelet (Y3) and seed weight (Y5), respectively. By using ridge regression analysis, we have developed an accurate model of seed yield with its five components. Finally, the results of synergism and antagonism among these yield components on seed yield showed that fertile tillers and seed numbers/spikelet had an antagonistic effect on seed yield. Therefore, selection for high seed yield by direct selection for large values of fertile tillers and seed numbers/spikelet would be the most effective breeding strategy for western wheatgrass.

Western wheatgrass (Pascopyrum smithii Rydb.) is a native, perennial cool-season grass, found most abundantly in the southern, mixed-grass prairie region of the Great Plains of North America and is grown for livestock production throughout the temperate regions of the world 1 . Because it thrives on impoverished soils in pastoral environments, even with multiple, simultaneous stressors 2 , it is an important species for soil protection, water conservation, and vegetation protection in arid and semi-arid regions. Due to these characteristics, it is a competitive, high-yielding species, providing forage for livestock and wildlife on semi-arid rangelands in Eurasia and northwest China 3-6 . Thus, it can appear as a dense monoculture 7,8 . Seed yields in the cool-season perennial grasses are often low, and in China, the seed supply of perennial grasses has depended on imports for many years due to inadequate supplies of locally produced high quality seed. To become more self-sufficient in supplying seed for its many needs, the Chinese government has encouraged the development of increased seed production capacity 9, 10 .
There have been many studies on the factors affecting crop yields, with the aim of improving yields as much as possible 11 . Seed yield is a complex trait that is the culmination of growth and developmental processes that are influenced by multiple yield components 12,13 . Understanding the relationship between yield and yield components and the correlations between yield components is a prerequisite for building an efficient breeding program 14 . To date, no research has been conducted to develop models for seed yield and related yield components in western wheatgrass. Therefore, it is necessary to examine the relationships among various factors, especially seed yield, yield components, and their interdependence. To create a high seed yield breeding program, a mathematical model needs to be developed that accurately predicts forage and seed yield, and this must subsequently be validated through field experiments 15 . Owing to the lower complexity and lower environmental influence of

Results
Correlations between traits. Pearson correlation analysis revealed that seed yield (Z) was significantly and positively influenced (P < 0.001) by seed yield components Y 1 and Y 2 , but was negatively affected to a lesser extent by Y 4 and Y 5 (P < 0.05 or P < 0.01) ( Table 1). Y 1 had the maximum coefficient on Z (0.472, P < 0.001). The correlations among Y 1 through Y 5 showed significance (P < 0.001), while Y 1 and Y 2 were negatively correlated with Y 3 and Y 5 . However, Y 1 and Y 2 showed strong, positive correlations with Z in for all three years (P < 0.001) ( Table 2 Table 2).
Path analyses of Y 1 to Y 5 with Z. The results of the path analyses showed that all five seed yield components presented a strong, highly significant direct effect on Z in at least two of the years (Table 3). However, the direct effect of Y 1 on Z was strong and positive (highlighted in bold) for all 3 years (2003,2004 and 2005 were at P < 0.0001), where the coefficients were 0.480, 0.423 and 0.777, respectively. Therefore, Y 1 had the largest contribution to Z. Y 5 in 2004 (−0.065 at P < 0.001), Y 3 in 2003 (0.454 at P < 0.05) and Y 4 in 2003 (−0.371 at P < 0.05) had weak but statistically significant direct effects on Z.
Regarding the contributions of Y 1 -Y 5 to Z, the strongest positive indirect influence was presented by the pathway from Y 1 via Y 4 (Table 3), Y 3 had the smallest contribution to Z (Table 3). In summary, the order of contributions of the five seed yield components www.nature.com/scientificreports www.nature.com/scientificreports/ (Table 3). With Y 4 , the influence was negative, but the overall order of effects was Y 1 > Y 2 > Y 3 > Y 5 > Y 4 (1.68, 0.2667, 0.2272, 0.05 and −0.2907, respectively, Table 3).
Ridge regression models of seed yield and 5 seed yield components. Seed yield (Z) was highest in 2003 followed by 2004 and 2005 (Table 4). Y 1 was highest in 2003 and produced the highest Z. It has been suggested that the value of the ridge parameter K 11 should be determined using ridge traces (Fig. 1  year Indirect effect via   (Table 4). For a reliable model from the data of three successive years, the 380 samples of Z with Y 1 to Y 5 in the database were transformed using the natural logarithm: S = ln Z, C 1 = ln Y 1 , C 2 = ln Y 2 , C 3 = ln Y 3 , C 4 = ln Y 4 and C 5 = ln Y 5 .
Then, the ridge regression model was obtained with S and C 1 -C 5 as follows: (The variance analysis and the parameter estimates are listed in Tables 5 and 6 respectively).   Thus, Model (2) was transformed to an exponential function: Equation (3) was used to estimate the seed yield of all 380 samples, and the results were denoted as Z estimated . The actual seed yields were denoted as Z actual . To test the accuracy, the values of Z actual to Z estimated were used for linear regression (analyses of the variance is shown in Tables 5 and 6). The linear model was as follows: < . Then, via formula (4), the model was adjusted The variance test estimated that the intercept and Z were −0.033 and 1.000, respectively (Table 7), which are presented in Fig. 2, superimposed on the 1:1 line.
We determined the antagonistic and synergistic effects among the Ys on Z using pairwise models (Figs. 3 and 4). In addition, the results of synergism and antagonism among the Y 1 to Y 5 on Z are discussed (Figs. 3 and 4).   Table 7. Parameter estimates for Z estimated after adjustment by linear regression.

Discussion
In this study, precipitation, temperature, and sunlight during the crop-growing period from March to early September for the three years of the study were provided by the Jiuquan Meteorological Observatory of Gansu Province, China (Fig. S1). Temperature and precipitation during the crop growing seasons from 2003-2005 were near the average values for the past ten years in this location (Table S14). Thus, this data should accurately represent the natural field conditions.
The average Pascopyrum smithii seed yield (Z) and the yield components (Y 1 , Y 2 , Y 3 , Y 4 and Y 5 ) were very different from 2003 to 2005 (Table 8), mainly owing to climatic conditions ( Supplementary Fig. 1), e.g., large precipitation differences between 2003, 2004 and 2005. Moreover, higher rainfall in June, which occurred during the seed growth period, was partly responsible for higher seed yields because it favored pollination and grain filling. As another example, the highest recorded rainfall was in March 2005 (28.2 mm), together with a low air temperature. These factors promoted vegetative growth and significantly decreased Y 1 (Table 4) Table 8. These indicators have strong differences for improving the expression of traits and provide a good opportunity to cultivate excellent species. The variabilities in seed yield and the yield components in 2003, 2004 and 2005 may have been caused by a stand age divergence, air temperature differences, interactions between soil fertility and climatic conditions, or combinations thereof.
Correlation analysis considers a mutual association with no regard to causation, whereas path analysis specifies causes and measures their relative importance 32 . Because the total correlations between the predictor variables and the response variable, which are partitioned into direct and indirect affects, and the direct and indirect effects of the yield factors can be determined through path analysis, and knowledge on the direct and indirect correlations, especially of the yield, allows breeders to use this additional information to discard or promote genotypes of interest 33 . Alvi et al. 34 and Asghari-zakaria et al. 35 used path analysis to enable the identification   www.nature.com/scientificreports www.nature.com/scientificreports/ of traits that are useful for an evaluation standard in increasing crop yield. Cruz et al. 36 defines the path coefficient, or cause and effect analysis, as a standardized regression coefficient 37 , because path analysis is composed of an expansion of multiple regressions when complex interrelationships are involved. Thus, path analysis has been found to be a useful technique of statistical analysis specially designed to quantify direct and indirect trait association with yield 38 . In this study, path analysis indicates that the total direct effects of Y 1 , Y 2 , Y 3 and Y 5 have highly significant positive correlations with Z, but Z is negatively correlated with Y 4 ( Table 3). The explanation for why there is a weak negative correlation between the seed number (Y 4 ) and seed yield (Z) might matter to a high density cultivation and soil nutrient limitation 7 . This finding is consistent with the literature, e.g., the yield of mechanically harvested rapeseed (Brassica napus L.) can be increased by optimum plant density and row spacing 39 . Table 3 shows that Y 1 and Y 2 have the largest correlation coefficient and contribution rate, which is consistent with the natural law of plant growth and biological theory. Nevertheless, the correlation of Z with Y 3 (Table 1), which was not significant, is probably due to the effects of aging and climate during the individual years and to the field management systems that were repeated yearly 40 . However, Y 3 did not significantly contribute to Z partly because Y 3 was mostly under genetic control 40 . This discovery implies that Y 3 is the sub group that should be taken into account if high quality forage is the target of the breeding system. Nevertheless, Y 1 was the most critical and available group that significantly contributed to Z (P < 0.001): the coefficients were 0.480, 0.423, and 0.777 for 2003, 2004 and 2005, respectively. This finding is consistent with previous studies in fescues 7,41 , Russian wild rye 42,43 , Zoysia grass 1 , gramineous plants 5 , and white clover 40,44,45 . Additionally, path analysis uncovered relationships between the components and the yield that are consistent with previously reported results 45 . The interrelationships among Y 1 to Y 5 indicated that Y 4 and Y 5 had remarkable negative relationships with seed yield (Z; r = −0.03* and r = −0.192**). Y 1 and Y 2, as indicated above, had significant and positive correlations with Z. Because of the existence of positive and negative correlations between the seed yield components, simple linear relationships between two components by a correlation analysis cannot be successfully predicted by the measurements. However, with standardized variables, a path analysis can determine the relative importance of direct and indirect effects on seed yield. The results of this study further emphasize that as the plants aged during the successive experimental years, Y 1 , Y 2 and Y 3 decreased significantly, whereas Y 4 and Y 5 increased. This finding is consistent with the results of previous research 40 . This result also implies that Y 4 and Y 5 should and could be effectively improved if the values of Y 1 , Y 2 and Y 3 are lower than normal.
Ridge regression and multiple-regression analyses were applied to avoid high inter-correlation and multico-linearity among the variables [46][47][48] . The significant correlation coefficients (P < 0.001 and 0.01), path analyses, and ridge regressions of the multifactor orthogonal experimental design and large sample statistical analysis in the field experiments show that the models are reliable 48 . In addition, ridge regression effectively overcomes the problem of highly multi-correlated predictor variables (seed yield components) 48,49 . This method is the most effective and practical for the current field scientific research 43 . Unfortunately, owing to the aging of the plants, designed field management and climate conditions, the coefficients of the ridge regression models in individual years are variable, ranging from −55.349 to 19.944 (Table 4). An original exponential model was found to estimate Z via Y 1 through Y 5 . First, the final algorithm model [exponential Eq. (4)] was deduced using data for 380 plots from various growth regimes in the three successive years. Moreover, all three ridge regression models [The ridge regression models A, B and C] for the individual years were significant (P < 0.001), and they all had coefficients matching the contributions of the five Ys to Z. This result is explained through the relationship between the path analysis and ridge regression analysis, and the test methods and results are more credible. In addition, the contributions in absolute value of the five seed yield components to the seed yield are in the following order: (1.68, 0.529, −0.266, 0.195 and 0.077, Table 3). The total direct effects with Y 4 having negative results, showed that the total influence order, in absolute values, is Y 1 > Y 4 > Y 2 > Y 3 > Y 5 (1.68, −0.2907, 0.2667, 0.2272 and 0.05, Table 3). However, the two pairs of results are from the same database. The ridge analysis values analytically combine the effects of all Ys, especially the effects of aging and climate, to address the variation in Z for the three years, whereas the path analysis includes separate analytic effects of the individual three years.
The antagonistic and synergistic effects between Ys on Z were investigated using pairwise quadratic regression models (Table 9, Figs. 3 and 4). The lines for 2004 (red) and 2003-05 (blue) showed uniform orientations, except for Y 4 & Y 2, among the ten relation schema subgraphs (Figs. 3 and 4), as did the lines for 2005 (green), except for Y 5 & Y 4 . The subgraphs indicated that Y 4 & Y 5 (Fig. 3A), Y 2 & Y 5 (Fig. 3C), Y 3 & Y 4 (Fig. 3D), Y 1 & Y 5 (Fig. 4B) and Y 1 & Y 3 (Fig. 4D) had antagonistic effects on Z, as the ridgelines were at k < 0 (Table S4) 49 . Conversely, the red and blue lines that also had the same directions indicated that Y 3 & Y 5 , Y 2 & Y 4 and Y 1 &Y 4 had synergetic effects on Z (Fig. 3B,E,F) at k > 0 in the ridgelines. The more Y 2 , less Y 5 dynamic (Fig. 3C), which was also evident for Y 3 & Y 4 (Fig. 3D), were mostly caused by feedforward compensation at the biological level and by the soil nutrient limitation. In Fig. 4A, the blue and red lines almost overlap and are nearly perpendicular to the horizontal axis, whereas the green and black (2003) lines are also partly overlapped and nearly perpendicular to the longitudinal axis. This may be due to genetic constraints on Y 5 in a certain range of growth. In this range, Y 1 has the optimal value. The increase in Y 1 -derived Y 2 and the Y 3 decrease (Fig. 4B,D) were consistent with the soil nutrient limitation. For 2003For , 2004For , 2005For and 2003For -2005, the interactions between Y 2 & Y 3 gradually changed from synergetic to antagonistic (Fig. 4C). As important seed yield traits of the plant, Y 2 & Y 3 are regulated by genetics. Further investigations will be needed to verify that these changes are probably due to aging of the grass.

conclusions
Algorithmic models were developed to describe the seed yield and yield components needed to improve the seed yield of Pascopyrum smithii. Significant positive correlations were observed between seed yield (Z) and spikelets/ fertile tillers (Y 2 ) and fertile tillers m −2 (Y 1 ), and negative correlations were found with seed number/spikelet (Y 4 ). The inter-correlation among these components were significant in 2004 and 2005. (2019) 9:17976 | https://doi.org/10.1038/s41598-019-54586-0 www.nature.com/scientificreports www.nature.com/scientificreports/ The model of seed yield with its 5 components, based on a large sample size from an orthogonal experimental design in Pascopyrum smithii, was: This model can be used to accurately estimate the seed yield with five yield components. The total direct effects of fertile tillers, florets/spikelet, spikelets/fertile tiller and seed weight on seed yield were positive, and fertile tillers was the largest contributor. The contributions in decreasing order were fertile tillers(Y 1 ) > seed number/spikelet (Y4) > spikelets/fertile tiller (Y 2 ) > florets/spikelet (Y 3 ) > seed weight(Y 5 ). Fertile tillers (Y 1 ) was one of the most important factors that played a key role in seed production. Therefore, selection for high seed yield through direct selection for large fertile tillers (Y 1 ), florets/spikelet (Y 3 ) , and seed number/ spikelet (Y4) would be effective and reliable for breeding, based on the experimental data of three years of continuous field experiments. Finally, this research laid the foundation for the basic theory of restoration ecology in arid regions and for the promotion of plant carbon cycles to reduce the greenhouse effect. Further studies should be focused on changes in seed yield based on different climatic conditions and site locations.  Table 9. Coefficients of pair wised models among the seed yields components.  50 . The plots used in this experiment were planted with alfalfa (Medicago sativa L.) in the preceding season. The 6000 m 2 experimental site was tilled using a chisel plow in the fall and disk-harrowed in the spring for seedbed preparation. Pascopyrum smithii seeds were planted on April 23, 2002 at a depth of 2.5 cm. The seeding rate was 500 seeds m −2 , and the space between rows was 0.45 m. Initial fertilizer was applied in a band that was 6 cm deep and 5 cm to the side of the seed furrows at a rate of 104 kg ha −1 N and 63 kg ha −1 P 2 O 5 . There was no seed yield in autumn 2002. Initial chemical characteristics of the soil (0-20 cm) were: pH = 8.39; NH 4 + = 32.32 mg kg −1 ; NO 3 − = 20.09 mg kg −1 ; alkali hydrolysable nitrogen = 118.30 mg kg −1 ; available phosphorus = 36.56 mg kg −1 ; available potassium = 130.30 mg kg −1 ; total nitrogen = 0.764 g kg −1 ; total phosphorus = 0.814 g kg −1 ; total potassium = 12.52 mg kg −1 ; organic matter = 10.32 g kg −1 (Table S1).

Materials and Methods
Experimental design. The Orthogonal Experimental Design (OED) method is typically used to study the comparative effectiveness of multiple intervention components simultaneously; OED with both Orthogonal Array (OA) and Factor Analysis (FA) makes it possible to discover the optimum combinations with only several tests 51,52 . Thus, to simulate various growing conditions, we used six groups (A to E) (Table S2). Because OED is characterized by equilibrium dispersion, we were able to design experiments to find the best combination of treatments with a minimum of tests. Additionally, OED allowed us to transform complex multi-factor data to single factor analysis. We used a multi-factorial orthogonal design for field plots based on the six groups (Table S2) 53 , giving a total of 380 experimental plots (each with an area of 28 m 2 ), under various field management treatments (Table S2) within a total field area of 4100 m 2 . Plots were irrigated five times during the growing season at the following growth stages: vegetative phase jointing, stem formation, ear formation and flowering, respectively. Fertilization was carried out before sowing and again before spring regrowth the following year.
Data collection. Ten samples along 1 m of each row were randomly selected to measure the five seed yield components from anthesis to seed harvest from 2003 to 2005. Plants that were 1 m or less from the edge of the plot we not sampled. Seed yield components and seed yield data for each plot were collected as follows: fertile tillers m −2 (Y 1 ) were measured from ten randomly selected 1-m row samples, and 30 to 36 fertile tillers and 27 to 54 spikelets were randomly selected for measuring spikelets/fertile tillers (Y 2 ), florets/spikelet (Y 3 ) and seed numbers/spikelet (Y 4 ). When the seed heads were ripe, 4 samples from 1 m of the row length were separately threshed by hand, the yield of clean seed for each sample was weighed, and the seed moisture content was confirmed as 7 to 10% for converting into seed yield (kg hm −2 ) (Z). Ten lots of 100 seeds each were collected to determine mg seed weight (Y 5 ). The total number of samples (n) used to measure Y 1 to Y 5 and Z were 3800, 13605, 11085, 10770, 3800, and 1520, across all 3 years (Table S3). The sample size for individual years is shown in Supplementary  Table 3, and the experimental databases were established with Visio FoxPro (Version 6.0).

Statistics and analytical methods.
Path coefficient analysis helps to determine the direct effect of traits and their indirect contributions via other characters 38,54 . Correlation and path analysis were performed to determine the relationship among the yield and yield contributing characters. Thus, separate and combined analyses for the three years provided useful information 48 .In addition, ridge regression is a useful parameter estimation method for addressing the collinearity problem frequently arising in multiple linear regression. Ridge regression provides a means of addressing the problem of collinearity without removing variables from the original set of independent variables. Ridge regression analysis 55 and Duncan's multiple range tests for seed yield (Z) and yield components (Y 1 -Y 5 ) were performed. The data were transformed using logarithmic and power transformations to avoid the effects of highly inter-correlated data, which would lead to multico-linearity between Y 1 -Y 5 and Z. To establish a reliable model, all of the Z and Y 1 -Y 5 data in Visio FoxPro, representing a total of 380 samples (plots) in the three years (i.e. 105 + 129 + 146), were taken as a natural logarithm because mathematically they did not influence the essential relationships between the variables. Analyses of variance and Pearson correlation analyses were performed using SPSS Version 19.0. If S = InZ and C i = InY i (i = 1 to 5), then S and C 1 to C 5 were used for the ridge regression analyses, and the ridge regression model was: = × β + S C u (6) where S is the n × 1 vector of observations of a response variable, C is the n × p matrix of observations on p explanatory variables, β is the p × 1 vector of regression coefficients, and u is the n × 1 vector of residuals satisfying E(ū) = C' , E(uu') = δ 2 I. It is assumed that C and S have been scaled such that C' C and S' S are matrices of correlation coefficients. In this equation, n = 380, and p = 5. Thus, i i 1 5 The above logarithmic model (7) was transformed to the following exponential function: where α and β are constants.
Formula (8) was used to estimate the Z of all 380 samples, which is denoted as Z estimated ; the actual seed yield is denoted as Z actual .
A general linear regression model was used to assess the Z actual compared to the Z estimated , and an analysis of variance was used to assess the dependent variable Z actual and the parameter estimates of Z estimated . The linear regression model is: actual estimated So, via formula (9), the model was adjusted to: In addition, the ridge trace and appropriate scatter plots were graphed. The analyses and graphical procedures specified above were all performed using SAS Version 8.2 (Inc. 1988).
Quadratic two-variable regression models between Z and Y 1 to Y 5 were used as follows: Where β is a constant. The equivalent effects of Yi and Yj were determined using: where b is a constant. The presented ridgelines (13) (Figs. 3 and 4) correspond to the response surface models (11) to show the synergetic and antagonistic effects. The analyses and graphics were all performed using the SAS (v8.2) software 49 .