Introduction

Western wheatgrass (Pascopyrum smithii Rydb.) is a native, perennial cool-season grass, found most abundantly in the southern, mixed-grass prairie region of the Great Plains of North America and is grown for livestock production throughout the temperate regions of the world1. Because it thrives on impoverished soils in pastoral environments, even with multiple, simultaneous stressors2, it is an important species for soil protection, water conservation, and vegetation protection in arid and semi-arid regions. Due to these characteristics, it is a competitive, high-yielding species, providing forage for livestock and wildlife on semi-arid rangelands in Eurasia and northwest China3,4,5,6. Thus, it can appear as a dense monoculture7,8. Seed yields in the cool-season perennial grasses are often low, and in China, the seed supply of perennial grasses has depended on imports for many years due to inadequate supplies of locally produced high quality seed. To become more self-sufficient in supplying seed for its many needs, the Chinese government has encouraged the development of increased seed production capacity9,10.

There have been many studies on the factors affecting crop yields, with the aim of improving yields as much as possible11. Seed yield is a complex trait that is the culmination of growth and developmental processes that are influenced by multiple yield components12,13. Understanding the relationship between yield and yield components and the correlations between yield components is a prerequisite for building an efficient breeding program14. To date, no research has been conducted to develop models for seed yield and related yield components in western wheatgrass. Therefore, it is necessary to examine the relationships among various factors, especially seed yield, yield components, and their interdependence. To create a high seed yield breeding program, a mathematical model needs to be developed that accurately predicts forage and seed yield, and this must subsequently be validated through field experiments15. Owing to the lower complexity and lower environmental influence of yield components compared to yield, the use of these traits to practice indirect selection for yield is justified16. Several research groups have attempted to determine the association between the characters of forage yield. To obtain a high-yield forage species, several scholars have focused their attention on the agronomic traits of yield groups, which are closely related to seed yield. Annapurna et al.17 found that seed yield showed a significant positive correlation with plant height, ear diameter, number of seeds per row, and number of rows per ear. Assefa et al.18 reported correlations between yield and planting density, and other yield components. Tang et al.19 found that grain yield per plant showed a highly significant positive correlation with the 1000-grain weight, plant height and planting density. 
Traits related to the generative parts of the plant, such as pods per plant, number of seeds per pod, number of fertile spikelets per panicle, panicle length, spikelet density, number of filled seeds, number of effective tillers per plant, and 1000 seed weight, are the most frequently considered parameters20,21,22,23,24,25. Thus, plant morphological traits such as fertile tillers m−2 (Y1), spikelets/fertile tillers (Y2), florets/spikelet (Y3), seed numbers/spikelet (Y4), or seed weight (Y5), can be valuable in defining the best criteria for selection in biological and agronomic studies.

Since seed yield is influenced by environmental conditions and agronomic factors26,27,28, the experimenter selects the best design based on the information available with respect to field spatial heterogeneity29. The Hexi Corridor, in Gansu Province, China, is considered a key seed production centre due to plentiful mountain run-off water, groundwater and dry, warm summer conditions. It is the largest maize seed production area and one of the main vegetable and flower seed production regions in China. Thus, we chose this area to evaluate the potential seed production of cool-season grasses. The orthogonal experimental design (OED) method allows researchers to test the effectiveness of many interventions simultaneously in a single experiment (and identify their interactions) with far fewer experimental units than it would take to exhaust all possible intervention combinations using other techniques. Therefore, since our objective was to investigate relationships among seed yield traits and develop a model of seed yield and yield components, we chose the OED. The aim of this study was to confirm the direct and indirect effects of key seed yield components, including fertile tillers m−2 (Y1), spikelets/fertile tillers (Y2), florets/spikelet (Y3), seed number/spikelet (Y4), and seed weight (mg) (Y5), on seed yield (Z) of Pascopyrum smithii based on a multifactor orthogonal design under field conditions with various management treatments.

The theoretical relationship between the seed yield components and seed yield (or potential seed yield) is represented as: ZSY = Y1 × Y2 × Y3 × Y4 × Y5. This study evaluated two hypotheses: (1) all five seed yield components and the seed yield are inter-related, and each yield factor has different direct and indirect effects on seed yield, and (2) an algorithmic model of seed yield based on these five components can be developed to accurately estimate seed yield.

Results

Correlations between traits

Pearson correlation analysis revealed that seed yield (Z) was significantly and positively correlated (P < 0.001) with seed yield components Y1 and Y2, and negatively correlated, to a lesser extent, with Y4 and Y5 (P < 0.05 or P < 0.01) (Table 1). Y1 had the largest correlation coefficient with Z (0.472, P < 0.001). The correlations among Y1 through Y5 were significant (P < 0.001), while Y1 and Y2 were negatively correlated with Y3 and Y5. Y1 and Y2 also showed strong, positive correlations with Z in all three years (P < 0.001) (Table 2). The order of the strength of correlation coefficients in each year was 2004 < 2003 < 2005. Y1 had the strongest positive correlation with Z (0.452, 0.534, and 0.657 for 2003, 2004 and 2005, respectively), and Y5 had the strongest negative correlation with Z (−0.219, −0.209 and −0.354 for 2003, 2004 and 2005, respectively; Table 2).
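As a minimal sketch of how a coefficient of this kind is computed, the Python snippet below implements the Pearson product-moment formula. The function name `pearson_r` and the five plot values are hypothetical illustrations, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plot-level values (NOT from the study), for illustration only
fertile_tillers = [420.0, 510.0, 390.0, 600.0, 480.0]  # Y1, tillers per m^2
seed_yield = [610.0, 700.0, 590.0, 790.0, 640.0]       # Z, kg per hm^2

r = pearson_r(fertile_tillers, seed_yield)
print(f"r(Y1, Z) = {r:.3f}")
```

Applying the same function to every pair of variables would fill a correlation matrix of the kind reported in Tables 1 and 2.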

Table 1 Matrix of Pearson correlation coefficients of Y1~Y5, Z (Pascopyrum smithii Schreb.) averaged over 3 years.
Table 2 Matrix of Pearson correlation coefficients of Y1~Y5, Z (Pascopyrum smithii Schreb.) for each year.

Path analyses of Y1 to Y5 with Z

The results of the path analyses showed that all five seed yield components presented a strong, highly significant direct effect on Z in at least two of the years (Table 3). However, the direct effect of Y1 on Z was strong and positive (highlighted in bold) for all 3 years (2003, 2004 and 2005 were at P < 0.0001), where the coefficients were 0.480, 0.423 and 0.777, respectively. Therefore, Y1 had the largest contribution to Z. Y5 in 2004 (−0.065 at P < 0.001), Y3 in 2003 (0.454 at P < 0.05) and Y4 in 2003 (−0.371 at P < 0.05) had weak but statistically significant direct effects on Z.

Table 3 Path analysis showing direct and indirect effect of Y1–Y5 to Z (Pascopyrum smithii Schreb.).

Regarding the contributions of Y1–Y5 to Z, the strongest positive indirect influence was the pathway from Y1 via Y4 (coefficient 0.1740 in 2003), and the second strongest was Y2 via Y4 (0.1039 in 2003). These were followed, in order of decreasing magnitude, by Y3 via Y4 (0.0582 in 2005) and Y3 via Y5 (0.0447 in 2003). The strongest negative indirect influence on Z was Y2 via Y3 (−0.0799 in 2003), and the second strongest was Y1 via Y3 (−0.0517 in 2005). Y3 had a marginal negative direct influence on seed yield in 2004 and positive effects in the other 2 years, and Y3 had the smallest contribution to Z (Table 3). In summary, the order of the contributions of the five seed yield components was Y1 > Y3 > Y2 > Y5 > Y4, and the total direct effects were 1.68, 0.529, 0.195, 0.077 and −0.266, respectively (Table 3). With Y4, the influence was negative, but the overall order of effects was Y1 > Y2 > Y3 > Y5 > Y4 (1.68, 0.2667, 0.2272, 0.05 and −0.2907, respectively, Table 3).
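The decomposition into direct and indirect effects can be sketched as follows, assuming the standard path-analysis formulation: the direct effects p solve R p = r, where R is the correlation matrix among the components and r holds their correlations with Z, and the indirect effect of Yi via Yj is r_ij × p_j. The three-component correlation values and the helper names are hypothetical and are not taken from Table 3.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_analysis(r_xx, r_xz):
    """Return (direct effects, matrix of indirect effects of Yi via Yj)."""
    direct = solve_linear(r_xx, r_xz)
    n = len(direct)
    indirect = [
        [r_xx[i][j] * direct[j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    return direct, indirect


# Hypothetical correlations among three components and with Z (illustration only)
r_xx = [[1.0, 0.3, -0.2],
        [0.3, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]
r_xz = [0.5, 0.4, -0.1]

direct, indirect = path_analysis(r_xx, r_xz)
for i, p in enumerate(direct):
    # Direct effect plus all indirect effects reproduces the total correlation
    total = p + sum(indirect[i])
    print(f"Y{i + 1}: direct = {p:.4f}, total = {total:.4f}")
```

In this formulation each total correlation with Z equals the direct effect plus the sum of the indirect effects, which is what allows a correlation to be partitioned in a table such as Table 3.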

Ridge regression models of seed yield and 5 seed yield components

Seed yield (Z) was highest in 2003 followed by 2004 and 2005 (Table 4). Y1 was highest in 2003 and produced the highest Z. It has been suggested that the value of the ridge parameter K11 should be determined using ridge traces (Fig. 1). For 2003, 2004 and 2005, the curves for Y1 to Y5 showed estimated k values at 0.77, 0.64 and 0.79, respectively (Fig. 1). The ridge regression models A, B and C for 2003, 2004 and 2005, respectively (Table 4), were as follows:

Table 4 Duncan’s Multiple Range Test of the Pascopyrum smithii seed yield (Z) and yield components (Y1–Y5) for the 3 years and of the ridge regression coefficients.
Figure 1

Ridge traces of standard partial regression coefficients for increasing values of k for five yield components for year 2003, 2004 and 2005, respectively. Y1 to Y5 are fertile tillers m−2, spikelets per fertile tillers, florets per spikelet, seed numbers per spikelet and seed weight, respectively.

A. \({\rm{Z}}=529.067+0.371\times {Y}_{1}+19.944\times {Y}_{2}+15.127\times {Y}_{3}-55.349\times {Y}_{4}-29.703\times {Y}_{5}\)

(Ridge k = 0.77; F = 8.274 Pr < 0.0001)

B. \({\rm{Z}}=444.094+0.358\times {Y}_{1}+2.075\times {Y}_{2}-2.205\times {Y}_{3}-0.920\times {Y}_{4}-4.078\times {Y}_{5}\)

(Ridge k = 0.64; F = 114.768 Pr < 0.0001)

C. \({\rm{Z}}=271.685+0.381\times {Y}_{1}+14.551\times {Y}_{2}-7.485\times {Y}_{3}+18.647\times {Y}_{4}-48.165\times {Y}_{5}\)

(Ridge k = 0.79; F = 10.797 Pr < 0.0001)

The highest absolute values of the ridge regression coefficients for Y4, Y3 and Y2 occurred in 2003, and for Y5 and Y1, the highest values occurred in 2005 (Table 4). For a reliable model from the data of three successive years, the 380 samples of Z with Y1 to Y5 in the database were transformed using the natural logarithm: S = ln Z, C1 = ln Y1, C2 = ln Y2, C3 = ln Y3, C4 = ln Y4 and C5 = ln Y5.

Then, the ridge regression model was obtained with S and C1–C5 as follows (the variance analysis and the parameter estimates are listed in Tables 5 and 6, respectively):

$$\begin{array}{c}{\rm{S}}=5.219+0.211\times {C}_{1}+0.095\times {C}_{2}+0.005\times {C}_{3}-0.004\times {C}_{4}-0.295\times {C}_{5}\\ ({\rm{N}}=380,F=209.514,Pr < 0.0001)\end{array}$$
(1)

Thus,

$$\mathrm{ln}\,{\rm{Z}}=5.219+0.211\times \,\mathrm{ln}\,{Y}_{1}+0.095\times \,\mathrm{ln}\,{Y}_{2}+0.005\times \,\mathrm{ln}\,{Y}_{3}-0.004\times \,\mathrm{ln}\,{Y}_{4}-0.295\times \,\mathrm{ln}\,{Y}_{5}$$
(2)
Table 5 Analysis of variance for dependent variable Zactual with the 5 seed-yield components of the 380 samples.
Table 6 Parameter estimates for Zestimated.

Model (2) was transformed to an exponential function:

$${\rm{Z}}=181.272\times {Y}_{1}^{0.21}\times {Y}_{2}^{0.1}\times {Y}_{3}^{0.01}\times {Y}_{4}^{-0.01}\times {Y}_{5}^{-0.30}$$
(3)

Equation (3) was used to estimate the seed yield of all 380 samples, and the results were denoted as Zestimated. The actual seed yields were denoted as Zactual. To test the accuracy, Zactual was regressed linearly on Zestimated (the analysis of variance and the parameter estimates are shown in Tables 5 and 6). The linear model was as follows:

$${\rm{Zactual}}=-\,230.174+1.568\times {Z}_{estimated}\,({\rm{N}}=380,\,{\rm{F}}=1047.004,\,{\rm{\Pr }} < 0.0001)$$
(4)

Then, by substituting Eq. (3) into formula (4), the model was adjusted to:

$${\rm{Zactual}}=-\,230.174+284.235\times {Y}_{1}^{0.21}\times {Y}_{2}^{0.1}\times {Y}_{3}^{0.01}\times {Y}_{4}^{-0.01}\times {Y}_{5}^{-0.30}$$
(5)
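The arithmetic linking Eqs. (3), (4) and (5) can be checked with a short sketch. The coefficients are those printed in the text; the function names and the input values are hypothetical and are used only to show how the model would be evaluated.

```python
import math


def power_term(y1, y2, y3, y4, y5):
    """Product of the five components raised to the exponents of Eq. (3)."""
    return (y1 ** 0.21) * (y2 ** 0.1) * (y3 ** 0.01) * (y4 ** -0.01) * (y5 ** -0.30)


def z_estimated(y1, y2, y3, y4, y5):
    """Eq. (3): back-transformed ridge regression estimate."""
    return 181.272 * power_term(y1, y2, y3, y4, y5)


def z_adjusted(y1, y2, y3, y4, y5):
    """Eq. (5): estimate after the linear adjustment of Eq. (4)."""
    return -230.174 + 284.235 * power_term(y1, y2, y3, y4, y5)


# The scale factor of Eq. (5) is the slope of Eq. (4) times the factor of Eq. (3)
adjusted_scale = 1.568 * 181.272  # about 284.234

# Observation, not a statement from the paper: the printed factor 181.272
# matches exp(5.2), i.e. the intercept 5.219 rounded to one decimal place.
scale_from_rounded_intercept = math.exp(5.2)

# Hypothetical component values (illustration only, not measured data)
y = (500.0, 15.0, 6.0, 2.0, 4.0)
print(z_estimated(*y), z_adjusted(*y))
```

Because the printed exponents are rounded versions of the coefficients in Eq. (2), values computed this way may differ slightly from estimates based on the unrounded coefficients.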

After adjustment, the estimated intercept and the coefficient of Zestimated were −0.033 and 1.000, respectively (Table 7); the fitted line is presented in Fig. 2, superimposed on the 1:1 line.

Table 7 Parameter estimates for Zestimated after adjustment by linear regression.
Figure 2

Scatter plot to fit regression line of actual and estimated seed yield adjusted by \({Z}_{actual}=-230.174+284.235\times {Y}_{1}^{0.21}\times {Y}_{2}^{0.1}\times {Y}_{3}^{0.01}\times {Y}_{4}^{-0.01}\times {Y}_{5}^{-0.30}\) of the 3 years. It is superimposed on the 1:1 line.

We determined the antagonistic and synergistic effects among Y1 to Y5 on Z using pairwise models (Figs. 3 and 4).

Figure 3

Ridgelines of the response surface models showed the synergism and antagonism through Y4 (A) Y3 (B) and Y2 (C) to Y5 and Y3 (D) Y2 (E) and Y1 (F) to Y4.

Figure 4

Ridgelines of the response surface models showed the synergism and antagonism through Y1 to Y5 (A) and Y2 (B), Y2 (C) and Y1 (D) to Y3.

Discussion

In this study, precipitation, temperature, and sunlight data for the crop-growing period from March to early September in the three years of the study were provided by the Jiuquan Meteorological Observatory of Gansu Province, China (Fig. S1). Temperature and precipitation during the crop growing seasons from 2003–2005 were near the average values for the past ten years in this location (Table S14). Thus, these data should accurately represent the natural field conditions.

The average Pascopyrum smithii seed yield (Z) and the yield components (Y1, Y2, Y3, Y4 and Y5) differed considerably from 2003 to 2005 (Table 8), mainly owing to climatic conditions (Supplementary Fig. 1), e.g., large precipitation differences between 2003, 2004 and 2005. Moreover, higher rainfall in June, which occurred during the seed growth period, was partly responsible for higher seed yields because it favored pollination and grain filling. As another example, the highest recorded rainfall was in March 2005 (28.2 mm), together with a low air temperature. These factors promoted vegetative growth and significantly decreased Y1 (Table 4), which consequently resulted in the lowest values for Z. The highest values for Z appeared in 2003 because the mild climate and adequate water supply were conducive to crop tillering and yield improvement. In comparison, the higher Z in 2004 than in 2005 corresponded to higher temperatures and adequate water in June and July 2004. However, Y1 and Y3 decreased slightly with stand age from 2003 to 2005, which may have been due to genetic factors, while larger differences indicate the role of the environment30,31. Fertile tillers m−2 (Y1), florets/spikelet (Y3), and seed numbers/spikelet (Y4) were higher in 2003 compared to 2004 and 2005, whereas spikelets/fertile tillers (Y2) was lower in 2003 than in 2004 and 2005. Seed weight (Y5), in contrast, was similar across the years. For each of the traits evaluated, descriptive statistics, including the coefficient of variation (CV), mean, minimum, maximum, standard deviation and standard error, are summarized in Table 8. The wide variation in these statistics indicates scope for improving trait expression and provides a good opportunity for breeding superior cultivars. The variability in seed yield and the yield components in 2003, 2004 and 2005 may have been caused by differences in stand age, air temperature differences, interactions between soil fertility and climatic conditions, or combinations thereof.

Table 8 Statistics of Y1~Y5, Z (Pascopyrum smithii Schreb.) for years 2003~2005.

Correlation analysis considers a mutual association with no regard to causation, whereas path analysis specifies causes and measures their relative importance32. Path analysis partitions the total correlations between the predictor variables and the response variable into direct and indirect effects, and knowledge of the direct and indirect effects of the yield factors, especially on yield, allows breeders to use this additional information to discard or promote genotypes of interest33. Alvi et al.34 and Asghari-zakaria et al.35 used path analysis to identify traits that are useful as evaluation standards for increasing crop yield. Cruz et al.36 define the path coefficient, or cause and effect analysis, as a standardized regression coefficient37, because path analysis is an expansion of multiple regression for cases in which complex interrelationships are involved. Thus, path analysis has been found to be a useful statistical technique specially designed to quantify direct and indirect trait associations with yield38. In this study, path analysis indicates that the total direct effects of Y1, Y2, Y3 and Y5 on Z are positive and highly significant, but that of Y4 is negative (Table 3). The weak negative correlation between the seed number (Y4) and seed yield (Z) might be related to high-density cultivation and soil nutrient limitation7. This finding is consistent with the literature, e.g., the yield of mechanically harvested rapeseed (Brassica napus L.) can be increased by optimum plant density and row spacing39. Table 3 shows that Y1 and Y2 have the largest correlation coefficient and contribution rate, which is consistent with the natural law of plant growth and biological theory.
Nevertheless, the non-significant correlation of Z with Y3 (Table 1) is probably due to the effects of aging and climate during the individual years and to the field management systems that were repeated yearly40. Y3 did not significantly contribute to Z, partly because Y3 was mostly under genetic control40. This finding implies that Y3 is the component that should be taken into account if high-quality forage is the target of the breeding system. In contrast, Y1 was the most critical component and contributed significantly to Z (P < 0.001): the coefficients were 0.480, 0.423, and 0.777 for 2003, 2004 and 2005, respectively. This finding is consistent with previous studies in fescues7,41, Russian wild rye42,43, Zoysia grass1, gramineous plants5, and white clover40,44,45. Additionally, path analysis uncovered relationships between the components and the yield that are consistent with previously reported results45. The interrelationships among Y1 to Y5 indicated that Y4 and Y5 had significant negative relationships with seed yield (Z; r = −0.03* and r = −0.192**). Y1 and Y2, as indicated above, had significant and positive correlations with Z. Because both positive and negative correlations exist among the seed yield components, simple pairwise correlations cannot reliably predict seed yield from the measurements. However, with standardized variables, a path analysis can determine the relative importance of direct and indirect effects on seed yield. The results of this study further emphasize that as the plants aged during the successive experimental years, Y1, Y2 and Y3 decreased significantly, whereas Y4 and Y5 increased. This finding is consistent with the results of previous research40. This result also implies that Y4 and Y5 should and could be effectively improved if the values of Y1, Y2 and Y3 are lower than normal.

Ridge regression and multiple-regression analyses were applied to avoid high inter-correlation and multicollinearity among the variables46,47,48. The significant correlation coefficients (P < 0.001 and 0.01), path analyses, and ridge regressions of the multifactor orthogonal experimental design and large sample statistical analysis in the field experiments show that the models are reliable48. In addition, ridge regression effectively overcomes the problem of highly inter-correlated predictor variables (seed yield components)48,49. This method is the most effective and practical for the current field scientific research43. Unfortunately, owing to the aging of the plants, designed field management and climate conditions, the coefficients of the ridge regression models in individual years are variable, ranging from −55.349 to 19.944 (Table 4). An original exponential model was found to estimate Z via Y1 through Y5. First, the final algorithmic model [exponential Eq. (5)] was deduced using data for 380 plots from various growth regimes in the three successive years. Moreover, all three ridge regression models (models A, B and C) for the individual years were significant (P < 0.001), and they all had coefficients matching the contributions of the five Ys to Z. This result is explained by the relationship between the path analysis and the ridge regression analysis, which makes the test methods and results more credible. In addition, the contributions in absolute value of the five seed yield components to the seed yield are in the following order: Y1 > Y3 > Y4 > Y2 > Y5 (1.68, 0.529, −0.266, 0.195 and 0.077, Table 3). The total direct effects, with Y4 having negative results, showed that the total influence order, in absolute values, is Y1 > Y4 > Y2 > Y3 > Y5 (1.68, −0.2907, 0.2667, 0.2272 and 0.05, Table 3). However, the two pairs of results are from the same database.
The ridge analysis values analytically combine the effects of all Ys, especially the effects of aging and climate, to address the variation in Z for the three years, whereas the path analysis includes separate analytic effects of the individual three years.

The antagonistic and synergistic effects between Ys on Z were investigated using pairwise quadratic regression models (Table 9, Figs. 3 and 4). The lines for 2004 (red) and 2003–05 (blue) showed uniform orientations, except for Y4 & Y2, among the ten relation schema subgraphs (Figs. 3 and 4), as did the lines for 2005 (green), except for Y5 & Y4. The subgraphs indicated that Y4 & Y5 (Fig. 3A), Y2 & Y5 (Fig. 3C), Y3 & Y4 (Fig. 3D), Y1 & Y5 (Fig. 4B) and Y1 & Y3 (Fig. 4D) had antagonistic effects on Z, as the ridgelines were at k < 0 (Table S4)49. Conversely, the red and blue lines, which also had the same directions, indicated that Y3 & Y5, Y2 & Y4 and Y1 & Y4 had synergetic effects on Z (Fig. 3B,E,F), with k > 0 in the ridgelines. The more Y2, less Y5 dynamic (Fig. 3C), which was also evident for Y3 & Y4 (Fig. 3D), was mostly caused by feedforward compensation at the biological level and by the soil nutrient limitation. In Fig. 4A, the blue and red lines almost overlap and are nearly perpendicular to the horizontal axis, whereas the green and black (2003) lines also partly overlap and are nearly perpendicular to the vertical axis. This may be due to genetic constraints on Y5 in a certain range of growth. In this range, Y1 has the optimal value. The increase in Y1-derived Y2 and the Y3 decrease (Fig. 4B,D) were consistent with the soil nutrient limitation. For 2003, 2004, 2005 and 2003–2005, the interactions between Y2 & Y3 gradually changed from synergetic to antagonistic (Fig. 4C). As important seed yield traits of the plant, Y2 & Y3 are regulated by genetics. Further investigations will be needed to verify whether these changes are due to aging of the grass.

Table 9 Coefficients of pairwise models among the seed yield components.

Conclusions

Algorithmic models were developed to describe the seed yield and yield components needed to improve the seed yield of Pascopyrum smithii. Significant positive correlations were observed between seed yield (Z) and both fertile tillers m−2 (Y1) and spikelets/fertile tillers (Y2), and negative correlations were found with seed number/spikelet (Y4). The inter-correlations among these components were significant in 2004 and 2005.

The model of seed yield with its 5 components, based on a large sample size from an orthogonal experimental design in Pascopyrum smithii, was:

$${\rm{Z}}=-\,230.174+284.235\times {Y}_{1}^{0.21}\times {Y}_{2}^{0.1}\times {Y}_{3}^{0.01}\times {Y}_{4}^{-0.01}\times {Y}_{5}^{-0.30}$$

This model can be used to accurately estimate the seed yield with five yield components.

The total direct effects of fertile tillers, florets/spikelet, spikelets/fertile tiller and seed weight on seed yield were positive, and fertile tillers was the largest contributor. The contributions in decreasing order were fertile tillers (Y1) > seed number/spikelet (Y4) > spikelets/fertile tiller (Y2) > florets/spikelet (Y3) > seed weight (Y5). Fertile tillers (Y1) was one of the most important factors and played a key role in seed production. Therefore, selection for high seed yield through direct selection for a high number of fertile tillers (Y1), florets/spikelet (Y3), and seed number/spikelet (Y4) would be effective and reliable for breeding, based on the experimental data of three years of continuous field experiments. Finally, this research laid the foundation for the basic theory of restoration ecology in arid regions and for the promotion of plant carbon cycles to reduce the greenhouse effect. Further studies should focus on changes in seed yield under different climatic conditions and site locations.

Materials and Methods

Experimental site description

The commonly planted cultivar “Rosana” of P. smithii was introduced from the United States in 2002 [the 948 project (202009) of the Ministry of Agriculture of China]. Field experiments were conducted over three years (2003–2005) at the China Agricultural University Grassland Research Station located in the Hexi Corridor, in Jiuquan, Gansu province, northwestern China (39°37′ N latitude and 98°30′ E longitude; altitude 1480 m). Soil at the site is classified as a Mot-Cal-Orthic Aridisol in the Chinese system and as a Xeric Haplocalcid in the USDA soil classification system50. The plots used in this experiment were planted with alfalfa (Medicago sativa L.) in the preceding season. The 6000 m2 experimental site was tilled using a chisel plow in the fall and disk-harrowed in the spring for seedbed preparation. Pascopyrum smithii seeds were planted on April 23, 2002 at a depth of 2.5 cm. The seeding rate was 500 seeds m−2, and the space between rows was 0.45 m. Initial fertilizer was applied in a band that was 6 cm deep and 5 cm to the side of the seed furrows at a rate of 104 kg ha−1 N and 63 kg ha−1 P2O5. There was no seed yield in autumn 2002. Initial chemical characteristics of the soil (0–20 cm) were: pH = 8.39; NH4+ = 32.32 mg kg−1; NO3− = 20.09 mg kg−1; alkali hydrolysable nitrogen = 118.30 mg kg−1; available phosphorus = 36.56 mg kg−1; available potassium = 130.30 mg kg−1; total nitrogen = 0.764 g kg−1; total phosphorus = 0.814 g kg−1; total potassium = 12.52 mg kg−1; organic matter = 10.32 g kg−1 (Table S1).

Experimental design

The Orthogonal Experimental Design (OED) method is typically used to study the comparative effectiveness of multiple intervention components simultaneously; OED with both Orthogonal Array (OA) and Factor Analysis (FA) makes it possible to discover the optimum combinations with only a small number of tests51,52. Thus, to simulate various growing conditions, we used six groups (A to E) (Table S2). Because OED is characterized by equilibrium dispersion, we were able to design experiments to find the best combination of treatments with a minimum of tests. Additionally, OED allowed us to transform complex multi-factor data to single factor analysis. We used a multi-factorial orthogonal design for field plots based on the six groups (Table S2)53, giving a total of 380 experimental plots (each with an area of 28 m2), under various field management treatments (Table S2) within a total field area of 4100 m2. Plots were irrigated five times during the growing season, at the following growth stages: vegetative phase, jointing, stem formation, ear formation and flowering. Fertilization was carried out before sowing and again before spring regrowth the following year.
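To illustrate why an orthogonal array needs so few runs, the sketch below uses the textbook L9(3^4) array (four factors at three levels each) and checks its defining balance property. This array is a generic example and is not the design used in this study, which is given in Table S2; the function name `is_orthogonal` is hypothetical.

```python
from itertools import combinations, product

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, levels coded 1 to 3
L9 = [
    [1, 1, 1, 1],
    [1, 2, 2, 2],
    [1, 3, 3, 3],
    [2, 1, 2, 3],
    [2, 2, 3, 1],
    [2, 3, 1, 2],
    [3, 1, 3, 2],
    [3, 2, 1, 3],
    [3, 3, 2, 1],
]


def is_orthogonal(array, levels):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(array[0])
    expected = len(array) // (levels ** 2)
    for a, b in combinations(range(n_cols), 2):
        counts = {}
        for row in array:
            key = (row[a], row[b])
            counts[key] = counts.get(key, 0) + 1
        for combo in product(range(1, levels + 1), repeat=2):
            if counts.get(combo, 0) != expected:
                return False
    return True


full_factorial_runs = 3 ** 4  # every combination of 4 factors at 3 levels
print(len(L9), "orthogonal runs versus", full_factorial_runs, "full-factorial runs")
print("balanced:", is_orthogonal(L9, 3))
```

Because each pair of factor levels appears equally often, main effects can be compared on a balanced basis even though only a fraction of the full factorial is run.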

Data collection

Ten samples along 1 m of each row were randomly selected to measure the five seed yield components from anthesis to seed harvest from 2003 to 2005. Plants that were 1 m or less from the edge of the plot were not sampled. Seed yield components and seed yield data for each plot were collected as follows: fertile tillers m−2 (Y1) were measured from ten randomly selected 1-m row samples, and 30 to 36 fertile tillers and 27 to 54 spikelets were randomly selected for measuring spikelets/fertile tillers (Y2), florets/spikelet (Y3) and seed numbers/spikelet (Y4). When the seed heads were ripe, 4 samples from 1 m of the row length were separately threshed by hand, the yield of clean seed for each sample was weighed, and the seed moisture content was confirmed as 7 to 10% for converting into seed yield (kg hm−2) (Z). Ten lots of 100 seeds each were collected to determine seed weight in mg (Y5). The total numbers of samples (n) used to measure Y1 to Y5 and Z were 3800, 13605, 11085, 10770, 3800, and 1520, respectively, across all 3 years (Table S3). The sample size for individual years is shown in Supplementary Table 3, and the experimental databases were established with Visual FoxPro (Version 6.0).

Statistics and analytical methods

Path coefficient analysis helps to determine the direct effect of traits and their indirect contributions via other characters38,54. Correlation and path analysis were performed to determine the relationship among the yield and yield contributing characters. Thus, separate and combined analyses for the three years provided useful information48. In addition, ridge regression is a useful parameter estimation method for addressing the collinearity problem frequently arising in multiple linear regression. Ridge regression provides a means of addressing the problem of collinearity without removing variables from the original set of independent variables. Ridge regression analysis55 and Duncan’s multiple range tests for seed yield (Z) and yield components (Y1–Y5) were performed. The data were transformed using logarithmic and power transformations to avoid the effects of highly inter-correlated data, which would lead to multicollinearity between Y1–Y5 and Z. To establish a reliable model, all of the Z and Y1–Y5 data in Visual FoxPro, representing a total of 380 samples (plots) in the three years (i.e. 105 + 129 + 146), were transformed using the natural logarithm, because this transformation does not alter the essential relationships between the variables. Analyses of variance and Pearson correlation analyses were performed using SPSS Version 19.0.
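How ridge regression stabilizes coefficients under collinearity can be sketched as follows, assuming the usual formulation for standardized variables in which the coefficients solve (R + kI) b = r, with R the correlation matrix among the predictors, r their correlations with the response, and k the ridge parameter. The correlation values, the grid of k values and the function names are hypothetical; the analyses in this study were run in SAS.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_coefficients(r_xx, r_xz, k):
    """Standardized ridge coefficients: solve (R + k I) b = r."""
    n = len(r_xz)
    penalized = [
        [r_xx[i][j] + (k if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(penalized, r_xz)


# Hypothetical correlations with two strongly collinear predictors (r = 0.9)
r_xx = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
r_xz = [0.6, 0.5, 0.1]

# Ridge trace: coefficients over a grid of k values; k = 0 is ordinary least squares
ridge_trace = {k: ridge_coefficients(r_xx, r_xz, k) for k in (0.0, 0.2, 0.4, 0.6, 0.8)}
for k, coefs in ridge_trace.items():
    norm = math.sqrt(sum(c * c for c in coefs))
    print(k, [round(c, 3) for c in coefs], round(norm, 3))
```

Plotting the coefficients against k gives a ridge trace; a value of k is typically chosen where the traces level off.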

If S = ln Z and Ci = ln Yi (i = 1 to 5), then S and C1 to C5 were used for the ridge regression analyses, and the ridge regression model was:

$${\rm{S}}={\rm{C}}\times {\rm{\beta }}+{\rm{u}}$$
(6)

where S is the n × 1 vector of observations of a response variable, C is the n × p matrix of observations on p explanatory variables, β is the p × 1 vector of regression coefficients, and u is the n × 1 vector of residuals satisfying \(E(u)=0\) and \(E(u{u}^{\prime })={\delta }^{2}I\).

It is assumed that C and S have been scaled such that C’ C and S’ S are matrices of correlation coefficients. In this equation, n = 380, and p = 5. Thus,

$$\mathrm{ln}\,{\rm{Z}}=\alpha +\mathop{\sum }\limits_{i=1}^{5}{\beta }_{i}\times \mathrm{ln}\,{Y}_{i}+u$$
(7)

The above logarithmic model (7) was transformed to the following exponential function:

$${\rm{Z}}={e}^{\alpha }\times \mathop{\prod }\limits_{i=1}^{5}{Y}_{i}^{{\beta }_{i}},$$
(8)

where α and βi (i = 1 to 5) are constants.

Formula (8) was used to estimate the Z of all 380 samples, which is denoted as Zestimated; the actual seed yield is denoted as Zactual.

A general linear regression model was used to assess the Zactual compared to the Zestimated, and an analysis of variance was used to assess the dependent variable Zactual and the parameter estimates of Zestimated. The linear regression model is:

$${{\rm{Z}}}_{{actual}}={{\rm{\beta }}}_{0}+{\rm{k}}\times {Z}_{estimated}$$
(9)

So, via formula (9), the model was adjusted to:

$${\rm{Z}}={{\rm{\beta }}}_{0}+{\rm{k}}\times {e}^{\alpha }\times \mathop{\prod }\limits_{i=1}^{5}{Y}_{i}^{{\beta }_{i}},$$
(10)

where β0 and k are the intercept and slope of the linear regression (9).

In addition, the ridge trace and appropriate scatter plots were graphed. The analyses and graphical procedures specified above were all performed using SAS Version 8.2 (Inc. 1988).

Quadratic two-variable regression models between Z and Y1 to Y5 were used as follows:

$${\rm{Z}}=\mathop{\sum }\limits_{i=1}^{2}({\beta }_{i\times j+1}{Y}_{j}^{i})+u(i=1,2;j=1,2),$$
(11)

where the β values are constants.

The equivalent effects of Yi and Yj were determined using:

$$\frac{\partial Z}{\partial {Y}_{i}}=\frac{\partial Z}{\partial {Y}_{j}}$$
(12)

which produced

$$\,{Y}_{j}={\rm{k}}\times {Y}_{i}\pm b,$$
(13)

where b is a constant. The presented ridgelines (13) (Figs. 3 and 4) correspond to the response surface models (11) to show the synergetic and antagonistic effects. The analyses and graphics were all performed using the SAS (v8.2) software49.
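One way to see how a linear ridgeline of the form (13) can follow from condition (12) is the sketch below. It assumes, purely for illustration, a pairwise quadratic model without an interaction term, with hypothetical coefficient labels \({\beta }_{0}\) to \({\beta }_{4}\) that need not match the indexing of model (11) or Table 9:

$$Z={\beta }_{0}+{\beta }_{1}{Y}_{i}+{\beta }_{2}{Y}_{i}^{2}+{\beta }_{3}{Y}_{j}+{\beta }_{4}{Y}_{j}^{2}+u$$

$$\frac{\partial Z}{\partial {Y}_{i}}={\beta }_{1}+2{\beta }_{2}{Y}_{i},\qquad \frac{\partial Z}{\partial {Y}_{j}}={\beta }_{3}+2{\beta }_{4}{Y}_{j}$$

$${\beta }_{1}+2{\beta }_{2}{Y}_{i}={\beta }_{3}+2{\beta }_{4}{Y}_{j}\;\Rightarrow \;{Y}_{j}=\frac{{\beta }_{2}}{{\beta }_{4}}\times {Y}_{i}+\frac{{\beta }_{1}-{\beta }_{3}}{2{\beta }_{4}}$$

Under this assumption the slope is \(k={\beta }_{2}/{\beta }_{4}\), so k > 0 when the two quadratic coefficients have the same sign and k < 0 when their signs differ.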