Introduction

Seed weight, measured as the mean mass per 100 seeds, is an important yield component of soybean and is positively correlated with seed yield (Burton, 1987). Seed weight is mainly inherited through loci with additive effects (Brim and Cockerham, 1961). Selection for increased 100-seed weight has been one of the main goals of many soybean breeding programs. Traditionally, plant improvement has relied on phenotypic selection of populations from crosses between cultivars and experimental lines (Stuber et al., 1992). Improvement of soybean seed weight by gene introgression has been slowed because phenotypic selection is complicated by significant genotype × environment (GE) interactions (Mansur et al., 1996; Mian et al., 1996). Consequently, selection for soybean cultivars with high and stable 100-seed weight requires evaluation in multiple environments over several years, which is expensive, time consuming and labor intensive.

Molecular markers offer a faster and more accurate approach to breeding for traits such as seed weight, as selection can be based on genotype rather than solely on phenotype (Mansur et al., 1996; Mian et al., 1996). The use of molecular markers for indirect selection of important agronomic traits, or marker-assisted selection (MAS), can improve the efficiency of traditional plant breeding (Allen, 1994). Some aspects of plant breeding that can be improved by MAS include identification and elimination of undesirable individuals in the early stages of selection; identification of individuals prior to flowering when backcrossing genes that govern the favorable expression of quantitative traits into adapted genotypes; and facilitation of selection for several traits simultaneously. Further, MAS could improve selection of traits that have low heritability by using markers with high heritability. MAS is especially beneficial when the G × E interaction is significant but the marker × environment interaction is not, because this allows stable selection of genotypes.

In the past decade, several studies have focused on the mapping of quantitative trait loci (QTL) associated with soybean seed weight (Mansur et al., 1996; Mian et al., 1996; Hoeck et al., 2003; Zhang et al., 2004; Panthee et al., 2005). In most of these reports, the trait was measured only after harvest. Consequently, the earlier studies ignored the contribution to seed size of loci underlying distinct gene expression at different developmental stages. The danger of this approach is that the loci mapped may be composites of clustered or linked loci underlying different stages of seed development.

Developmental programs are very important factors controlling the development of quantitative traits such as seed composition, yield and plant height (Yan et al., 1998b; Sun et al., 2006; Li et al., 2007). The development of some quantitative traits occurs through the actions and interactions of many genes. These actions and interactions differ among cultivars and among growth periods, and gene expression is modified by interactions with other genes and with the environment, which together affect the final seed size (Atchley and Zhu, 1997). During soybean development, sets of genes are expressed selectively at different growth stages and transcript abundances are influenced by both genotype and environment (Vodkin et al., 2004).

Identification of both conditional (GE sensitive) and unconditional (GE insensitive) QTL (as defined by Sen and Churchill, 2001) would be desirable for MAS. Fortunately, QTL analysis can be adapted to include both the effects of developmental stages (Zhu, 1995) and the effects of GE interactions (Yan et al., 1998a).

It is essential, therefore, to include both the dynamics of gene expression and interactions with environment when analyzing developmentally related quantitative traits (Xu, 1997). Detailed analyses of conditional QTL will lay down the basis for map-based cloning of genes underlying the trait loci. DNA markers derived from those genes will improve the efficiency of MAS.

Zhu (1995) presented a genetic model to understand the genetic expression of the development of quantitative traits and dynamic mapping at different stages. In this study, we analyzed seed development and formation at different stages using this genetic model. An alternative, functional mapping approach based on a logistic-mixture model, implemented with the EM algorithm, has also been developed to provide estimates of QTL positions, QTL effects and other model parameters responsible for growth trajectories (Ma et al., 2002; Wu and Lin, 2006). Wang and Wu (2004) extended the use of functional mapping to the LD (linkage disequilibrium)-based identification of host QTL from a natural population, indicating a broad application of this methodology in both natural and cross populations.

The association of the developmental behavior of quantitative traits with molecular markers has been reported in rice and cotton (Wu et al., 1999; Ye et al., 2003; Yan et al., 1998a, 1998b), and for morphological traits and seed quality traits in soybean (Sun et al., 2006; Li et al., 2007; Xin et al., 2008). The main objective of the present study was to identify both conditional and unconditional QTL underlying the development of gain in seed weight and the QTL underlying final mean seed weight in soybean. An analysis of epistatic effects in seed weight development and formation is described in a separate paper submitted to Genetics Research (Han et al., 2008).

Materials and methods

Plant materials

The mapping population, consisting of 143 F5-derived recombinant inbred lines (RILs), was advanced by single-seed descent from a cross between ‘Charleston’ (provided by Dr RL Nelson, NSRL, University of Illinois at Urbana-Champaign, IL, USA) and ‘Dongnong 594’ (developed by Northeast Agricultural University, Harbin, China). The RILs were extracted at the F5 generation, advanced without selection for seed size and used for this study at F5:9, F5:10 and F5:11.

The RILs and their parents were grown in a randomized complete block design with three replications at Harbin (45°N, fine-mesic chernozem soil) in 2004, 2005 and 2006. Two-row plots were used; rows were 3 m long with 6 cm between plants. The field location was different each year, soil types differed slightly, planting dates differed by 2 days, applications of herbicides and insecticides were the same in each year, and soybean was rotated with corn. Furthermore, mean temperature and rainfall varied each year (23.1 °C, 477.6 mm in 2004; 24.9 °C, 569.1 mm in 2005 and 27.4 °C, 544.8 mm in 2006). Therefore, the environments in the 3 years were quite diverse.

Each plot of a single genotype provided 20 plants as seed donors per time point and there were three replications of the two-row plots. Pods were picked from 5 to 7 nodes of the main stem every 10 days from 30 days after flowering until maturity. The 30D sample represented the R2 stage and the 80D sample represented the R8 stage of growth, with intervening stages at approximately 10D intervals. Seeds were dried for 30 min in an oven at 105 °C and then dried at 50–70 °C until the seed weight was stable.

Construction of the genetic linkage map

In our previous study (data not shown), a genetic linkage map including 164 SSR markers and 35 RAPD markers was constructed using the 143 F5-derived RILs from the cross between ‘Charleston’ and ‘Dongnong 594’. The order of most markers was consistent with Cregan's map (Cregan et al., 1999). This genetic linkage map covered 3067.28 cM, with an average distance between markers of 15.65 cM, a longest distance of 48.8 cM and a shortest distance of 0.5 cM. The average number of markers on each linkage group was 9.7 and the average linkage group length was 153.36 cM.

Statistical analysis

QTL underlying accumulated seed weight were detected at each growth stage using mean phenotypic values and composite interval mapping (Zeng, 1993, 1994). The genetic effect was the net accumulation of several gene sets from the initial time of plant growth to the time point t. Phenotypic values for weight gain at time t were obtained by subtracting the phenotypic mean measured at time t−1 from the mean at time t (Zhu, 1995). The derived genetic effect reflected changes in weight that accumulated in the 10 days prior to the measurement rather than the net genetic effect of QTL underlying total accumulated weight. QTL analysis relied on the composite interval mapping method (Zeng, 1993, 1994) and analysis of time-independent genetic effects (Zhu, 1995).
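The subtraction of consecutive stage means described above can be sketched as follows. This is a minimal illustration only: the function name `weight_gain` and the example values are hypothetical and are not data from this study, and the stage means are assumed to be held in a plain list ordered by sampling time.

```python
def weight_gain(stage_means):
    """Return the gain in mean seed weight between consecutive stages.

    stage_means: mean 100-seed weights ordered by sampling time
    (hypothetical values; one entry per stage, e.g. 30D, 40D, ...).
    """
    # Gain at stage t is the mean at t minus the mean at t-1.
    return [curr - prev for prev, curr in zip(stage_means, stage_means[1:])]


# Hypothetical means (g per 100 seeds) for one RIL at 30D, 40D, 50D and 60D
example_means = [2.0, 5.5, 9.0, 14.5]
gains = weight_gain(example_means)
```

With six sampling stages this yields five increments per line, one for each 10-day interval.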

The phenotypic performance of the jth genetic entry at time t within the hth environment could be expressed by:

$$y_{hjk}(t) = \mu(t) + E_h(t) + G_j(t) + GE_{hj}(t) + e_{hjk}(t)$$

where $\mu(t)$ was the population mean at time t, fixed; $E_h(t)$ was the environment effect at time t, $E_h(t) \sim (0, \sigma^2_{E(t)})$; $G_j(t)$ was the genetic main effect at time t, $G_j(t) \sim (0, \sigma^2_{G(t)})$; $GE_{hj}(t)$ was the GE interaction effect at time t, $GE_{hj}(t) \sim (0, \sigma^2_{GE(t)})$; and $e_{hjk}(t)$ was the residual effect at time t, $e_{hjk}(t) \sim (0, \sigma^2_{e(t)})$.

The environment effects (E), genetic main effects (G) and GE interaction effects were predicted by the adjusted unbiased prediction method (Zhu and Weir, 1996). The predicted random effects were used to obtain main effect data $y_{j(G(t))} = \mu(t) + G_j(t)$ and GE interaction effect data $y_{hj(GE(t))} = \mu(t) + E_h(t) + GE_{hj}(t)$, respectively. The composite interval mapping method (Zeng, 1993, 1994) was applied to the predicted $y_{j(G(t))}$ to search for QTL with genetic main effects at time t.

Genetic behavior measured at time t was the compound result of genes expressed before time (t−1) and effects within the period from time (t−1) to t. These kinds of gene effects were usually not independent. The net genetic main effects and GE interaction effects between time (t−1) and t could be evaluated by the conditional effects ($G(t|t-1)$ and $GE(t|t-1)$) at time t given the phenotypic mean measured at time (t−1). The mixed model approaches (Zhu, 1995) were used to obtain the conditional genetic main effects ($y_{j(G(t|t-1))} = \mu(t|t-1) + G_j(t|t-1)$) and GE interaction effects ($y_{hj(GE(t|t-1))} = \mu(t|t-1) + E_h(t|t-1) + GE_{hj}(t|t-1)$) for seed weight at different measuring stages in soybean. The composite interval mapping method (Zeng, 1993, 1994) was applied to analyze the derived data.
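The idea of a value at time t "given" the value at time (t−1) can be pictured with a simple regression adjustment. The sketch below is an illustrative simplification and is not the mixed-model procedure of Zhu (1995): it removes from the stage-t values the part that is linearly predictable from the stage-(t−1) values. The function name `conditional_values` and the example numbers are hypothetical.

```python
from statistics import mean


def conditional_values(prev, curr):
    """Adjust stage-t values for their linear dependence on stage t-1.

    prev, curr: phenotypic values of the same lines at time t-1 and t.
    Returns values at t with the part predictable from t-1 removed
    (a regression-based stand-in for y(t|t-1); an assumption of this sketch).
    """
    m_prev, m_curr = mean(prev), mean(curr)
    cov = sum((p - m_prev) * (c - m_curr) for p, c in zip(prev, curr))
    var = sum((p - m_prev) ** 2 for p in prev)
    if var == 0:
        raise ValueError("values at time t-1 must vary among lines")
    slope = cov / var
    return [c - slope * (p - m_prev) for p, c in zip(prev, curr)]


# Hypothetical values for four lines at two consecutive stages
prev_stage = [1.0, 2.0, 3.0, 4.0]
curr_stage = [2.0, 4.0, 6.0, 9.0]
adjusted = conditional_values(prev_stage, curr_stage)
```

By construction the adjusted values keep the stage-t mean and are uncorrelated with the stage-(t−1) values, which is the property that lets conditional QTL reflect effects arising within the interval.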

Mapmaker/EXP version 3.0 was used for genetic linkage analysis. The analyses of QTL were performed using the composite interval mapping module of QTL Cartographer V 2.1 (Basten et al., 1996). Window sizes of 5 and 10 cM (Haldane units) were used. The walk speed was 1 cM. The threshold of the logarithm (base 10) of odds (LOD) score for evaluating the statistical significance of QTL effects was determined by 1000 permutations using the Zmapqtl program in QTL Cartographer (Churchill and Doerge, 1994). The LOD value corresponding to an experiment-wise threshold of α = 0.05 was used to declare a QTL significant. The estimate of the QTL position was the point of maximum LOD score in the region under consideration.
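The logic of a permutation-based experiment-wise threshold (Churchill and Doerge, 1994) can be sketched as below. This is a simplified illustration, not the Zmapqtl implementation: it uses a single-marker LOD score instead of composite interval mapping, and the function names, the two-class marker coding and the default arguments are assumptions of this sketch.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """Single-marker LOD: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_null == 0.0:
        return 0.0       # no phenotypic variation at all
    if rss_marker == 0.0:
        return math.inf  # marker classes explain the phenotype perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise LOD threshold from the permutation null distribution.

    markers: list of genotype lists (one per marker, same line order).
    Phenotypes are shuffled among lines, which breaks any marker-trait
    association while keeping the marker data intact.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Record the genome-wide maximum LOD for this permutation.
        max_lods.append(max(lod_single_marker(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(math.ceil((1 - alpha) * n_perm) - 1, n_perm - 1)
    return max_lods[index]
```

Taking the maximum over all markers in each permutation is what makes the threshold experiment-wise rather than marker-wise.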

Genotype by Trait (GT)-biplot methodology (Yan, 2001) was employed to analyze the interaction between QTL and different environments in all developmental stages, based on the formula $(T_{ij} - \bar{T}_j)/S_j = \lambda_1 \zeta_{i1} \tau_{j1} + \lambda_2 \zeta_{i2} \tau_{j2} + \varepsilon_{ij}$, where $T_{ij}$ is the average value of developmental stage i for environment j; $\bar{T}_j$ is the average value of environment j over all developmental stages; $S_j$ is the standard deviation of environment j among the developmental stage averages; $\lambda_1$ and $\lambda_2$ are the singular values for PC1 (first principal component) and PC2 (second principal component), respectively; $\zeta_{i1}$ and $\zeta_{i2}$ are the PC1 and PC2 scores, respectively, for stage average i; $\tau_{j1}$ and $\tau_{j2}$ are the PC1 and PC2 scores, respectively, for environment j; and $\varepsilon_{ij}$ is the residual of the model associated with developmental stage i in environment j.
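The left-hand side of the biplot model is a column-wise standardization of the stage-by-environment table, which can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation (n−1 denominator) and a table stored as a list of rows; the function name `standardize_columns` and the example values are hypothetical. The principal component scores would then come from a singular value decomposition of the standardized table, which is not shown here.

```python
from statistics import mean, stdev


def standardize_columns(table):
    """Center each column on its mean and scale by its standard deviation.

    table[i][j]: average value for developmental stage i in environment j
    (hypothetical layout). Returns a table of the same shape.
    """
    n_cols = len(table[0])
    columns = [[row[j] for row in table] for j in range(n_cols)]
    col_means = [mean(col) for col in columns]
    col_sds = [stdev(col) for col in columns]  # sample SD, an assumption
    return [
        [(row[j] - col_means[j]) / col_sds[j] for j in range(n_cols)]
        for row in table
    ]


# Hypothetical table: three stages (rows) by two environments (columns)
example_table = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
standardized = standardize_columns(example_table)
```

Standardizing by column puts environments with different scales of seed weight on a common footing before the decomposition.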

Results

Phenotypic variation

For the two parental cultivars, mean seed weights at different developmental periods showed significant differences in all 3 years (Table 1). The mean seed weights of Dongnong 594 were consistently higher than those of Charleston. The difference was most pronounced at the early time points. Individual RILs also differed significantly in their mean seed weights. Some RILs had higher and others had lower mean seed weights than the parents. These transgressive segregants may provide useful germplasm for breeding. In contrast, variation in 100-seed weight within each of the 143 RILs across the 3 years was not significant as judged by means or ranks (data not shown). Therefore, RIL performance was consistent and G × E was limited. Both skewness and kurtosis values of 100-seed weight were less than 1.0 at all growth stages measured in the three environments. Therefore, the segregation pattern of mean seed weight appeared to fit a normal distribution model suitable for QTL identification.
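A skewness and kurtosis screen of this kind can be sketched as follows. This is a minimal illustration, assuming moment-based estimators, excess kurtosis (normal distribution = 0) and a cut-off applied to absolute values; the source does not state which estimators were used, and the function names are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


def roughly_normal(values, limit=1.0):
    """True when |skewness| and |excess kurtosis| are both below limit."""
    skew, kurt = skewness_and_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit
```

Applied to the RIL values at each stage and year, a check like `roughly_normal(weights)` would flag distributions that depart strongly from symmetry or have unusually heavy or light tails.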

Table 1 Statistical analysis of mean 100-seed weight (g) at different days after pollination (D) for the parental cultivars and the F5-derived RIL population

Unconditional QTL at different developmental stages

A total of 94 unconditional QTL that influenced seed weight at different developmental stages were identified. The QTL mapped onto 12 linkage groups (MLG A1, MLG A2, MLG B1, MLG C1, MLG C2, MLG D1b, MLG D2, MLG E, MLG F, MLG G, MLG L and MLG M) (data not shown). Of them, three QTL (swC2_1 at 50D and 80D stages; swC2_2 at 30D, 50D, 60D, 70D and 80D stages; swC2_3 at 60D and 80D stages) were consistently detected in all 3 years. Six QTL (swC2_1 at 40D, 60D and 70D stages; swC2_2 at 40D stage; swC2_3 at 40D and 50D stages; swD1b_2 at 80D stage; swE_1 at 80D stage and swM_1 at 80D stage) were detected in 2 of the 3 years. QTL swC2_1 and swC2_2 were detected in each of the six consecutive measurements from 30D to 80D across 3 years. QTL swC2_3 was detected in five consecutive measurements from 40D to 80D across 3 years (Table 2).

Table 2 Unconditional and conditional QTL underlying mean seed weight at different days after pollination (D) for the F5-derived RIL population

Twenty-one unconditional QTL were detected at the final stage measured (80D) (data not shown). Among them, QTL swC2_1 in 2005 accounted for the largest share of total phenotypic variation at the 80D stage (29%). This accumulation of genetic effects appeared to be the sum of the QTL actions from 30D to 80D (Table 2).

Genetic main effects were detected for all unconditional QTL (QTL swC2_2 at 30D with the highest value and swC2_3 at 40D with the lowest value). QTL swC2_1, swC2_2 and swC2_3 showed distinct genetic main effects across the developmental stages (Table 2). QTL swD1b_1, which was identified only in 2006, displayed significant GE interactions. The numbers of unconditional QTL and their genetic effects varied at different periods of seed development in different years. The expression of gene sets controlling mean seed weight was inferred to be time dependent and affected by years and/or environment (Table 2).

GT-biplot analysis (Yan, 2001) of the unconditional QTL at six developmental stages over 3 years (2004, 2005 and 2006) explained 92% of the total variation. The performance of different QTL at different developmental stages in each environment was evaluated. When 30D in 2004, 70D in 2004, 30D in 2005, 40D in 2005 and 60D in 2005 were set as corner stages, QTL swC2_1 and swC2_2 fell into the sector in which 60D in 2005 was the best stage for measuring the portion of 100-seed weight underlain by these two unconditional QTL. QTL swC2_3 fell in the sector in which 70D in 2004 was the best stage for measuring 100-seed weight by this unconditional QTL (Figure 1).

Figure 1 GT-biplot analysis for the best unconditional QTL at different developmental stages in each tested environment. PC1: first principal component; PC2: second principal component; 2004 Harbin; □ 2005 Harbin; 2006 Harbin.

Unconditional QTL swC2_1, swC2_2 and swC2_3 were confirmed by GT-biplot analysis to be associated with 100-seed weight accumulation from the initial time of measurement (30D) to the time point t (Figure 1). Phenotypic values of 10 typical inbred lines with high or low 100-seed weight at different developmental periods are shown in Table 3. The relationship between phenotypic values and associated molecular markers indicated that the accuracy of markers OPK14_70, Satt202, Satt460, Satt134 and Satt289 (linked to QTL swC2_1, swC2_2 and swC2_3) for selecting seed size in soybean was 70, 100, 100, 90 and 90%, respectively. Moreover, the beneficial alleles for the three QTL, swC2_1, swC2_2 and swC2_3, were all derived from cultivar Dongnong 594, the large-seeded parent.
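One plausible reading of marker accuracy is the percentage of tested lines in which the marker allele agrees with the phenotypic class. The sketch below illustrates that reading only; the source does not spell out the calculation, and the function name, the allele codes ("D" for the Dongnong 594 allele, "C" for the Charleston allele) and the example data are hypothetical.

```python
def marker_accuracy(marker_alleles, phenotype_classes, favorable="D"):
    """Percent of lines whose marker allele agrees with the phenotype class.

    marker_alleles: allele code per line (hypothetical coding).
    phenotype_classes: "high" or "low" seed weight per line.
    favorable: allele assumed to predict high seed weight.
    """
    hits = 0
    for allele, observed in zip(marker_alleles, phenotype_classes):
        predicted = "high" if allele == favorable else "low"
        if predicted == observed:
            hits += 1
    return 100.0 * hits / len(marker_alleles)


# Hypothetical data for 10 lines: 7 of 10 marker calls match the phenotype
example_alleles = list("DDDDDCCCCC")
example_classes = ["high"] * 5 + ["low"] * 2 + ["high"] * 3
example_accuracy = marker_accuracy(example_alleles, example_classes)
```

With only 10 lines per marker, each mismatch shifts the estimate by 10 percentage points, so such values are best read as rough indicators.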

Table 3 The major unconditional QTL (by LOD score) associated with mean seed weight, the markers used for selection, and their accuracy of application in a breeding program

Conditional QTL at different developmental stages

A total of 68 conditional QTL, underlying 100-seed weight, were identified and mapped onto 12 linkage groups (MLG A1, MLG A2, MLG B1, MLG C1, MLG C2, MLG D1b, MLG D2, MLG E, MLG F, MLG G, MLG L and MLG M) at different developmental stages in different years (data not shown). Of them, two QTL (swC2_1 at 30D and 50D, swC2_2 at 60D) were consistently detected in 3 years. Four QTL (swA2_1 at 40D, swC1_1 at 60D, swC2_1 at 40D, 70D and 80D; swC2_2 at 30D, 40D, 70D and 80D) were detected in 2 years. QTL swC2_1 and swC2_2 were detected among six consecutive measurements from 30D to 80D across 3 years. QTL swC2_1 at 30D in 2005 accounted for the largest amount of total phenotypic variation (34%) (Table 2).

Genetic main effects were detected for all conditional QTL (swC2_1 at 30D with the highest value and swM_1 at 60D with the lowest value). QTL swC2_1 and swC2_2 showed distinct genetic main effects across the developmental stages (Table 2). QTL swD1b_1 displayed significant GE interactions. The numbers of conditional QTL and their genetic effects varied with the developmental stages in different years (Table 2).

GT-biplot analysis of the conditional QTL at six developmental stages over 3 years (2004, 2005 and 2006) explained 86% of the total variation. When 30D in 2005, 60D in 2005, 70D in 2005, 80D in 2005, 70D in 2006 and 80D in 2006 were set as the corner stages, QTL swC2_1 and swC2_2 fell into the sector in which 30D in 2005 and 60D in 2005 were the best stages for measuring 100-seed weight by these two conditional QTL (Figure 2). QTL swF_1 fell into the sector in which 70D in 2006 was the best stage for determining 100-seed weight by this conditional QTL (Figure 2).

Figure 2 GT-biplot analysis for the best conditional QTL at different developmental stages in each tested environment. PC1: first principal component; PC2: second principal component; 2004 Harbin; □ 2005 Harbin; 2006 Harbin.

Conditional QTL swC2_1 and swC2_2 were shown by GT-biplot analysis to be associated with mean seed weight gain within the period from time (t−1) to t. Phenotypic increment values of 10 typical inbred lines with high or low mean seed weight at different developmental periods are shown in Table 4. The relationship between phenotypic increment values and the associated molecular markers indicated that the accuracy of markers OPK14_70, Satt202 and Satt460 (linked to QTL swC2_1 and swC2_2) in selecting for seed size in soybean was 60, 100 and 100%, respectively. Moreover, the beneficial alleles at these two QTL (swC2_1 and swC2_2) were both derived from cultivar Dongnong 594.

Table 4 The major conditional QTL (by LOD score) associated with mean seed weight, the markers used for selection, and their accuracy of application in a breeding program

Discussion

The development of seed traits is time dependent and dynamic. QTL mapping can detect loci underlying the development of a trait. Two kinds of dynamic QTL mapping strategies were used in the present study. The first strategy is to map QTL by time-specific measurement, which is referred to here as total seed weight QTL mapping. These total seed weight QTL reveal cumulative gene expression from the initial time to t. The second strategy is to map QTL by predicted conditional genetic effects for time (t|t−1).

To date, QTL analysis of seed traits has concentrated on unconditional QTL measured at the harvest stage (Mansur et al., 1996; Mian et al., 1996; Hyten et al., 2004). However, no information was available (by early 2008) on QTL analysis of the behavior of seed weight gain over growing seasons in soybean. Here, the numbers of QTL related to soybean mean seed weight and their genetic effects were shown to vary at different developmental stages, especially at earlier stages.

QTL underlying weight gain may have controlled gene expressions that occurred in a specific period of plant growth. QTL underlying total seed weight reflected cumulative genetic effects up to that point. Some total seed weight QTL may be detected as weight gain QTL. For example, QTL swC2_1, swC2_2 and swC2_3 that significantly affected seed weight were detected continuously from 30D to 80D, as well as in each composite interval (Table 2). Moreover, some total seed weight QTL may have been placed in the wrong interval by the summing of effects of two linked loci. For example, the conditional QTL swF_2 in the interval Satt335-Sat_120 that was found from 50–60D may contribute to the weight gain QTL swF_1 in the interval Sct_188-Satt335 in 40–50D because of summing with the QTL swF_3 in the interval Sat_120-Sat_103 from 60–70D. It is possible that the QTL are erroneously located and/or represent the sum of many conditional loci of small effect. Positional cloning of such loci would be difficult or impossible.

In the present study, the growth rate of mean seed weight among RILs was different and showed continuous variation. Genetic differences in the seed growth rate of soybean were reported to be largely related to cotyledon cell size (Egli et al., 1980; Hirshfield et al., 1992). However, there was a positive correlation between soybean cotyledon cell number and the ability of the seed to accumulate dry matter (Guldan and Brun, 1985). There was also evidence that plant hormones were involved in determining both sink size and capacity (Liu et al., 2006). A rapid increase in the fresh and dry weight of soybean seed was found to be correlated with a peak in the rate of abscisic acid accumulation in the developing seed (Quebedeaux et al., 1976). The genes that underlie seed weight development are unknown. In this study, 94 unconditional QTL and 68 conditional QTL were identified to be associated with mean seed weight of soybean in different developmental stages. Though the gene networks affecting mean seed weight in soybean remained unknown, analysis of intervals among genetic markers might provide insights.

The QTL swC2_1, swC2_2 and swC2_3 clearly influenced soybean seed weight at various developmental stages and in most environments (years). The stability of the QTL found in the present study may be due to one or a combination of the following factors: (1) the stable QTL were responsible for major genetic effects and were associated with high LOD scores; as suggested by Tanksley (1993) and Zhuang et al. (1997), QTL with major effects are more likely to be stable across multiple environments; (2) highly heritable traits tend to be more repeatable and stable across multiple environments (Paterson et al., 1991). Although the effects of swC2_1, swC2_2 and swC2_3 were very small, there is increasing evidence that the accumulation of minor gene expressions may play an important role in the ultimate formation of a complex trait (Wu et al., 2007). Thus, we believe that MAS with QTL swC2_1, swC2_2 and swC2_3 has good potential to increase the efficiency of breeding programs seeking higher 100-seed weight genotypes (Tables 3 and 4).

QTL detected only in a single environment might indicate the presence of QTL × environment interaction (Veldboom and Lee, 1996). Paterson et al. (1991) analyzed three traits in tomato using F2 lines and found that only 4 of 29 QTL were detected in all three environments. Lu et al. (1996) analyzed six important agronomic traits in rice using DH lines and found that only 7 of 22 QTL were significant in all three environments. Zhuang et al. (1997) analyzed yield components and plant height in rice F2 lines, and their results suggested that only 17 of 44 QTL were detected in more than one environment. These studies revealed that individual QTL were sensitive to the environment and that QTL × environment interaction played an important role in affecting quantitative traits. QTL × environment interactions profoundly affect plant development, especially those changes that represent quantitative traits. Yan et al. (1998a) and Cao et al. (2001) reported that clear QTL × environment interaction influenced plant height development in rice. Genetic main effects and QTL × environment effects of QTL for 100-seed weight at different growth periods were detected in this study by Zhu's method (Zhu, 1998). Here, most QTL obtained in the three environments were found to have genetic main effects. Some QTL detected in a single environment showed QTL × environment interaction effects, for example, QTL swD1b_1.

In the past, the phenotypic values of 100-seed weight were measured only at the final stage for QTL analysis in soybean (Mansur et al., 1996; Mian et al., 1996; Hoeck et al., 2003; Hyten et al., 2004; Zhang et al., 2004; Panthee et al., 2005). In contrast, the phenotypic values were measured at six different developmental stages for unconditional and conditional QTL analysis in this study. Although the QTL were detected at different stages of plant development, there were still some similar results between the present and past experiments. Hoeck et al. (2003) used three populations to identify seed weight QTL and found that Satt277 in MLG C2 was associated with 100-seed weight. The QTL swC2_4 in the present study was located at a chromosomal location similar to that identified by Hoeck et al. (2003). Panthee et al. (2005) identified a QTL associated with 100-seed weight near Satt002 in MLG D2, which was similar to swD2_1 in our study. Zhang et al. (2004) detected one QTL for 100-seed weight near Satt509 on MLG B1 using 184 RILs, which was similar to QTL swB1_1 in the present study. QTL swB1_1, identified at 60D or 70D after flowering in this study, was similar to the one reported by Zhang et al. (2004) at the harvest stage. Each of the QTL could be a composite trait, and positions should be re-examined by mapping loci during seed development before positional cloning is contemplated.