Introduction

Crop growth models have been developed during the last decades by integrating knowledge across disciplines such as crop ecophysiology, micrometeorology and soil science (Loomis et al., 1979). These models were conventionally used to predict the performance of given cultivars under various environmental conditions, and are now increasingly being used in breeding programmes (Aggarwal et al., 1997), for example to assist in the design of new plant types (Haverkort & Kooman, 1997).

Many crop growth models use as input physiological parameters that account for differences among cultivars. These parameters, often regarded as environment-independent genotype coefficients, allow the models to predict performance of diverse cultivars under different growing conditions. The models can therefore potentially resolve genotype-by-environment interactions into underlying physiological parameters (Hunt et al., 1993). Because these parameters are genotypic, information from genetic studies can be incorporated into crop models to guide prediction of genotypic differences in growth or development. For instance, White & Hoogenboom (1996) presented a gene-based model for the growth of the common bean (Phaseolus vulgaris), in which they applied linear regression to estimate values for more than 20 physiological model-input traits from allelic information of seven known genes. Their results and a subsequent evaluation study of the gene-based model (Hoogenboom et al., 1997) highlighted the potential use of genetic information to represent cultivar differences in crop models. This makes the biological meaning of model parameters more explicit, and simplifies calibration of new cultivars in crop models, a recognized difficulty with existing models (Hunt et al., 1993). However, the approach of White & Hoogenboom (1996) implies that all the model-input traits were controlled by pleiotropic effects of the seven genes, ignoring effects of possible additional trait-specific genes.

It is now possible to dissect variation of a trait exhibiting quantitative inheritance into the effects of discrete genetic loci — quantitative trait loci (QTLs) — linked to markers on a molecular marker map (e.g. Paterson et al., 1988). The use of molecular markers to identify QTLs can help to elucidate the inheritance of specific model-input traits (Hoogenboom et al., 1997; Stam, 1998). Using such techniques, Yin et al. (1999b) have shown that input traits in a model for predicting barley biomass and yield production were controlled by separate QTLs, though many of them were associated with the same major gene. The QTL mapping of physiological model-input traits might open up opportunities to predict the yield potential of a specific genotype in various environments on the basis of individual genetic factors (Aggarwal et al., 1997). The objective of this study was to assess the ability of a crop growth model with QTL-based estimates of input parameters to predict the yield of recombinant inbred lines (RILs) of barley (Hordeum vulgare L.).

Materials and methods

The crop growth model

The QTL-based crop growth model used in this study was derived from the model SYP-BL as described by Yin et al. (2000). It quantifies barley growth as affected by radiation, temperature and plant nitrogen status. Leaf photosynthesis is estimated based on radiation flux, leaf nitrogen concentration (LNC) and specific leaf area (SLA). Total daily crop photosynthesis is calculated by integrating instantaneous photosynthesis rates over the leaf area index and over the day. Daily growth rate is estimated after subtraction of dark respiration. The biomass produced is distributed among organs based on partitioning coefficients that depend on development stages (DS). The DS is defined as 0 at emergence, 1 at flowering and 2 at maturity, and is calculated as the accumulation of daily development rates, which increase proportionally with the effective temperature between 0°C and 26°C. The model calculates yield as spike biomass accumulated over the postflowering period. The detrimental effect of lodging on yield is simulated through reduction in canopy photosynthesis, for which the completely lodged canopy is assumed to be a single large horizontal leaf.
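The development-stage bookkeeping described above can be sketched as follows. This is a minimal sketch, not the SYP-BL code. It assumes that daily mean temperature is clipped to the 0 to 26°C range to give the effective temperature. It also assumes that the proportionality constant differs before and after flowering through two hypothetical thermal-time parameters (`tt_pre`, `tt_post`, in °C d), which in practice would be tied to the Pre-F and Post-F input traits.

```python
from typing import Iterable, List

T_BASE = 0.0   # lower limit of effective temperature (°C), from the text
T_MAX = 26.0   # upper limit of effective temperature (°C), from the text

def effective_temperature(t_mean: float) -> float:
    """Daily mean temperature clipped to [T_BASE, T_MAX], relative to T_BASE (assumption)."""
    return min(max(t_mean, T_BASE), T_MAX) - T_BASE

def development_stage(daily_temps: Iterable[float],
                      tt_pre: float, tt_post: float) -> List[float]:
    """Accumulate DS from emergence (0) through flowering (1) to maturity (2).

    tt_pre and tt_post are hypothetical thermal-time requirements (°C d) of the
    pre- and post-flowering phases; within a phase the daily rate is
    proportional to effective temperature. The phase is chosen from the DS
    at the start of each day (a simplification).
    """
    ds = 0.0
    trajectory = []
    for t in daily_temps:
        tt_phase = tt_pre if ds < 1.0 else tt_post
        rate = effective_temperature(t) / tt_phase  # DS units per day
        ds = min(ds + rate, 2.0)                    # DS is capped at maturity
        trajectory.append(ds)
    return trajectory
```

For example, `development_stage([10.0] * 30, tt_pre=100.0, tt_post=100.0)` reaches flowering (DS = 1) after 10 days and maturity (DS = 2) after 20 days.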

Major physiological model-input traits were pre-flowering duration (Pre-F), post-flowering duration (Post-F), LNC, SLA, fraction of total biomass partitioned to roots and shoots, and fraction of shoot biomass partitioned to leaves and spikes (FPleaf and FPspike). The fraction of shoot biomass partitioned to stems is 1 − FPleaf − FPspike. Environmental model inputs were daily radiation and temperature. For a given set of physiological and environmental input values, the model produces predictions of yield (spike biomass) and shoot biomass.
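The daily partitioning of new shoot biomass, with the stem fraction obtained as the remainder 1 − FPleaf − FPspike, might look like the sketch below. The DS-dependent coefficients are represented as piecewise-linear tables interpolated with `numpy.interp`; the table values are hypothetical placeholders, not measured coefficients for any RIL.

```python
import numpy as np

# Hypothetical partitioning tables: (DS breakpoints, fraction of new shoot biomass)
FP_LEAF_TABLE = ([0.0, 0.5, 1.0, 2.0], [0.60, 0.50, 0.00, 0.00])
FP_SPIKE_TABLE = ([0.0, 0.5, 1.0, 2.0], [0.00, 0.10, 0.40, 1.00])

def partition_shoot(shoot_growth: float, ds: float) -> dict:
    """Split one day's shoot growth among leaves, stems and spikes at stage ds."""
    fp_leaf = float(np.interp(ds, *FP_LEAF_TABLE))
    fp_spike = float(np.interp(ds, *FP_SPIKE_TABLE))
    fp_stem = 1.0 - fp_leaf - fp_spike  # stems receive the remainder
    if fp_stem < -1e-9:
        raise ValueError("FPleaf + FPspike exceeds 1 at DS = %.2f" % ds)
    return {
        "leaf": shoot_growth * fp_leaf,
        "stem": shoot_growth * max(fp_stem, 0.0),
        "spike": shoot_growth * fp_spike,
    }
```

Summing the `"spike"` component over days with DS > 1 gives the post-flowering spike biomass that the model reports as yield.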

Field experiments

Field experiments were conducted to allow identification of QTLs for model-input parameters and to measure model-output traits (yield and shoot biomass). Ninety-four RILs, produced by eight generations of single-seed descent from a cross of two two-row spring barley cultivars, Prisma and Apex, were grown in 1996 and 1997 on alluvial clay soil at Wageningen (52°N latitude), The Netherlands. A randomized incomplete block design was used, with two replicates for each RIL. Crops were grown under conditions that kept plants free of pests, diseases and weeds. Nitrogen application was greater in 1997 (104 kg ha−1) than in 1996 (50 kg ha−1); this difference was originally intended for evaluating the performance of the SYP-BL model in different nitrogen environments. A complete set of physiological model-input traits was determined for each RIL in 1997; only Post-F and LNC were measured in 1996. The model-output traits, yield (at 14% moisture content) and total shoot biomass, were measured in both years. Some RILs lodged after flowering in 1997, and lodging severity was scored during the post-flowering period. Additional details of the field trials are given by Yin et al. (2000).

QTLs for model-input traits

QTLs for the physiological model-input traits were identified as described in earlier reports (Yin et al., 1999a,b), using an approximate multiple QTL mapping method assuming no epistasis (Jansen, 1993; Jansen & Stam, 1994), as implemented in the software MapQTL 3.0 (Van Ooijen & Maliepaard, 1996). Because root weight was not measured in our experiments, biomass partitioning between root and shoot was ignored. QTLs for the six other physiological model-input traits (Pre-F, Post-F, LNC, SLA, FPleaf, and FPspike), identified using the 1997 data, are summarized in Table 1. Of these traits, SLA and the biomass partitioning coefficients were DS-dependent; QTL analysis was therefore conducted at several DS for each of these traits (Table 1). Most of the traits were found to be associated with the major dwarfing gene, denso (also designated sdw1; Franckowiak, 1996), which was segregating in our population. The parental variety Prisma carried the mutant (dwarf) denso allele. This gene was mapped at 126.4 cM on chromosome 3 (3H) of our marker map (Yin et al., 1999b), by segregation analysis of the distinctive prostrate juvenile growth habit (Haahr & Von Wettstein, 1976).

Table 1 QTLs for physiological model-input traits, identified using the 1997 data. Traits are defined in Abbreviations. Estimated QTL positions are relative to the marker map described by Yin et al. (1999b). QTLs for SLA were reported by Yin et al. (1999a); those for other traits by Yin et al. (1999b). Only LOD-profile peaks of at least 2.5 are included. No QTLs were identified for SLA at DS = 0.20, DS = 0.50, and DS = 0.80, or for FPleaf at DS = 0.10. Additive effect = (mean of lines with Prisma allele − mean of lines with Apex allele)/2. The percentage of phenotypic variation (adjusted R²) accounted for by the joint effects of QTLs was estimated by two methods: Method Q, weighted regression on QTL genotypes; Method M, regression on genotypes of the markers closest to the QTLs. h², estimated broad-sense heritability of line mean.
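Two of the genetic quantities defined in the Table 1 caption can be computed as below. The additive effect follows the caption's definition directly. For the heritability of line means, the sketch assumes a balanced one-way analysis of variance with r replicate plots per RIL, h² = σ²G / (σ²G + σ²e / r), and ignores the incomplete-block structure; the estimator actually used for Table 1 is not stated here and may differ.

```python
import numpy as np

def additive_effect(values, has_prisma_allele):
    """(mean of lines with Prisma allele - mean of lines with Apex allele) / 2."""
    values = np.asarray(values, float)
    prisma = np.asarray(has_prisma_allele, bool)
    return (values[prisma].mean() - values[~prisma].mean()) / 2.0

def line_mean_heritability(plots):
    """Broad-sense heritability of line means from a balanced (lines x reps) array.

    Assumes a one-way ANOVA model (an assumption, not necessarily the
    authors' estimator); a negative genetic variance estimate is set to 0.
    """
    plots = np.asarray(plots, float)
    n_lines, r = plots.shape
    line_means = plots.mean(axis=1)
    ms_lines = r * np.sum((line_means - plots.mean()) ** 2) / (n_lines - 1)
    ms_error = np.sum((plots - line_means[:, None]) ** 2) / (n_lines * (r - 1))
    var_g = max((ms_lines - ms_error) / r, 0.0)
    return var_g / (var_g + ms_error / r)
```

With this sign convention, a negative additive effect means that the Apex allele increases the trait.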

Predicting model-input trait values

We could not readily obtain predicted trait values for each RIL from the original QTL-mapping analyses because neither fitted values nor all quantities needed for computing them are included in the output of MapQTL. More importantly, interactions among QTLs were not modelled in the mapping stage. However, once attention was restricted to a limited set of putative QTL positions, we could estimate joint genotype probabilities for the QTLs readily and use these to model simultaneously the effects of multiple QTLs. From these models, we obtained predicted trait values while allowing for possible interactions among QTLs.

Estimating joint QTL-genotype probabilities

We used information on a framework of 190 AFLP markers (Yin et al., 1999b) to estimate the joint genotype probabilities for a set of previously identified QTLs for each trait. Let Xj be a binary random variable taking value 1 if a RIL has genotype QQ and 0 if the RIL has genotype qq at QTL j (j=1, 2, …, J). In addition, let h represent the complete set of marker genotypes in a RIL. Then we require, for each RIL, probabilities of the form, pr(X1=x1,…,XJ=xJ | h), where xj ∈ {0, 1}.

For a single QTL, the probability, pr(Xj=xj | h), of each of the possible QTL genotypes in each RIL, conditional on the observed marker genotypes in that RIL, was estimated using the recombination frequencies between the QTL position and the markers flanking that position. If the score for a flanking marker was missing in a particular RIL, the scores for additional markers linked to that position were used to supply the missing information. Such an ‘all-marker’ procedure has been described previously by Jansen & Stam (1994) and, equivalently, by Haley et al. (1994). Because we are working with RILs, r′, the probability of an odd number of observable crossover events in an interval over eight generations of single-seed descent, must be used in these calculations in place of r, the usual recombination frequency per meiosis. We assumed no crossover interference, so that recombination in one interval was taken to be independent of recombination in any other interval. For computational simplicity, we made two additional approximations. First, we computed r′ as 2r/(1 + 2r), a relation that holds strictly only for an infinite number of generations of single-seed descent. Second, if a flanking-marker score was missing, we used at most the five markers most closely linked to the QTL on its left, or the five on its right, to supply additional information for estimating the QTL genotype probabilities.
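The single-QTL calculation can be sketched for the simplest case, a QTL between two flanking markers in a RIL. This is a minimal sketch under stated assumptions: map distances are converted to r with the Haldane map function (the text does not name a map function), r′ = 2r/(1 + 2r) as in the text, intervals are treated as independent, and a missing flanking marker is simply skipped rather than replaced by up to five further markers as described above. Marker scores are coded 1 for the allele of the parent contributing Q and 0 otherwise.

```python
import math
from typing import Optional

def haldane_r(d_cM: float) -> float:
    """Recombination frequency per meiosis from map distance (Haldane, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def ril_r(r: float) -> float:
    """r' = 2r / (1 + 2r), the infinite-generation selfing approximation."""
    return 2.0 * r / (1.0 + 2.0 * r)

def prob_qq(left: Optional[int], right: Optional[int],
            d_left_cM: float, d_right_cM: float) -> float:
    """pr(X = 1 | flanking markers) for a QTL between two markers in a RIL.

    Scores are 1/0 as described above, None if missing.
    """
    weight = {}
    for x in (0, 1):
        w = 0.5  # prior: QQ and qq equally frequent among RILs
        for score, d in ((left, d_left_cM), (right, d_right_cM)):
            if score is None:
                continue  # this sketch skips a missing flanking marker
            rp = ril_r(haldane_r(d))
            w *= (1.0 - rp) if score == x else rp
        weight[x] = w
    return weight[1] / (weight[0] + weight[1])
```

With both scores missing, the probability returns to the prior of 0.5; with disagreeing markers, the closer marker dominates, e.g. `prob_qq(1, 0, 4.0, 16.0)` is well above 0.5.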

For independently segregating QTLs, or QTLs separated by at least one marker having a non-missing genotype score, the joint probability, pr(X1=x1,…, XJ=xJ | h), can be obtained by multiplication of the marginal probabilities, pr(Xj=xj | h), j=1, 2,…,J. Otherwise, the joint probability is obtained by relations such as

$$\mathrm{pr}(X_1=x_1, X_2=x_2 \mid h) = \mathrm{pr}(X_1=x_1 \mid X_2=x_2, h)\,\mathrm{pr}(X_2=x_2 \mid h),$$

and obvious extensions for more than two QTLs. Conditional probabilities such as pr(X1=x1 | X2=x2, h) were computed by adding QTL 2 to the marker map as if it were a marker, once separately for each possible genotype at QTL 2, and then estimating pr(X1=x1 | h′) as before, where h′=(X2, h). All calculations were done using custom-written functions in S-PLUS (Data Analysis Products Division, MathSoft, 1997).
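The joint-probability relations can be illustrated by brute-force enumeration over all genotype combinations at the unscored loci of one chromosome. This is a minimal sketch, not the authors' S-PLUS code. It uses the same independent-interval approximation with r′ and assumes the Haldane map function. Fixing a QTL's genotype in the input plays the role of "adding QTL 2 to the marker map as if it were a marker", so the chain rule above can be checked numerically.

```python
import itertools
import math
from typing import Dict, Optional, Sequence, Tuple

def ril_r_from_cM(d_cM: float) -> float:
    """Haldane map function (assumed) followed by r' = 2r / (1 + 2r)."""
    r = 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))
    return 2.0 * r / (1.0 + 2.0 * r)

def joint_probs(positions: Sequence[float],
                scores: Sequence[Optional[int]]) -> Dict[Tuple[int, ...], float]:
    """Joint genotype probabilities of all unscored loci on one chromosome.

    positions: sorted map positions (cM) of markers and QTLs.
    scores: 1/0 for scored markers (or a QTL fixed at a genotype),
            None for QTLs and missing marker scores.
    """
    free = [i for i, s in enumerate(scores) if s is None]
    unnormalized = {}
    for combo in itertools.product((0, 1), repeat=len(free)):
        g = list(scores)
        for i, v in zip(free, combo):
            g[i] = v
        p = 0.5  # first locus: either parental allele, equally likely
        for k in range(1, len(g)):
            rp = ril_r_from_cM(positions[k] - positions[k - 1])
            p *= (1.0 - rp) if g[k] == g[k - 1] else rp
        unnormalized[combo] = p
    total = sum(unnormalized.values())
    return {combo: p / total for combo, p in unnormalized.items()}

# Two QTLs (8 and 12 cM) in one interval between markers at 0 and 30 cM.
pos = [0.0, 8.0, 12.0, 30.0]
joint = joint_probs(pos, [1, None, None, 0])        # pr(X1, X2 | h)
p_x2 = joint[(0, 1)] + joint[(1, 1)]                # pr(X2 = 1 | h)
cond = joint_probs(pos, [1, None, 1, 0])[(1,)]      # pr(X1 = 1 | X2 = 1, h)
print(joint[(1, 1)], cond * p_x2)                   # equal, by the chain rule
```

When a scored marker lies between two QTLs, the same enumeration reproduces the product of the marginal probabilities, which is why the simpler multiplication suffices in that case.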

Using estimated QTL-genotype probabilities to predict trait values

We obtained predicted trait values by (a) replicating the trait value for each RIL so that there was one copy (case) for each multiple-QTL genotype with nonzero probability; and (b) performing weighted multiple regression of the observed trait values on the set of possible QTL genotypes, using the joint QTL-genotype probabilities as weights.

From one to four putative QTLs were identified for each of the traits used in this study (Table 1). For traits with two or three QTLs, all interaction terms were included in the regression models. For the one trait (Pre-F) with four QTLs, only two-way interactions were included in the regression model used for prediction. The predictions obtained using the full model containing all interaction terms differed little from those obtained from the restricted model containing only up to two-way interactions.
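Steps (a) and (b), with interaction terms up to a chosen order, can be sketched as follows. This is an illustration only (the original calculations used S-PLUS). It codes each multiple-QTL genotype as a tuple of 0/1 values and fits weighted least squares with NumPy. Each RIL's predicted trait value is taken as the probability-weighted average of the fitted genotype values, which is one reasonable choice but an assumption on our part. Setting `max_order=2` mimics the restriction to two-way interactions used for Pre-F.

```python
import itertools
import numpy as np

def design_row(genotype, max_order):
    """Intercept, main effects and interaction products up to max_order (0/1 coding)."""
    row = [1.0]
    for order in range(1, max_order + 1):
        for idx in itertools.combinations(range(len(genotype)), order):
            row.append(float(np.prod([genotype[i] for i in idx])))
    return row

def expansion_predict(y, probs, max_order=None):
    """Weighted regression of trait values on replicated multiple-QTL genotypes.

    y: observed trait value for each RIL.
    probs: for each RIL, a dict {genotype tuple: joint probability}.
    Returns the coefficients and one predicted value per RIL.
    """
    n_qtl = len(next(iter(probs[0])))
    max_order = n_qtl if max_order is None else max_order
    rows, ys, ws = [], [], []
    for yi, pi in zip(y, probs):
        for g, w in pi.items():
            if w > 0:  # (a) one case per genotype with nonzero probability
                rows.append(design_row(g, max_order))
                ys.append(yi)
                ws.append(w)
    X, Y, sw = np.array(rows), np.array(ys, float), np.sqrt(ws)
    # (b) weighted least squares: scale each case by sqrt(weight)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
    # Predicted value: probability-weighted mean of fitted genotype values
    preds = np.array([sum(w * float(np.dot(design_row(g, max_order), beta))
                          for g, w in pi.items()) for pi in probs])
    return beta, preds
```

For two QTLs the coefficients are ordered as intercept, QTL 1, QTL 2, and QTL 1 × QTL 2 interaction.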

The fraction of total variance explained jointly by the QTLs identified for a trait was estimated by the adjusted squared multiple-correlation coefficient (R²) for weighted multiple regression on the QTL genotypes (Method Q). Most statistical computer packages do not use the correct degrees of freedom when computing the mean squares, and thus R² values, for a weighted regression model. In our context, the total degrees of freedom should be based on the sum of the weights, not on the total number of observations after replication of RILs. Care must therefore be taken when obtaining an adjusted R² value for such a model. This estimate was compared with a more commonly used estimate, the adjusted R² for multiple regression on the genotypes of the marker nearest to each QTL (Method M).
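The degrees-of-freedom correction can be made explicit in a small function. This sketch assumes the usual definition, adjusted R² = 1 − [SSE/(n − p)] / [SST/(n − 1)], with weighted sums of squares and with n replaced by the sum of the weights. When each RIL's genotype probabilities sum to 1, that sum equals the number of RILs rather than the number of expanded cases.

```python
import numpy as np

def weighted_adj_r2(X, y, w):
    """Adjusted R^2 for a weighted least-squares fit, with n = sum(w).

    X includes an intercept column; w are the joint QTL-genotype probabilities.
    """
    X, y, w = np.asarray(X, float), np.asarray(y, float), np.asarray(w, float)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    ybar = np.sum(w * y) / np.sum(w)     # weighted mean
    sse = np.sum(w * resid ** 2)         # weighted residual sum of squares
    sst = np.sum(w * (y - ybar) ** 2)    # weighted total sum of squares
    n_eff = np.sum(w)                    # not len(y)
    p = X.shape[1]                       # parameters, including the intercept
    return 1.0 - (sse / (n_eff - p)) / (sst / (n_eff - 1.0))
```

A package that instead uses `len(y)`, the number of expanded cases, would overstate both residual and total degrees of freedom whenever some RILs contribute more than one case.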

Evaluation of the QTL-based crop growth model

Physiological model-input trait values predicted by the above method were used in the SYP-BL model to replace the original measured model-input values. This generated a QTL-based crop growth model for barley, QTL-BL.

Performance of both the SYP-BL and the QTL-BL models was evaluated first against the observed output (yield and shoot biomass) from the 1997 experiment; this was the experiment from which the QTLs for model-input traits had been identified. For this evaluation, traits measured in 1997 provided all the physiological input to SYP-BL; traits predicted by QTL genotypes provided all the physiological input to QTL-BL. Because lodging, which occurred in 1997 for many tall RILs, was not considered as a physiological trait, observed lodging scores were used as input to both models.

To assess the external validity of the models, performance was next evaluated against the observed output from the independent 1996 experiment. Values of the traits Pre-F, SLA, FPleaf, and FPspike measured in 1997 provided input to SYP-BL; values for these four traits predicted by QTL genotypes provided input to QTL-BL. However, since nitrogen application differed between the years, values of LNC from 1997 were not appropriate for 1996. Therefore, predictions from both the SYP-BL and QTL-BL models for 1996 were obtained using measured values of LNC from 1996. Similarly, due to the lower nitrogen application in 1996, Post-F was shorter in 1996 than in 1997 (Yin et al., 2000), as a result of nitrogen translocation from vegetative organs to meet nitrogen requirements for grain growth (Sinclair & De Wit, 1975). Therefore, values of Post-F measured in 1996 also were used when obtaining predictions from both models for 1996. One replicate per RIL from the 1996 field trial was excluded from all analyses because heterogeneity in soil fertility in the blocks containing those replicates led to large differences in plant nitrogen status among RILs (Yin et al., 2000), which obscured the influence of the physiological model-input traits.

Results

Joint QTL effects

Estimates of broad-sense heritability for the mean over replicate plots of physiological model-input traits for which QTLs were identified ranged from 0.31 to 1.00 (Table 1). The heritability of shoot biomass was 0.73 and that of yield was 0.93.

The percentage of phenotypic variance explained by segregation at a set of QTLs for each physiological model-input trait is given in Table 1. The estimates based on marker genotypes (Method M) were very similar to, and never less than, those based on QTL-genotype probabilities (Method Q). The two methods gave substantially different results only for the trait LNC (Method Q: 19%, Method M: 32%). The proportion of phenotypic variance explained ranged from 12 to 89% for the various traits.

Performance of crop growth models

The 1997 yields predicted by the SYP-BL model correlated well with the observed values (Fig. 1a, Table 2), although clearly there remained substantial variation in the observed values that was not explained by the model. The correlation was due largely to the accurately predicted effect on yield of segregation at the denso locus. An analysis in which we fit a separate slope for each denso genotype class showed no significant correlation between observed and predicted yield within either class (Table 2). The correlation between the 1997 yields predicted from the QTL-BL model and the observed yields was slightly higher than that between the SYP-BL predictions and the observed values (Fig. 1b, Table 2). Again, this correlation was due largely to the accurately predicted effect of segregation at the denso locus. The range of predicted values from the QTL-BL model for each denso genotype class was smaller than that from the SYP-BL model (Fig. 1a,b).

Fig. 1

Comparison between observed values of yield (in tons/hectare) and those predicted by the SYP-BL and QTL-BL models, for the 1997 and 1996 experiments. •, Prisma denso genotype; ○, Apex denso genotype. Simple linear regression lines for all RILs combined (– – – –), the Prisma denso-genotype class (——), and the Apex denso-genotype class (——) are shown.

Table 2 Correlation between observed trait values (yield and shoot biomass) and those predicted by models SYP-BL and QTL-BL. Correlation Coefficient, estimated Pearson correlation coefficient; F, F-statistic to test whether there was a significant linear relationship between observed and predicted trait values; P, P-value for the F-test; a + b, value for all lines, regardless of denso genotype; a, value for lines with Prisma denso allele; b, value for lines with Apex denso allele. The model and residual degrees of freedom for all F-tests are 1 and 45, respectively
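For a simple linear regression of observed on predicted values, the F-statistic in Table 2 is a function of the Pearson correlation r and the number of lines n: F = r²(n − 2)/(1 − r²), with 1 and n − 2 degrees of freedom. A small NumPy sketch follows. If SciPy is available, the P-value could then be obtained as `scipy.stats.f.sf(F, 1, n - 2)`.

```python
import numpy as np

def correlation_f_test(observed, predicted):
    """Pearson r and the F-statistic for the slope of observed on predicted."""
    obs = np.asarray(observed, float)
    pred = np.asarray(predicted, float)
    n = obs.size
    r = np.corrcoef(obs, pred)[0, 1]
    f_stat = r ** 2 * (n - 2) / (1.0 - r ** 2)  # identical to MS_regression / MS_residual
    return r, f_stat, (1, n - 2)
```

Because F is a monotone function of r² for fixed n, testing the regression slope and testing the correlation give the same conclusion.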

The correlations between observed yield and yield predicted by SYP-BL and QTL-BL for all lines in the independent 1996 experiment were much less than those for 1997 (Fig. 1c, Table 2). Here, however, the correlation was not due to the effect of segregation at the denso locus, since we observed no such effect. We observed, rather, significant correlation between observed and predicted yield for lines with the Prisma denso allele but not for those with the Apex allele (Table 2). As in 1997, the correlation between the yields predicted from the QTL-BL model and the observed yields (Fig. 1d, Table 2) was slightly higher than that between the SYP-BL predictions and the observed values.

Compared to yield, the 1997 shoot biomass predicted by the SYP-BL model correlated less well with the observed values (Fig. 2a, Table 2). We detected no significant correlation between observed and predicted shoot biomass within each denso genotype class (Table 2), indicating that this correlation also was due solely to the accurately predicted effect on shoot biomass of segregation at the denso locus. The correlation between the 1997 shoot biomass values predicted from the QTL-BL model and the observed values (Fig. 2b, Table 2) was slightly higher than that between the SYP-BL predictions and the observed values. Again, this correlation was due solely to the accurately predicted effect of segregation at the denso locus.

Fig. 2

Comparison between observed values of total shoot biomass (in tons/hectare) and those predicted by the SYP-BL and QTL-BL models, for the 1997 and 1996 experiments. •, Prisma denso genotype; ○, Apex denso genotype. Simple linear regression lines for all RILs combined (– – – –), the Prisma denso-genotype class (——), and the Apex denso-genotype class (——) are shown.

In contrast to 1997, we observed no significant correlation between observed shoot biomass and predicted values from either the SYP-BL or QTL-BL models for all lines in the independent 1996 experiment (Table 2). Just as for yield, we observed a significant correlation between observed and predicted shoot biomass for lines with the Prisma denso allele only (Table 2). Shoot biomass predictions from the QTL-BL and SYP-BL models were equally well correlated with observed values. The effect on observed shoot biomass of segregation at the denso locus in 1996, although much smaller than that in 1997, appeared to be in the opposite direction (Fig. 2). In 1997, observed shoot biomass for the lines with the Prisma (dwarf) denso allele was on average higher than that for lines with the Apex allele (mean 13.37 t/ha vs. 12.53 t/ha). In 1996, in contrast, it was on average slightly lower (mean 8.87 t/ha vs. 9.05 t/ha). Neither SYP-BL nor QTL-BL was able to account adequately for this interaction between denso effect and year. Instead, predicted shoot biomass was on average higher for the lines with the Prisma denso allele in both 1997 (mean 13.00 t/ha vs. 12.21 t/ha for SYP-BL) and 1996 (mean 9.82 t/ha vs. 8.85 t/ha for SYP-BL). This had the net effect of reducing the correlation between observed and predicted values in 1996 when all lines, irrespective of denso genotype, were considered together.

We directly compare the predictions of the two crop growth models in Fig. 3. For both yield and shoot biomass in 1997, the predictions of the two models were highly correlated (Fig. 3a,b). This was also the case within each of the denso genotype classes, although the correlation coefficients were lower. The slopes of the regression lines are less than one, reflecting that on average the QTL-BL model produced less extreme predictions than did the SYP-BL model (Fig. 3a,b). In 1996, the correlations between predictions of the two models were extremely high (Fig. 3c,d). Correlations based on all values and those based on the values for the individual denso genotype classes were almost the same.

Fig. 3

Comparison between predictions of yield and total shoot biomass from the QTL-BL model and those from the SYP-BL model. For the 1997 experiment, each point represents the mean of two replicates for a single RIL. For the 1996 experiment, each point represents a single plot of a RIL. •, Prisma denso genotype; ○, Apex denso genotype. Simple linear regression lines for all RILs combined (– – – –), the Prisma denso-genotype class (——), and the Apex denso-genotype class (——) are shown. Pearson correlation coefficients are shown for all points (denso genotype a + b), for only RILs with the Prisma denso allele (genotype a), and for only RILs with the Apex denso allele (genotype b).

Discussion

Prediction of input-trait values from estimated QTL effects

We introduced in this study a methodology for developing QTL-based crop growth models, by using both additive and epistatic effects of QTLs identified for individual model-input traits. We used a two-stage procedure for predicting model-input trait values from QTL effects. First, we identified putative QTL positions by mapping analyses that allowed for multiple QTLs in an approximate manner, but not for interactions among QTLs. Second, we estimated QTL-genotype probabilities for putative QTLs identified in the first stage, and used these probabilities to predict trait values, while allowing for interactions among QTLs. Possible failure to detect some QTLs was the price we paid for ignoring interactions in the first stage.

In principle, predicted trait values could have been obtained directly from first-stage multiple-QTL mapping analyses that did include interactions among QTLs. However, the difficulties of model selection (e.g. deciding which interactions to include) and the computational burden would be exacerbated in a full optimization in multiple dimensions, in which attention was not restricted to a preselected set of map positions.

A common way of inferring a QTL genotype is to use the genotype of its nearest neighbouring marker (e.g. Bachmann & Hombergen, 1997). Such an approach assumes no recombination between markers and a QTL; violation of this assumption will lead to less accurate trait predictions. Moreover, it is not clear what to do in such an approach if scores for markers linked to a QTL are not available in a particular line. Missing marker scores are no problem for our method; if no marker information is available, QTL-genotype probabilities are estimated to be 0.5.

For all model-input traits except LNC, regression on marker scores (Method M) yielded predicted trait values almost identical to those from the weighted regression on QTL genotypes (Method Q) (results not shown). This is consistent with the similarities between the adjusted R² values for the two methods (Table 1). This presumably is because there was at least one marker closely linked to each of the QTLs for those traits, and thus recombination between a QTL and its nearest marker was negligible. For LNC, however, the two methods yielded quite different predicted trait values, and different adjusted R² values (Table 1). This probably is because two of the QTLs identified for LNC were located near the centre of 40- and 10-cM marker intervals, respectively. For such a case, our method Q should provide more accurate predictions of trait values; it is unclear which of the methods, M or Q, yields a more accurate estimator of explained variance. Inaccuracies in the method-M estimator arise from ignoring recombination between markers and QTLs. However, Xu (1995, 1998) has shown that R² estimators from regression mapping of QTLs tend to underestimate the true value. The method-Q estimates reported here always exceeded the corresponding estimates from regression mapping (results not shown), but the properties of this estimator remain to be studied in more detail.
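To make the interval-length argument concrete: if a QTL genotype is inferred from its nearest marker alone, as in Method M, the chance that the two differ in a RIL is r′ for the QTL-marker distance. The sketch below evaluates this for a QTL 1 cM from a marker and for QTLs at the centre of 10-cM and 40-cM intervals (nearest marker 5 and 20 cM away). It assumes the Haldane map function and the r′ = 2r/(1 + 2r) approximation used in Materials and methods.

```python
import math

def ril_mismatch_prob(distance_cM: float) -> float:
    """Probability that a RIL's QTL genotype differs from its nearest marker's."""
    r = 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))  # Haldane (assumed)
    return 2.0 * r / (1.0 + 2.0 * r)                         # RIL approximation

for d in (1.0, 5.0, 20.0):
    print(f"{d:4.0f} cM: {ril_mismatch_prob(d):.3f}")
```

Under these assumptions the mismatch rate is roughly 2% at 1 cM, 9% at 5 cM and 25% at 20 cM, so nearest-marker coding misclassifies about one RIL in four for a QTL in the middle of a 40-cM interval.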

Given that QTL-genotype probabilities have been obtained by some suitable method, there also are various possibilities for how they are used to predict trait values. We used a so-called ‘expansion’ method (Chasalow & Dourleijn, 1997), but regression on the probabilities themselves, such as is done in regression mapping of QTLs (Haley & Knott, 1992; Martinez & Curnow, 1992), or iteratively reweighted least squares (Xu, 1998) may be reasonable alternatives. How all these possible approaches for predicting trait values based on QTL genotypes compare when judged by a criterion such as prediction error requires further study.
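For contrast with the expansion method, regression on the probabilities themselves, in the spirit of Haley & Knott (1992), uses one case per RIL and the estimated pr(QQ | h) as a covariate. A single-QTL sketch with NumPy, assuming the probabilities have already been computed; it is not a reimplementation of any cited software.

```python
import numpy as np

def regress_on_probabilities(y, p_qq):
    """Least-squares fit of y = b0 + b1 * pr(QQ | h); one row per RIL."""
    y = np.asarray(y, float)
    p = np.asarray(p_qq, float)
    X = np.column_stack([np.ones_like(p), p])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta  # coefficients and fitted value per RIL
```

The difference lies in how an uncertain RIL enters the fit. With pr(QQ | h) = 0.5, the expansion method contributes two half-weighted cases, one at each genotype, whereas this regression contributes a single case at covariate value 0.5.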

Performance of the QTL-based crop growth model

For both yield and shoot biomass, the high correlations between predicted values from the SYP-BL and QTL-BL models (Fig. 3) suggest that input traits estimated using QTL information can successfully replace measured input parameters. The range of output values predicted by the QTL-BL model was smaller than that for the SYP-BL model (Fig. 3) because the range of estimated input traits was smaller than the range of the measured values (results not shown). The difference in performance between QTL-BL and SYP-BL for 1996 was smaller than that for 1997 (Fig. 3). The close agreement of the two models in 1996 was probably due, at least in part, to the fact that in 1996 (but not in 1997) measured values for the model input traits LNC and Post-F were used in both models. These two traits varied substantially between the years for a given RIL, due to different nitrogen applications in the two years (Yin et al., 2000). Thus the QTL-based estimates of values for these traits were not appropriate for 1996. The dependence on environment of some model-input traits suggests an inadequacy of the model. Since model-input traits are considered as genetic coefficients (Hunt et al., 1993), the model should be improved such that its physiological input traits are solely genetically determined (Yin et al., 2000).

Discrepancies between observed and predicted values can arise from errors in the model-input parameters. The measured input traits used in the SYP-BL model included measurement and environmental errors. Random errors in measured values of physiological traits, relative to morphological and agronomic traits, can be especially large when, as is often the case, measurements require several steps and are obtained from a small sample of plants in a plot. This also is a likely reason that the heritabilities of model-input traits were less than those of the output traits yield and shoot biomass. The QTL-based input parameters used in the QTL-BL model contained less measurement and environmental error than did the measured values. This might explain why the QTL-BL model performed slightly better than the SYP-BL model (Table 2). However, values of the QTL-based input parameters included additional errors caused by ignoring some genetic effects. Moreover, in using input parameters estimated from QTL effects in 1997 to predict output traits for 1996, we assumed an absence of QTL–year interactions. The QTL analysis of Yin et al. (1999b) showed some evidence for such interactions: the magnitudes of the additive effects of some QTLs differed between the years. It might be informative to test the model performance in 1997 using the effects of QTLs identified in 1996. Measurements in 1996, however, do not allow full estimation of all the required model input traits.

Storms and high nitrogen conditions in 1997 caused severe lodging, giving advantage to the short (mutant) denso genotypes (Yin et al., 2000). This was predicted adequately by both the SYP-BL and QTL-BL models (Figs 1a, 1b, 2a, 2b). However, within each denso genotype group, there was no significant association between observed and predicted values for the high-yielding conditions of 1997 (Table 2). This suggests that the models are not sufficiently robust for practical use in breeding, given the desire of breeders to select for high yields within short-strawed lines. For the low-yielding conditions of 1996, in contrast, there was little effect of denso genotype on yield (Fig. 1c,d). There was an effect on shoot biomass in that year: mutant denso genotypes had on average smaller values than did the nonmutant genotypes (Fig. 2c,d), an effect opposite to that seen in 1997. However, both models predicted that these genotypes would have larger values in both years. Because the models differ in physiological model-input trait values but not in model structure, such dependencies of the performance of both models on environmental conditions most likely reflect deficiencies in model structure.

Because the measurement of root mass was not feasible for such a large number of individual RILs under field conditions, the model-input trait for root-shoot partitioning was assumed not to be RIL-specific (Yin et al., 2000). Invalidity of this assumption might have been an important source of error in the model predictions of differences among RILs in different environments. Extension of crop growth models to account for genotypic differences in the effects of environment on root/shoot partitioning would be an important advance. Furthermore, our models use only the well-tested routines for predicting biomass production (source) developed over the last decades by the Wageningen crop modelling group. An earlier analysis (Yin et al., 2000) found that most source-determining input traits (including LNC, SLA, FPleaf, and Post-F) were not significantly related to yield. Recent evidence from other studies indicates that processes not included in our model, especially those related to sink capacity, also are important for determining yield potential (e.g. Bindraban, 1997). Yield differences among RILs within each denso allelic group may have been due to their differences in sink capacity. Further physiological studies are needed to quantify root–shoot relations and identify additional important factors that have not yet been incorporated into the model.

Future prospects

Results of this study and an earlier report (Yin et al., 2000) provide clues to improving the structure of the crop growth model, so that it can better explain yield differences among the relatively similar lines in a single segregating population. Development of improved crop growth models is an important challenge for the future. Using such an improved model, combined with the approach described in this report for incorporating QTL information, we should be able to identify the most promising multiple-QTL genotype, whether or not this genotype already exists in our population. This is the genotype for which the predicted model-input trait values would give the highest predicted yield. For instance, for the 12 genome regions harbouring segregating QTLs for yield-determining traits used in the existing crop growth model (Table 1), a total of 4096 (2¹²) joint QTL genotypes are possible. It is therefore extremely unlikely that the most favourable genotype was present in the set of 94 RILs we used. However, we could pyramid the favourable QTL alleles, thus creating desirable genotypes, by crossing a number of suboptimal RILs that are complementary with respect to their QTL genotypes. Van Berloo & Stam (1998) have described a procedure to identify the most promising pairs of RILs, that is, those which, upon crossing, have the largest probability of producing the most favourable genotype among their offspring.
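The arithmetic behind "extremely unlikely" can be checked directly, together with the exhaustive search implied by identifying the most promising genotype. The sketch assumes, for simplicity, that the 12 regions segregate independently with equal allele frequencies; linkage among regions would change the probability. `predict_yield` is a hypothetical stand-in for the chain QTL genotype → predicted input traits → crop growth model → yield, and the toy additive effects are invented for illustration only.

```python
import itertools
from typing import Callable, Sequence, Tuple

N_REGIONS = 12   # genome regions with segregating QTLs (from the text)
N_RILS = 94      # size of the mapping population (from the text)

n_genotypes = 2 ** N_REGIONS  # homozygous multiple-QTL genotypes possible in RILs
# Chance that one specified genotype occurs at least once among the RILs,
# assuming independent, equally frequent segregation of all regions.
p_present = 1.0 - (1.0 - 1.0 / n_genotypes) ** N_RILS

def best_genotype(predict_yield: Callable[[Tuple[int, ...]], float],
                  n_regions: int = N_REGIONS) -> Tuple[int, ...]:
    """Score every 0/1 genotype and return the one with the highest predicted yield."""
    return max(itertools.product((0, 1), repeat=n_regions), key=predict_yield)

# Hypothetical, purely additive effects standing in for the crop model.
toy_effects: Sequence[float] = [0.3, -0.2, 0.1, 0.05, -0.4, 0.2,
                                0.15, -0.1, 0.25, -0.05, 0.1, -0.3]

def toy_yield(g: Tuple[int, ...]) -> float:
    return sum(e * x for e, x in zip(toy_effects, g))

print(n_genotypes, round(p_present, 3))
print(best_genotype(toy_yield))
```

Under these simplifying assumptions, a given genotype has only about a 2.3% chance of appearing among 94 RILs. Exhaustive scoring of 4096 genotypes is cheap, so the real cost lies in running the crop model for each candidate.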

In general, extrapolation of QTL mapping results to other segregating populations is not straightforward. In another cross, not only may a different set of QTLs segregate, but the alleles of QTLs common to both crosses may also differ. In addition, as the genetic background in another cross will, in general, be different, the effect of specific QTL alleles may vary when moving from one cross to the next, particularly when epistatic effects are important. Nevertheless, Virk et al. (1996) showed that quantitative variation of many agronomic traits in the rice germplasm is associated with allelic variation of DNA markers. This indicates that marker-trait associations not only may be present in segregating populations, but can also be manifest across a germplasm collection of a crop species. At present it is not clear to what extent this phenomenon generalizes to crops other than rice. If it turns out to be more general, the approach described in this paper may be applicable not only to a segregating population, but to a germplasm collection as well.