Introduction

Given the importance of cereal grain seeds as staple food and nutrition resources for humans and animals, and as raw materials for the food industry, understanding the genetic architecture underlying the development of crop seed traits is increasingly important in crop breeding programs (Benner et al., 1989; Mazur et al., 1999; van der Meer et al., 2001). Seed development starts with double fertilization of the embryo sac, in which two sperm cells fuse with the egg cell and the central cell of the female gametophyte, respectively, giving rise to the diploid (2n) embryo and the triploid (3n) endosperm. One sperm cell fuses with the egg cell to produce a zygote, which then divides asymmetrically to form the embryo and its suspensor. The other sperm cell merges with the central cell to form a triploid endosperm nucleus, which in most flowering plants, such as rice and wheat, develops into endosperm through two steps: a coenocytic stage followed by a cellularized and differentiated stage (Olsen, 1998; Sundaresan, 2005). Although the diploid embryo and the triploid endosperm are enclosed in the maternal integument, which is itself a part of the seed, seed development depends on the allocation of nutrients and other physiologically active substances from the mother plant. Proper development of a seed trait requires coordinated communication among the components involved: embryo, endosperm and maternal plant. Thus, to thoroughly understand how seed traits are genetically determined, it is prudent to consider all of these components together in the genetic models (Bazzaz, 2001).

There is good evidence that additive and/or epistatic maternal effects, embryo effects, endosperm effects and gene–environment interaction effects are involved in the genetic development and evolution of crop seed (Zhang et al., 1999; Shi et al., 2000; Cui et al., 2004; Cui and Wu, 2005a, 2005b; Cui et al., 2006). A considerable body of genetic models and statistical methods has been developed and applied to the analysis of real data. On the basis of the genetic features of seed characters, Mo (1995) advocated a statistical genetic model that focuses on partitioning the phenotypic variance of endosperm traits into various genetic and environmental factors. Zhu and Weir (1994) proposed the mixed linear model approach to analyze the maternal, embryo, endosperm and cytoplasm effects, and their interactions with environment, underlying seed traits collected from a diallel cross experiment. These approaches can only dissect the genetic variation of seed traits into several cumulative components; because all the genes controlling the seed traits are analyzed as a whole, they fail to provide detailed information at the level of individual genes, such as the positions and effects of quantitative trait loci (QTLs). Advances in molecular markers and statistical methods enable us to partition the total genetic variation into the effects of individual QTLs, based on co-segregation between molecular markers and the putative QTL (Lander and Botstein, 1989; Zeng, 1994). Before tailored QTL mapping methods were developed for seed traits, a number of researchers applied the traditional diploid genetic model to map QTLs underlying endosperm traits (Sourdille et al., 1996; Parker et al., 1998; Araki et al., 1999; Sene et al., 2001; Tan et al., 2001). Genetically, endosperm represents the next generation developing in maternal plants, with more complex segregation patterns and genetic mechanisms than diploid tissues. 
For example, a biallelic locus (Q-q) has four possible endosperm genotypes, namely QQQ, QQq, Qqq and qqq, and three kinds of endosperm genetic effects can be involved, namely the additive effect (a), the first dominance effect (d1) and the second dominance effect (d2). On this basis, Kao (2004) proposed a statistical method using multiple-interval mapping that takes the triploid nature of endosperm into account and can estimate all three kinds of endosperm effects. However, one more important feature ignored in this method and others is that endosperm is a tissue developed in its maternal plant, implying that its phenotype may be affected not only by its own triploid endosperm genotype, but also by the diploid maternal genotype via the supply of nutrition and other physiologically active substances. This point of view has been supported by several studies on model species such as Arabidopsis and maize (Letchworth and Lambert, 1998; Garcia et al., 2005; Ohto et al., 2005). To take maternal effects into account, it is theoretically desirable to integrate both the maternal genome and the offspring genome into one model. Hu and Xu (2005) proposed a mapping method to characterize the genetic effects of the maternal genome on offspring traits that incorporates both the quantitative genetic model for diploid maternal traits and that for triploid endosperm traits into a unified QTL mapping framework. Wen and Wu (2007) further proposed a method of interval mapping of endosperm traits based on a two-stage hierarchical design that considers both the endosperm effects and the maternal effects. In addition, Cui and Wu (2005a) and Cui et al. (2006) proposed an epistasis model for mapping endosperm QTLs in a double backcross population. 
The above approaches do not include the genetic effects of all QTLs in the model and can only provide information about an individual QTL or one pair of QTLs, such as position, the maternal additive/dominance effects, the endosperm additive effect and the first and second dominance effects. These methods may therefore suffer from limited power and precision in crop seed QTL mapping, because only one QTL or one pair of putative QTLs is tested at a time and the genetic background is not controlled. If there is more than one QTL on a chromosome, the test of a QTL at one position will be distorted by the other QTLs, and thus the estimated positions and effects of the QTL are likely to be biased. Moreover, QTL-by-environment (QE) interactions, another important factor for complex traits, are ignored in these methods, potentially leading to biased estimation of QTL parameters.

In the present study, we propose a new method for systematically mapping QTLs underlying crop seed traits. We integrate the maternal and offspring genomes as well as the QE interaction effects into one unified QTL model. Monte Carlo simulations were conducted to investigate the effectiveness, stability and efficiency of our model and methods. A real data set from a cotton breeding experiment was analyzed to demonstrate the usefulness of the proposed method. A software package, QTLNetwork-Seed (http://ibi.zju.edu.cn/software), was developed for mapping QTLs of seed traits.

Materials and methods

Genetic model

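For a specific mapping population derived from two pure lines (P1 and P2), the genetic experiment is conducted in t different environments, each with b blocks. Suppose the seed quantitative trait under study is controlled by the maternal and embryo genetic effects of s QTLs (Q1, Q2, …, Qs). The phenotype (yhij) of the i-th strain in the j-th block of the h-th environment can then be expressed by the following QTL model,

$$y_{hij} = \mu + \sum_{k=1}^{s}\left(a_k^{m}x_{A_k^{m}} + d_k^{m}x_{D_k^{m}} + a_k^{o}x_{A_k^{o}} + d_k^{o}x_{D_k^{o}}\right) + e_h + \sum_{k=1}^{s}\left(ae_{hk}^{m}x_{A_k^{m}} + de_{hk}^{m}x_{D_k^{m}} + ae_{hk}^{o}x_{A_k^{o}} + de_{hk}^{o}x_{D_k^{o}}\right) + b_{j(h)} + \varepsilon_{hij} \qquad (1)$$

where $\mu$ is the population mean; $a_k^{m}$ and $d_k^{m}$ are the maternal additive and dominance effects of $Q_k$, with coefficients $x_{A_k^{m}}$ and $x_{D_k^{m}}$, respectively; $a_k^{o}$ and $d_k^{o}$ are the embryo additive and dominance effects of $Q_k$, with coefficients $x_{A_k^{o}}$ and $x_{D_k^{o}}$, respectively; $e_h$ is the random effect of the h-th environment, $e_h \sim (0, \sigma_E^2)$; $ae_{hk}^{m}$ and $de_{hk}^{m}$ are the interaction effects of the additive and the dominance effects of the maternal QTL with environments, respectively, $ae_{hk}^{m} \sim (0, \sigma_{A^{m}E}^2)$ and $de_{hk}^{m} \sim (0, \sigma_{D^{m}E}^2)$; $ae_{hk}^{o}$ and $de_{hk}^{o}$ are the interaction effects of the additive and the dominance effects of the embryo QTL with environments, respectively, $ae_{hk}^{o} \sim (0, \sigma_{A^{o}E}^2)$ and $de_{hk}^{o} \sim (0, \sigma_{D^{o}E}^2)$; $b_{j(h)}$ is the j-th block effect within the h-th environment; and $\varepsilon_{hij}$ is the random residual effect, $\varepsilon_{hij} \sim (0, \sigma_{\varepsilon}^2)$.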

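If the trait is controlled simultaneously by the genetic effects of maternal and endosperm QTLs, the model can be represented as follows,

$$y_{hij} = \mu + \sum_{k=1}^{s}\left(a_k^{m}x_{A_k^{m}} + d_k^{m}x_{D_k^{m}} + a_k^{e}x_{A_k^{e}} + d_{1k}^{e}x_{D_{1k}^{e}} + d_{2k}^{e}x_{D_{2k}^{e}}\right) + e_h + \sum_{k=1}^{s}\left(ae_{hk}^{m}x_{A_k^{m}} + de_{hk}^{m}x_{D_k^{m}} + ae_{hk}^{e}x_{A_k^{e}} + d_1e_{hk}^{e}x_{D_{1k}^{e}} + d_2e_{hk}^{e}x_{D_{2k}^{e}}\right) + b_{j(h)} + \varepsilon_{hij} \qquad (2)$$

where $a_k^{e}$, $d_{1k}^{e}$ and $d_{2k}^{e}$ are the additive effect, the first-order dominance effect (interaction between QkQk and qk) and the second-order dominance effect (interaction between Qk and qkqk) of the endosperm QTL, respectively, with coefficients $x_{A_k^{e}}$, $x_{D_{1k}^{e}}$ and $x_{D_{2k}^{e}}$; $ae_{hk}^{e}$, $d_1e_{hk}^{e}$ and $d_2e_{hk}^{e}$ are the interaction effects of the additive, the first-order and the second-order dominance effects with environments, respectively, $ae_{hk}^{e} \sim (0, \sigma_{A^{e}E}^2)$, $d_1e_{hk}^{e} \sim (0, \sigma_{D_1^{e}E}^2)$ and $d_2e_{hk}^{e} \sim (0, \sigma_{D_2^{e}E}^2)$; the other terms have the same definitions as those in model (1). It should be pointed out that when a QTL has both maternal and endosperm effects, some of these effects cannot be distinguished in this full model; they must be reparameterized by constraining the confounded effects to be equal, and the same constraint is applied to their interaction effects with environment.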

In the above models, all coefficients of QTL effects are determined by the QTL genotype. For the QTL genotypes QQ, Qq and qq, the coefficients are 1, 0 and −1, respectively, for maternal or embryo additive effects, and 0, 0.5 and 0 for maternal or embryo dominance effects. The coefficients of the endosperm additive, first dominance and second dominance effects are 1.5, 0 and 0 for the endosperm QTL genotype QQQ; 0.5, 1 and 0 for QQq; −0.5, 0 and 1 for Qqq; and −1.5, 0 and 0 for qqq. In practice, the QTL genotype is usually unknown; its probabilities can be inferred from the flanking marker genotypes.
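The coefficient coding described above can be written down as a small lookup table. The following Python sketch is illustrative only: the dictionary and function names are our own and are not part of QTLNetwork-Seed, and the values are copied directly from the paragraph above.

```python
# Coefficient coding taken from the text; all names here are hypothetical.
# Diploid (maternal or embryo) genotypes: (additive, dominance)
DIPLOID_CODING = {
    "QQ": (1.0, 0.0),
    "Qq": (0.0, 0.5),
    "qq": (-1.0, 0.0),
}
# Triploid endosperm genotypes: (additive, first dominance, second dominance)
ENDOSPERM_CODING = {
    "QQQ": (1.5, 0.0, 0.0),
    "QQq": (0.5, 1.0, 0.0),
    "Qqq": (-0.5, 0.0, 1.0),
    "qqq": (-1.5, 0.0, 0.0),
}


def qtl_coefficients(genotype):
    """Return the effect coefficients for a known QTL genotype."""
    if genotype in DIPLOID_CODING:
        return DIPLOID_CODING[genotype]
    if genotype in ENDOSPERM_CODING:
        return ENDOSPERM_CODING[genotype]
    raise ValueError(f"unknown QTL genotype: {genotype!r}")


if __name__ == "__main__":
    for g in ("QQ", "Qq", "qq", "QQQ", "QQq", "Qqq", "qqq"):
        print(g, qtl_coefficients(g))
```

When the genotype is not observed, these values are replaced by their expectations given the flanking markers, as described in the following sections.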

Mating designs

Several mapping populations can be used to dissect the various effect components of seed traits. We suggest using populations derived from an immortalized F2 (IF2), in which genetically identical individuals can be regenerated for replication and phenotyped in different environments for the detection of gene–environment interactions.

Double back-cross of IF2

In this design, IF2 lines, as maternal parents, are crossed with their two homozygous parents (P1 and P2) as paternal parents. Marker information of the IF2 lines and their offspring, as well as the phenotypes of the offspring, are used to perform QTL mapping. There are eight types of QTL genotype combinations of the maternal plant and the embryo (endosperm) of its offspring: QQ:QQ(QQQ), Qq:QQ(QQQ), Qq:Qq(Qqq), qq:Qq(Qqq), QQ:Qq(QQq), Qq:Qq(QQq), Qq:qq(qqq) and qq:qq(qqq); the first four combinations derive from the back-cross of IF2 lines with P1 and the last four from the back-cross with P2. We denote pk|ij as the conditional probability of the k-th QTL genotype combination given the i-th marker type in the maternal plant and the j-th marker type in the offspring generation; the expectations of the QTL effect coefficients can be calculated based on the formulas in Table 1, where k indexes the maternal and offspring QTL genotype combination (k=1, 2, …, 8), i indexes the maternal genotype at the flanking markers (i=1, 2, …, 9) and j indexes the offspring genotype at the flanking markers (j=1, 2, …, 9). The conditional probabilities of the eight QTL genotype combinations, pk|ij, are summarized in Table 2 for an IF2 from random mating of doubled haploid (DH) lines. If the IF2 is from random mating of recombinant inbred lines (RILs), the proportions of recombinant zygotes should be used in Table 2 instead of the recombination rates. As shown in Table 2, p1|ij+p2|ij+p3|ij+p4|ij=1 and p5|ij=p6|ij=p7|ij=p8|ij=0 in BC1 (IF2 × P1) lines, and p5|ij+p6|ij+p7|ij+p8|ij=1 and p1|ij=p2|ij=p3|ij=p4|ij=0 in BC2 (IF2 × P2) lines.
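To make the role of pk|ij concrete, the sketch below computes expected coefficients as probability-weighted averages over the eight genotype combinations listed above. It is only a sketch under stated assumptions: it does not reproduce Table 1 or Table 2 (which are not shown here), the probabilities are taken as given inputs, and the function and variable names are hypothetical.

```python
# (maternal genotype, endosperm genotype) for k = 1..8, in the order given in the text.
COMBINATIONS = [
    ("QQ", "QQQ"), ("Qq", "QQQ"), ("Qq", "Qqq"), ("qq", "Qqq"),  # BC1: IF2 x P1
    ("QQ", "QQq"), ("Qq", "QQq"), ("Qq", "qqq"), ("qq", "qqq"),  # BC2: IF2 x P2
]
# Coefficient coding from the genetic model section.
MATERNAL_CODING = {"QQ": (1.0, 0.0), "Qq": (0.0, 0.5), "qq": (-1.0, 0.0)}
ENDOSPERM_CODING = {
    "QQQ": (1.5, 0.0, 0.0), "QQq": (0.5, 1.0, 0.0),
    "Qqq": (-0.5, 0.0, 1.0), "qqq": (-1.5, 0.0, 0.0),
}
NAMES = ("x_Am", "x_Dm", "x_Ae", "x_D1e", "x_D2e")


def expected_coefficients(p):
    """Expectation of each coefficient given the eight probabilities p[k-1] = p_k|ij."""
    if len(p) != 8 or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must hold eight probabilities that sum to 1")
    totals = dict.fromkeys(NAMES, 0.0)
    for prob, (maternal, endosperm) in zip(p, COMBINATIONS):
        values = MATERNAL_CODING[maternal] + ENDOSPERM_CODING[endosperm]
        for name, value in zip(NAMES, values):
            totals[name] += prob * value
    return totals


if __name__ == "__main__":
    # A BC1 line with all four admissible combinations equally likely (made-up input).
    print(expected_coefficients([0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]))
```

In a BC1 line only the first four probabilities are non-zero, and in a BC2 line only the last four, which matches the constraints stated above.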

Table 1 Coefficients of QTL effects based on the conditional probabilities of QTL genotype combinations given the two flanking marker genotypes, for selfing and double back-cross populations of IF2
Table 2 Conditional probabilities of QTL genotype combinations given the genotypes of flanking markers in both IF2 and its double back-cross offspring plants

Selfing of IF2

In this design, we use marker information of two generations, the parental generation (IF2) and its offspring (F3), and the phenotype of the seed (F3) to perform QTL mapping. There are six types of combination in total for the QTL genotypes in the maternal genome (IF2) and the embryo and endosperm genomes (F3): QQ:QQ(QQQ), Qq:QQ(QQQ), Qq:Qq(QQq), Qq:Qq(Qqq), Qq:qq(qqq) and qq:qq(qqq). Following the formula for calculating conditional probabilities (Wen et al., 2007), we denote pk|ij as the conditional probability of a QTL genotype combination given the two flanking marker genotypes in the IF2 and F3 plants, in which i indexes the marker genotype in the IF2 plant, j indexes the marker genotype in the F3 plant (i, j=1, 2, …, 9) and k indexes the QTL genotype combination (k=1, 2, 3, 6, 7, 8); the expectations of the coefficients for QTL effects in models (1) and (2) can then be obtained by using Table 1.

Random mating of IF2

In QTL mapping for seed traits, a random mating design is also a common choice. For the random mating of IF2 lines, we can also apply the mixed linear model approach to analyze the QTL effects, although the coefficients are more complicated than those in the systematic or planned mating designs above. Suppose a cross is made between two lines sampled at random from the IF2, with the i-th marker genotype in the maternal genome and the j-th marker genotype in the paternal genome for one marker interval (Table 3; i, j=1, 2, …, 9). We denote pk|i as the conditional probability of the k-th QTL genotype (k=1, 2, 3) given the i-th marker genotype of the maternal genome and pl|j as the conditional probability of the l-th QTL genotype (l=1, 2, 3) given the j-th marker genotype of the paternal genome. Let pk|i·l|j=pk|ipl|j; the expectations of the coefficients of QTL effects can then be calculated as follows: (p1|i−p3|i) is the expectation of the coefficient for the maternal additive effect and p2|i that for the maternal dominance effect, whereas the expectations for the embryo additive and dominance effects and for the endosperm additive, first-order dominance and second-order dominance effects are the corresponding weighted sums of the joint probabilities pk|i·l|j over the offspring genotypes produced by each parental combination.
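The expectations for the offspring (embryo and endosperm) coefficients under random mating can be sketched as follows. This is our own derivation under explicit assumptions, not the published formulas: we assume Mendelian segregation in both parents, independence of the maternal and paternal QTL genotypes given their markers (so that the joint probability is the product pk|i pl|j), an endosperm that carries two identical maternal copies plus one paternal copy, and the coefficient coding given in the genetic model section. All names are hypothetical.

```python
from itertools import product

GENOTYPES = ("QQ", "Qq", "qq")                  # k, l = 1, 2, 3
GAMETE_Q = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}    # probability that a gamete carries Q


def offspring_expectations(p_mat, p_pat):
    """Expected embryo and endosperm coefficients.

    p_mat[k-1] = p_k|i (maternal), p_pat[l-1] = p_l|j (paternal).
    """
    out = dict.fromkeys(("x_Ao", "x_Do", "x_Ae", "x_D1e", "x_D2e"), 0.0)
    pairs = product(zip(GENOTYPES, p_mat), zip(GENOTYPES, p_pat))
    for (g_mat, pm), (g_pat, pp) in pairs:
        joint = pm * pp                          # p_{k|i . l|j}
        m, f = GAMETE_Q[g_mat], GAMETE_Q[g_pat]  # Q-gamete probabilities
        # Embryo: QQ with prob m*f, qq with prob (1-m)*(1-f), otherwise Qq.
        out["x_Ao"] += joint * (m * f - (1 - m) * (1 - f))
        out["x_Do"] += joint * 0.5 * (m * (1 - f) + (1 - m) * f)
        # Endosperm: QQQ = m*f, QQq = m*(1-f), Qqq = (1-m)*f, qqq = (1-m)*(1-f).
        out["x_Ae"] += joint * (1.5 * m * f + 0.5 * m * (1 - f)
                                - 0.5 * (1 - m) * f - 1.5 * (1 - m) * (1 - f))
        out["x_D1e"] += joint * m * (1 - f)
        out["x_D2e"] += joint * (1 - m) * f
    return out


if __name__ == "__main__":
    # Mother known to be QQ, father known to be qq (made-up input).
    print(offspring_expectations((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

With a QQ mother and a qq father the embryo is Qq and the endosperm is QQq, so the sketch returns the coding of those two genotypes exactly.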

Table 3 Conditional probability of QTL genotype given a pair of flanking marker genotypes in the populations of IF2 from random mating of DH or RIL lines

Mapping strategy

In QTL mapping, the QTL genotype at a given locus is not observed, so we cannot determine the exact coefficients for the QTL effects in the above models. One reasonable method is to substitute them with their expectations based on the flanking marker genotypes. In addition, we still need to know the number of QTLs associated with the seed trait under study and their positions on the chromosome(s). Therefore, we first scan for QTL positions across the whole genome using a walking step of 1 cM, with prescreened significant marker intervals as cofactors, and then construct the QTL full model to analyze the QTL effects. The procedure is outlined below; for more details, please refer to the literature on mapping QTLs of complex traits (Yang et al., 2007).
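The idea of replacing an unobserved QTL genotype by its expectation given the flanking markers can be illustrated in the simplest setting. The sketch below is a deliberate simplification and not the IF2 calculation of Tables 1 to 3: it works at the level of a single gamete (equivalently, a DH line), assumes the Haldane map function with no interference, and uses hypothetical function names.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def prob_q_allele(left, right, pos_cm, interval_cm):
    """P(QTL allele is Q | flanking marker alleles), with Q in coupling with M.

    left/right are "M" or "m"; pos_cm is the distance from the left marker.
    """
    r1 = haldane(pos_cm)
    r2 = haldane(interval_cm - pos_cm)
    r = haldane(interval_cm)
    p_left = (1.0 - r1) if left == "M" else r1     # left marker vs. Q
    p_right = (1.0 - r2) if right == "M" else r2   # Q vs. right marker
    p_markers = (1.0 - r) if left == right else r  # parental vs. recombinant pair
    return p_left * p_right / p_markers


def expected_additive(left, right, pos_cm, interval_cm):
    """Expected additive coefficient (+1 for QQ, -1 for qq) of a DH line."""
    return 2.0 * prob_q_allele(left, right, pos_cm, interval_cm) - 1.0


def scan_interval(left, right, interval_cm=10, step_cm=1):
    """Walk through the interval in fixed steps, returning (position, P(Q))."""
    return [(pos, prob_q_allele(left, right, pos, interval_cm))
            for pos in range(0, interval_cm + 1, step_cm)]


if __name__ == "__main__":
    for pos, p in scan_interval("M", "m"):
        print(f"{pos:2d} cM  P(Q) = {p:.4f}")
```

For a recombinant marker pair the probability moves from 1 at the left marker to 0 at the right marker, which is why the expected coefficients change smoothly as the scan walks along the interval.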

QTL position scanning

Similar to composite interval mapping (Zeng, 1994), we scan for QTL positions using marker information. First, we treat each interval consisting of a pair of adjacent markers as an integrated effect in the regression model and conduct regression analysis of the phenotype on each marker interval. An F-statistic is calculated and used to screen potential candidate marker intervals by the permutation method. Based on the screened candidate intervals, multivariate regression is performed again, and the significant candidate marker intervals harboring QTLs are finally selected via the conventional stepwise method (Yang et al., 2007). Second, with the selected candidate intervals as cofactors, QTL position scanning is performed across the whole genome with a walking step such as 1 cM. When one putative QTL is tested, all other previously detected significant QTLs are included in the model to control genetic noise from background QTLs. The significance threshold is specified by permutation testing. Without loss of generality, we use model (2) to illustrate the analysis. The following model is used to identify the significant marker intervals across the whole genome,

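$$y_{hij} = \mu_h + \sum_{\tau \in \{t^{+},\,t^{-}\}}\left(a_{h\tau}^{m}\zeta_{A_{\tau}^{m}} + d_{h\tau}^{m}\zeta_{D_{\tau}^{m}} + a_{h\tau}^{e}\zeta_{A_{\tau}^{e}} + d_{1h\tau}^{e}\zeta_{D_{1\tau}^{e}} + d_{2h\tau}^{e}\zeta_{D_{2\tau}^{e}}\right) + b_{j(h)} + \varepsilon_{hij} \qquad (3)$$

where yhij is the phenotype of the i-th line in the j-th block under the h-th environment; μh is the population mean under the h-th environment; t (t=1, …, T) indexes the marker interval being tested among the T intervals, with $t^{+}$ and $t^{-}$ denoting its right and left markers; $a_{ht^{+}}^{m}$ ($a_{ht^{-}}^{m}$) and $d_{ht^{+}}^{m}$ ($d_{ht^{-}}^{m}$) stand for the maternal additive and dominance effects of the right marker (left marker) in the h-th environment, with coefficients $\zeta_{A_{t^{+}}^{m}}$ ($\zeta_{A_{t^{-}}^{m}}$) and $\zeta_{D_{t^{+}}^{m}}$ ($\zeta_{D_{t^{-}}^{m}}$), respectively; $a_{ht^{+}}^{e}$ ($a_{ht^{-}}^{e}$), $d_{1ht^{+}}^{e}$ ($d_{1ht^{-}}^{e}$) and $d_{2ht^{+}}^{e}$ ($d_{2ht^{-}}^{e}$) stand for the endosperm additive effect, first-order dominance effect and second-order dominance effect of the right marker (left marker) in the h-th environment, with coefficients $\zeta_{A_{t^{+}}^{e}}$ ($\zeta_{A_{t^{-}}^{e}}$), $\zeta_{D_{1t^{+}}^{e}}$ ($\zeta_{D_{1t^{-}}^{e}}$) and $\zeta_{D_{2t^{+}}^{e}}$ ($\zeta_{D_{2t^{-}}^{e}}$), respectively; the other parameters have the same definitions as in model (2). The ζ coefficients corresponding to the different marker effects are determined in the same way as the coefficients of QTL effects, after replacing the Q and q alleles in the QTL genotype with the M and m marker alleles. An F-statistic based on the Henderson III method (Henderson, 1953; Searle, 1968) is calculated to test the significance of the marker interval effects, and the permutation method is employed to specify the significance threshold.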
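The prescreening step, an F-test per marker interval with a permutation-based genome-wide threshold, can be sketched as follows. This is a simplified stand-in rather than the procedure used in the paper: it uses an ordinary least squares F-statistic in place of the Henderson III statistic for the mixed model, ignores environments and blocks, and takes the design matrix of each interval as a given input. Function names and default settings are hypothetical.

```python
import numpy as np


def f_statistic(y, X):
    """F-statistic for regressing y on the columns of X (plus an intercept).

    Assumes X has full column rank and is not collinear with the intercept.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss_full = np.sum((y - X1 @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    df_model = X1.shape[1] - 1
    df_resid = n - X1.shape[1]
    return ((rss_null - rss_full) / df_model) / (rss_full / df_resid)


def permutation_threshold(y, interval_designs, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum F-statistic
    over all intervals, computed on phenotypes shuffled across lines."""
    rng = np.random.default_rng(seed)
    max_f = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)
        max_f[b] = max(f_statistic(y_perm, X) for X in interval_designs)
    return float(np.quantile(max_f, 1.0 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Five made-up "intervals", each with two coefficient columns.
    designs = [rng.choice([-1.0, 0.0, 1.0], size=(200, 2)) for _ in range(5)]
    y = 2.0 * designs[2][:, 0] + rng.normal(size=200)
    threshold = permutation_threshold(y, designs, n_perm=100)
    selected = [t for t, X in enumerate(designs) if f_statistic(y, X) > threshold]
    print("threshold:", threshold, "selected intervals:", selected)
```

Taking the maximum statistic over all intervals within each permutation is what makes the threshold genome-wide rather than per interval.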

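Within each marker interval, we scan for QTLs with a step size of 1 cM along the chromosome. The selected marker intervals are added to the mapping model to control the background genetic effects when scanning QTL positions. To test QTL k with all s significant marker intervals considered, we apply the following model,

$$y_{hij} = \mu_h + a_{hk}^{m}x_{A_k^{m}} + d_{hk}^{m}x_{D_k^{m}} + a_{hk}^{e}x_{A_k^{e}} + d_{1hk}^{e}x_{D_{1k}^{e}} + d_{2hk}^{e}x_{D_{2k}^{e}} + \sum_{t=1}^{s}\sum_{\tau \in \{t^{+},\,t^{-}\}}\left(a_{h\tau}^{m}\zeta_{A_{\tau}^{m}} + d_{h\tau}^{m}\zeta_{D_{\tau}^{m}} + a_{h\tau}^{e}\zeta_{A_{\tau}^{e}} + d_{1h\tau}^{e}\zeta_{D_{1\tau}^{e}} + d_{2h\tau}^{e}\zeta_{D_{2\tau}^{e}}\right) + b_{j(h)} + \varepsilon_{hij} \qquad (4)$$

where $a_{hk}^{m}$ and $d_{hk}^{m}$ are the maternal additive and dominance effects of QTL k in the h-th environment; $a_{hk}^{e}$, $d_{1hk}^{e}$ and $d_{2hk}^{e}$ are the endosperm additive, first-order dominance and second-order dominance effects, respectively; $t^{+}$ and $t^{-}$ denote the right and left markers of the t-th selected interval; all the other parameters have the same definitions as in models (2) and (3). As in the scanning of marker intervals, an F-statistic is calculated for each putative QTL position; the F-statistic profile along the chromosome is thus obtained, and the peaks that reach the significance criterion determined by permutation are selected as candidate QTLs. Finally, we construct a QTL full model that consists of the detected candidate QTLs and then conduct model selection to eliminate false positive QTLs. The number of QTLs and their positions are determined in the final model used for the estimation of QTL effects.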

Genetic effect estimation

With the positions of all QTLs identified, we can estimate the maternal and endosperm QTL effects, or the maternal and embryo QTL effects, as well as their corresponding QE interaction effects, based on the final QTL full model. To fit the mixed linear model, we first obtain a set of initial estimates of the QTL parameters: the variances of random effects by minimum norm quadratic unbiased estimation (Rao, 1971), the fixed effects by ordinary least squares estimation and the random effects by adjusted unbiased prediction. These estimates are then supplied as initial parameter values to a Markov chain Monte Carlo procedure for the mixed linear model based on Gibbs sampling (Smith and Roberts, 1993). Once the distribution of each parameter in the mixed linear model has been obtained, its mean is taken as the final estimate and its significance is tested by a t-statistic.

Simulation scenarios

To investigate the efficiency and accuracy of the proposed methods, we performed Monte Carlo simulations to verify the unbiasedness and robustness of our model. Here, a double back-cross of IF2 was simulated with 400 genotypes and two environments. The genetic map consists of 5 chromosomes, with 11 markers evenly distributed on each chromosome. The distance between two consecutive markers is set to 10 cM. Six QTLs (Q1, Q2, Q3, Q4, Q5 and Q6) controlling a seed quantitative trait are scattered over the five chromosomes, with Q5 and Q6 on the same chromosome. One QTL (Q4) is set with only maternal effects; two QTLs (Q1 and Q5) are set with only endosperm effects; three QTLs (Q2, Q3 and Q6) are set with both maternal and endosperm effects. Different heritabilities and population structures (that is, the ratio of the number of IF2 lines to the number of offspring per line) were used in the simulations to investigate their influence on the estimation of QTL parameters. Each simulation scenario was repeated 500 times.
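The simulation design can be summarized numerically. The short sketch below only restates quantities given in the text (5 chromosomes, 11 markers per chromosome, 10 cM spacing, and the three population structures examined in the Results); the recombination fraction between adjacent markers additionally assumes the Haldane map function, which the text does not specify. Variable names are hypothetical.

```python
import math

N_CHROMOSOMES = 5
MARKERS_PER_CHROMOSOME = 11
SPACING_CM = 10.0

# Marker positions on one chromosome and the resulting map length.
marker_positions = [i * SPACING_CM for i in range(MARKERS_PER_CHROMOSOME)]
chromosome_length_cm = marker_positions[-1]
total_map_length_cm = N_CHROMOSOMES * chromosome_length_cm
n_intervals = N_CHROMOSOMES * (MARKERS_PER_CHROMOSOME - 1)

# Population structures: (number of IF2 lines, offspring per line).
structures = {"200:2": (200, 2), "100:4": (100, 4), "50:8": (50, 8)}
sample_sizes = {name: lines * per_line for name, (lines, per_line) in structures.items()}

# Assumption: Haldane map function for the adjacent-marker recombination fraction.
adjacent_r = 0.5 * (1.0 - math.exp(-2.0 * SPACING_CM / 100.0))

if __name__ == "__main__":
    print("chromosome length (cM):", chromosome_length_cm)
    print("total map length (cM):", total_map_length_cm)
    print("marker intervals:", n_intervals)
    print("sample sizes:", sample_sizes)
    print("recombination fraction between adjacent markers:", round(adjacent_r, 4))
```

All three structures give the same total of 400 offspring genotypes, so differences between them reflect how the sample is split between maternal lines and offspring per line rather than the overall sample size.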

A worked example

To demonstrate the use of the proposed method, we analyzed a set of real data from a cotton (Gossypium hirsutum L.) breeding experiment. A set of 188 recombinant inbred lines was developed from an intraspecific cross between two upland cotton germplasms, HS46 and WARKCBUCAG8US-1-88, which differ widely in yield, fiber quality, disease resistance and seed quality traits. In this study, pairs of lines among the 188 recombinant inbred lines were randomly crossed during flowering to produce 376 IF2 lines in 2009 and 2010, which were used for QTL analysis. Seeds of the IF2 population and the two parents were manually harvested at maturity. All the seeds were measured for fiber percentage. The genetic map includes 388 molecular markers residing on 30 linkage groups. It covers a total length of 1946.22 cM, accounting for 41.55% of the whole genome, with an average distance of 5.03 cM between adjacent markers.

As the cotton seed contains no triploid endosperm tissue, the cotton fiber percentage is mainly governed by maternal effects and embryo effects. Model (1), including the QTL maternal additive effect, the embryo additive and dominance effects and their interaction effects with environments, and its reduced model excluding the QE interaction effects were used in the statistical analysis. The critical F-values used to prescreen candidate intervals were calculated by permutation testing at the genome-wide significance level of 0.05. The full model was fitted by the Markov chain Monte Carlo method.

Results

Estimation of QTL parameter

A summary of the results is presented in Tables 4, 5, 6 and 7. Under a heritability of 25% and a population structure of 200:2 (200 IF2 lines and 2 offspring per IF2 line), all QTL positions were estimated without appreciable bias: the largest bias (estimated value minus parameter value, Est.−Par.) was only 0.3 cM, and all six ratios of bias to true value were <1% (Table 4). The power of QTL detection reached 100% for the majority of QTLs, and even the lowest power, for Q4, was 94%. For the estimation of QTL effects, most estimates were close to the true parameter values and the estimation accuracy was reasonably good, especially for the QTL main effects (Tables 4 and 6). Most ratios of bias to true value were within 5% and all were <10%. We also found that the estimation accuracy of the QE interaction effects appeared to be slightly worse than that of the QTL main effects (Table 6). The interaction effects of the maternal and endosperm additive effects with environment (aem and aee) were estimated more accurately than those of the maternal and endosperm dominance effects with environment (dem and dee); all the ratios of bias to true value were <15% for aem and aee, but >15% for dem and dee.

Table 4 Mapping power and estimates of QTL positions and main genetic effects
Table 5 Estimates of QE interaction effects
Table 6 Estimates of QTL positions and effects under three different heritabilities
Table 7 Estimates of QTL position and effects under three different population structures

Influence of heritability and population structure

The simulations under three other heritabilities, 75, 50 and 10%, with a population structure of 200:2 (200 IF2 lines and 2 offspring per line), showed that the estimates of QTL positions and QTL effects were very close to each other across the three levels of heritability (Table 6). All the estimates of QTL position were unbiased. The estimates of the maternal additive and endosperm additive effects were also robust to the different heritabilities. The s.d. of the parameter estimates became slightly larger and the power decreased as the heritability decreased from 75 to 10%; for Q4 in particular, the power decreased to 76.8% at 10% heritability.

The influence of population structure was also investigated with three population structures, 200:2, 100:4 and 50:8, under a heritability of 75%. As with the results for different heritabilities, the estimates of QTL positions and effects remained unbiased. The estimates of QTL positions and effects were also close to each other across the three population structures, whereas the s.d. tended to increase as the structure changed from I to III (Table 7). As the population structure changed, the power was least stable for the QTL with only maternal effects (Q4). As the number of IF2 lines decreased from 200 to 50, the power of detecting Q4 fell from 97.8 to 39.2%. This result is expected because the sample available for estimating maternal effects shrinks as the number of IF2 lines decreases.

Application to the real data

The genetic main effects and environment interaction effects of all the QTLs are presented in Table 8. Two significant QTLs were located on chromosomes 19 and 21, respectively. Both QTLs are sensitive to environment. The estimated relative contribution of Q2 is 16.03%. In addition, we found that nearly 6.19% of the phenotypic variation is attributed to the maternal effects of the QTLs and ∼10.83% to the QE interaction. Furthermore, we used a reduced model without QE interaction effects to reanalyze the data. Although Q2 was still detected by the reduced model, its estimated relative contribution decreased to 8.19%, clearly showing the adequacy of the proposed analysis. Hence, in practical mapping, we suggest taking maternal effects and environment interactions into account when no knowledge is available on the real genetic architecture of the seed traits under study.

Table 8 Estimates of QTL position and effects for cotton fiber percentage in an IF2 population from random mating of a set of RILs

Discussion

In the present study, we have proposed a new statistical method that takes multiple QTLs and QE interactions into account for mapping QTLs underlying seed traits. It is well recognized that QE interaction is an important component of the total genetic effect and may cause the phenotype to vary across environments. QE interactions have been considered in QTL mapping for diploid plant traits (Jansen, 1992; Jansen et al., 1995; Jiang and Zeng, 1995; Cockerham and Zeng, 1996; Sari-Gorla et al., 1997; Yan et al., 1998), but have not been considered for diploid (maternal plus embryo) or triploid (maternal plus endosperm) seed traits. When QE interactions play a significant role in the endosperm traits of interest, methods that ignore such interactions may lead to biased estimation. The new method integrates the effects of QTLs that are expressed in the maternal and/or offspring genome and the QE interactions into one mapping framework. The detected QE interactions can be further dissected into subfactors, such as the interaction effects of maternal additive by environment, maternal dominance by environment and so on, which provide valuable information for selecting or breeding environment-specific or environment-insensitive varieties by marker-assisted selection.

A major difficulty in developing a powerful statistical approach for mapping QTLs with QE interactions is handling the increased number of parameters in the statistical model, especially for the random effects. To tackle this problem, we adopted a mixed linear model instead of the more commonly used regression models. Although computationally more burdensome, the mixed linear model approach is able to maintain or even increase statistical power even when the number of parameters is nearly doubled by the inclusion of QE interactions.

Unlike the previous methods (Wu et al., 2002; Xu et al., 2003), most of which are based on one QTL model and thus may have a potential bias in parameter estimation when two or more linked QTLs exist in genome, the proposed method uses a QTL full model to estimate all possible effects of QTLs. False positive QTL(s) could be eliminated by model selection, while higher power and more accurate estimation of QTL effects could be achieved, benefiting from smaller and more accurate estimation of the residual effects because of background control. It should be noted that we have integrated the maternal genome into the model in addition to the offspring genome (Foolad and Jones, 1992; Cui et al., 2004; Hu and Xu, 2005; Wen and Wu, 2007), based on the view that development of embryo/endosperm traits partially depends on its own maternal plant.

It is well documented that epistasis is extensively involved in the genetic variation of complex traits. Epistatic effects and their interaction effects with environment have been included in models for mapping QTLs of general crop agronomic traits (Wang et al., 1999; Yang et al., 2007). For mapping epistatic effects of QTLs for seed traits, Cui and Wu (2005a) and Cui et al. (2006) proposed a model, based on the double back-cross design, that includes the interaction effects of one pair of QTLs in the same genome (maternal, embryo or endosperm) or in two different genomes. Although our models (1) and (2) currently do not include epistatic interactions among genes in the different genomes, it is in principle feasible to extend the proposed method within the current framework to cover such epistatic effects and their interaction effects with environments. Such an extension involves a more complicated QTL full model and also requires intensive work on methodology and software for the various experimental designs; this will be the subject of our follow-up study.

Monte Carlo simulations demonstrate that the method can precisely locate the positions of QTLs and estimate the genetic effects well, even in the presence of environment or gene–environment interaction effects. A satisfactory power of QTL detection is still obtained, despite the increased number of parameters in the model due to the inclusion of QE interactions as compared with currently available methods. According to the simulation results under different heritabilities (Table 6), even at a heritability of 10%, most QTLs (5 of 6) could be detected with a power of >98%. As the population structure changes, the power for all QTLs except Q4 remains >99% (Table 7). For different population structures with the same sample size, we can conclude that increasing the number of maternal plants improves the precision of QTL parameter estimation. This is reasonable, as the variation of the trait ascribed to maternal effects decreases with a reduction in the number of maternal plants.

Understanding the genetic architecture of seed development helps us to answer many fundamental questions in higher plants. Our model has straightforward applications in breeding practice and genetic studies. For example, if the estimated maternal effects surpass the offspring effects, more attention should be paid to selection on maternal effects for improving seed traits; if the offspring effects dominate, it is better to select the seeds directly for reproduction. Some other issues should be noted, however. Our method is based on a QTL full model that can control the false positive rate and improve the estimation precision of QTL parameters, but for the double back-cross of IF2 design, linear dependency will occur between the endosperm genetic effects of any pair of QTLs in model (2), because the endosperm coefficients of any individual QTL satisfy $x_{A^{e}} - 2x_{D_1^{e}} + 2x_{D_2^{e}} = 1.5$ or $-1.5$. The simplest way to tackle this problem is to reparameterize, for example, by merging the first-order and the second-order endosperm dominance effects for each QTL, as was done in our simulation study. Another potential strategy is to keep the first-order and the second-order endosperm dominance effects of one QTL but to merge the two dominance effects, as well as their interaction effects with environment, for the other endosperm QTL(s). However, this strategy will increase computational complexity, as the genetic effects of all QTLs cannot be estimated simultaneously and the QTL full model has to be adjusted for each QTL; moreover, its effectiveness still needs to be investigated in further study. For the selfing of IF2 design, p3|ij and p4|ij (Table 1) are confounded because the combinations Qq:Qq(QQq) and Qq:Qq(Qqq) cannot be separated on the basis of marker information. Therefore, the first-order and the second-order dominance effects need to be merged in model (2).
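The linear dependency in the double back-cross design can be checked directly from the endosperm coefficient coding given in the genetic model section. In the sketch below, which uses hypothetical names, the contrast additive − 2 × (first dominance) + 2 × (second dominance) is evaluated for the endosperm genotypes that occur in each back-cross, as listed in the description of the design (QQQ and Qqq in IF2 × P1; QQq and qqq in IF2 × P2).

```python
# Endosperm coding from the text: (additive, first dominance, second dominance).
ENDOSPERM_CODING = {
    "QQQ": (1.5, 0.0, 0.0),
    "QQq": (0.5, 1.0, 0.0),
    "Qqq": (-0.5, 0.0, 1.0),
    "qqq": (-1.5, 0.0, 0.0),
}
# Endosperm genotypes that can occur in each back-cross, per the design description.
BC1_GENOTYPES = ("QQQ", "Qqq")   # IF2 x P1
BC2_GENOTYPES = ("QQq", "qqq")   # IF2 x P2


def contrast(genotype):
    """x_A - 2 * x_D1 + 2 * x_D2 for one endosperm genotype."""
    a, d1, d2 = ENDOSPERM_CODING[genotype]
    return a - 2.0 * d1 + 2.0 * d2


if __name__ == "__main__":
    for name, genotypes in (("BC1", BC1_GENOTYPES), ("BC2", BC2_GENOTYPES)):
        print(name, {g: contrast(g) for g in genotypes})
```

Because the contrast takes the same value for every seed within a back-cross regardless of which QTL is considered, the same combination of the three endosperm columns is identical across QTLs, so the columns of two QTLs cannot all be estimated jointly. Merging the two dominance effects removes one column per QTL and breaks this dependency.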

The QTL genotypes are unobserved and need to be inferred from the flanking markers. In the traditional fixed models, there are two ways to infer the missing QTL genotypes: the mixture distribution method and the conditional expectation method used in interval mapping of Lander and Botstein (1989) and interval regression approach of Haley and Knott (1992), respectively. Both the methods are considerably effective for QTL parameter estimation. The comparisons between the two methods also support that there exists only a slight difference in most cases (Xu, 1995, 1996; Gessler and Xu, 1996; Kao, 2000). Different from those in a fixed model, the observations are no longer independent of each other in a mixed model that is considered in this study. Although feasible in theory, it is not easy to handle the joint mixture distribution of a set of correlated observations. Therefore, we choose the expectation method in inferring the coefficients associated with the QTL genotypes. Our estimation method is, in essence, an extension of the regression approach of Haley and Knott (1992).

Based on the models and methods proposed in the present study, a software package named QTLNetwork-Seed was developed in the C++ programming language. This software can be run on the most commonly used operating systems. It can handle double back-cross of IF2, selfing of IF2 and random mating of IF2, doubled haploid or recombinant inbred line designs for mapping QTLs underlying seed traits.

Data archiving

Data are available from our own public Internet website http://ibi.zju.edu.cn/software/ and from the Dryad Digital Repository doi:10.5061/dryad.8311n.