Abstract
The crop seed is a complex organ that may be composed of the diploid embryo, the triploid endosperm and the diploid maternal tissues. According to the genetic features of seed characters, two genetic models for mapping quantitative trait loci (QTLs) of crop seed traits are proposed, with inclusion of maternal effects, embryo or endosperm effects of QTL, environmental effects and QTL-by-environment (QE) interactions. The mapping population can be generated from a double backcross of immortalized F_{2} (IF_{2}) to the two parents, from random crossing of IF_{2} or from selfing of the IF_{2} population. Candidate marker intervals potentially harboring QTLs are first selected through one-dimensional scanning across the whole genome. The selected candidate marker intervals are then included in the model as cofactors to control background genetic effects on the putative QTL(s). Finally, a QTL full model is constructed and model selection is conducted to eliminate false positive QTLs. The genetic main effects of QTLs, QE interaction effects and the corresponding P-values are computed by a Markov chain Monte Carlo algorithm for the Gaussian mixed linear model via Gibbs sampling. Monte Carlo simulations were performed to investigate the reliability and efficiency of the proposed method. The simulation results showed that the proposed method had high power to accurately detect the simulated QTLs and properly estimated the effects of these QTLs. To demonstrate its usefulness, the proposed method was used to identify the QTLs underlying fiber percentage in an upland cotton IF_{2} population. A software package, QTLNetworkSeed, was developed for QTL analysis of seed traits.
Introduction
Given the importance of cereal grain seeds as staple food and nutrition resources for humans and animals, and as raw materials for the food industry, understanding the genetic architecture underlying the development of crop seed traits is increasingly important in crop breeding programs (Benner et al., 1989; Mazur et al., 1999; van der Meer et al., 2001). Seed development starts with double fertilization of the embryo sac, in which two sperm cells fuse with the egg cell and the central cell of the female gametophyte, giving rise to the diploid (2n) embryo and the triploid (3n) endosperm, respectively. One sperm cell fuses with the egg cell to produce a zygote, which then divides asymmetrically to form the embryo and its suspensor. The other sperm cell merges with the central cell to form a triploid endosperm nucleus, which in most flowering plants, such as rice and wheat, develops into the endosperm through two steps: a coenocytic stage followed by a cellularized and differentiated stage (Olsen, 1998; Sundaresan, 2005). Although the diploid embryo and the triploid endosperm are enclosed in the maternal integument, which is itself a part of the seed, development of the seed depends on the allocation of nutrients and other physiologically active substances from the mother plant. Proper development of a seed trait requires coordinated communication among the components involved: embryo, endosperm and maternal plant. Thus, to thoroughly understand how seed traits are genetically determined, it is prudent to consider all of these components together in the genetic models (Bazzaz, 2001).
It is well established that additive and/or epistatic maternal effects, embryo effects, endosperm effects and gene–environment interaction effects are involved in the genetic development and evolution of crop seed (Zhang et al., 1999; Shi et al., 2000; Cui et al., 2004; Cui and Wu, 2005a, 2005b; Cui et al., 2006). A considerable body of genetic models and statistical methods has been developed and applied to the analysis of real data. On the basis of the genetic features of seed characters, Mo (1995) advocated a statistical genetic model that focuses on partitioning the phenotypic variance of endosperm traits into various genetic and environmental components. Zhu and Weir (1994) further proposed a mixed linear model approach to analyze the maternal, embryo, endosperm and cytoplasm effects, and their interactions with environment, underlying seed traits collected from a diallel cross experiment. These approaches can only dissect the genetic variation of seed traits into several cumulative components; because all the genes controlling the seed traits are analyzed as a whole, they fail to provide detailed information at the individual gene level, such as the positions and effects of quantitative trait loci (QTLs). Advances in molecular markers and statistical methods enable us to partition the total genetic variation into the effects of individual QTLs, based on cosegregation between molecular markers and the putative QTL (Lander and Botstein, 1989; Zeng, 1994). A number of researchers applied the traditional diploid genetic model to map QTLs underlying endosperm traits (Sourdille et al., 1996; Parker et al., 1998; Araki et al., 1999; Sene et al., 2001; Tan et al., 2001) before tailored QTL mapping methods were developed for seed traits. Genetically, the endosperm represents the next generation developing on the maternal plant, with more complex segregation patterns and genetic mechanisms than diploid tissues.
For example, at a biallelic locus (Qq) the endosperm has four possible genotypes, namely QQQ, QQq, Qqq and qqq, and three kinds of endosperm genetic effects can be involved, namely the additive effect (a), the first dominance effect (d_{1}) and the second dominance effect (d_{2}). On this genetic basis, Kao (2004) proposed a statistical method using multiple-interval mapping that takes the triploid nature of the endosperm into account and can estimate all three kinds of endosperm effect. However, one more important feature ignored in this method and others is that the endosperm is a tissue developed on its maternal plant, implying that its phenotype may be affected not only by its own triploid endosperm genotype, but also by the diploid maternal genotype via the supply of nutrition and other physiologically active substances. This point of view has been supported by several studies on model species such as Arabidopsis and maize (Letchworth and Lambert, 1998; Garcia et al., 2005; Ohto et al., 2005). To take maternal effects into account, it is theoretically desirable to integrate both the maternal genome and the offspring genome into one model. Hu and Xu (2005) proposed a mapping method to characterize genetic effects of the maternal genome on offspring traits that incorporates both the quantitative genetic model for diploid maternal traits and that for triploid endosperm traits into a unified QTL mapping framework. Wen and Wu (2007) further proposed a method of interval mapping of endosperm traits based on a two-stage hierarchical design that considers both the endosperm effects and the maternal effects. In addition, Cui and Wu (2005a) and Cui et al. (2006) proposed an epistasis model for mapping endosperm QTLs in a double backcross population.
The above approaches do not include the genetic effects of all QTLs in the model and can only give information about an individual QTL or one pair of QTLs, such as position, the maternal additive/dominance effect, the endosperm additive effect and the first and second dominance effects. These methods may therefore suffer from limited power and precision in crop seed QTL mapping, because only one QTL or one pair of putative QTLs is tested at a time and the genetic background is not controlled. If there is more than one QTL on a chromosome, the test of a QTL at one position will be distorted by all the other QTLs, and thus the estimated positions and effects of the QTL are likely to be biased. Moreover, QTL-by-environment (QE) interactions, another important factor for complex traits, are ignored in these methods, potentially leading to biased estimation of QTL parameters.
In the present study, we propose a new method for systematically mapping QTLs underlying crop seed traits. We integrate the maternal and offspring genomes as well as the QE interaction effects into one unified QTL model. Monte Carlo simulations were conducted to investigate the effectiveness, stability and efficiency of our model and methods. An application to data from a cotton breeding experiment is presented to demonstrate the usefulness of the proposed method. A software package, QTLNetworkSeed (http://ibi.zju.edu.cn/software), was developed for mapping QTLs of seed traits.
Materials and methods
Genetic model
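For a specific mapping population derived from two pure lines (P_{1} and P_{2}), the genetic experiment is conducted in t different environments, each with b blocks. Suppose the surveyed quantitative seed trait is controlled by the maternal and embryo genetic effects of s QTLs (Q_{1}, Q_{2}, …, Q_{s}); the phenotype (y_{hij}) of the ith strain in the jth block of the hth environment can then be expressed by the following QTL model,
$$y_{hij}=\mu+\sum_{k=1}^{s}\left(a^{m}_{k}x_{A^{m}_{k}}+d^{m}_{k}x_{D^{m}_{k}}+a^{o}_{k}x_{A^{o}_{k}}+d^{o}_{k}x_{D^{o}_{k}}\right)+e_{h}+\sum_{k=1}^{s}\left(ae^{m}_{hk}x_{A^{m}_{k}}+de^{m}_{hk}x_{D^{m}_{k}}+ae^{o}_{hk}x_{A^{o}_{k}}+de^{o}_{hk}x_{D^{o}_{k}}\right)+b_{j(h)}+\varepsilon_{hij}\tag{1}$$
where μ is the population mean; a^{m}_{k} and d^{m}_{k} are the maternal additive and dominance effects of Q_{k}, with coefficients x_{A^{m}_{k}} and x_{D^{m}_{k}}, respectively; a^{o}_{k} and d^{o}_{k} are the embryo additive and dominance effects of Q_{k}, with coefficients x_{A^{o}_{k}} and x_{D^{o}_{k}}, respectively; e_{h} is the random effect of the hth environment, e_{h}∼(0, σ^{2}_{E}); ae^{m}_{hk} and de^{m}_{hk} are the interaction effects of the additive and the dominance effects of the maternal QTL with environments, respectively, ae^{m}_{hk}∼(0, σ^{2}_{A^{m}E}) and de^{m}_{hk}∼(0, σ^{2}_{D^{m}E}); ae^{o}_{hk} and de^{o}_{hk} are the interaction effects of the additive and the dominance effects of the embryo QTL with environments, respectively, ae^{o}_{hk}∼(0, σ^{2}_{A^{o}E}) and de^{o}_{hk}∼(0, σ^{2}_{D^{o}E}); b_{j(h)} is the jth block effect within the hth environment; and ɛ_{hij} is the random residual effect, ɛ_{hij}∼(0, σ^{2}_{ɛ}).
If the trait is controlled simultaneously by the genetic effects of maternal and endosperm QTLs, the model can be represented as follows,
$$y_{hij}=\mu+\sum_{k=1}^{s}\left(a^{m}_{k}x_{A^{m}_{k}}+d^{m}_{k}x_{D^{m}_{k}}+a^{e}_{k}x_{A^{e}_{k}}+d^{e1}_{k}x_{D^{e1}_{k}}+d^{e2}_{k}x_{D^{e2}_{k}}\right)+e_{h}+\sum_{k=1}^{s}\left(ae^{m}_{hk}x_{A^{m}_{k}}+de^{m}_{hk}x_{D^{m}_{k}}+ae^{e}_{hk}x_{A^{e}_{k}}+de^{e1}_{hk}x_{D^{e1}_{k}}+de^{e2}_{hk}x_{D^{e2}_{k}}\right)+b_{j(h)}+\varepsilon_{hij}\tag{2}$$
where a^{e}_{k}, d^{e1}_{k} and d^{e2}_{k} are the additive effect, the first-order dominance effect (interaction between Q_{k}Q_{k} and q_{k}) and the second-order dominance effect (interaction between Q_{k} and q_{k}q_{k}) of the endosperm QTL, respectively, with coefficients x_{A^{e}_{k}}, x_{D^{e1}_{k}} and x_{D^{e2}_{k}}; ae^{e}_{hk}, de^{e1}_{hk} and de^{e2}_{hk} are the interaction effects of the additive, the first-order and the second-order dominance effects with environments, respectively, ae^{e}_{hk}∼(0, σ^{2}_{A^{e}E}), de^{e1}_{hk}∼(0, σ^{2}_{D^{e1}E}) and de^{e2}_{hk}∼(0, σ^{2}_{D^{e2}E}); the other terms have the same definitions as those in model (1). It should be pointed out that when a QTL has both maternal and endosperm effects, two of these effects cannot be distinguished in this full model; they therefore need to be reparameterized under the constraint that the two are equal, and the same constraint is applied to their interaction effects with environment.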
In the above models, all coefficients of QTL effects are determined by the QTL genotype. For the QTL genotypes QQ, Qq and qq, the coefficients are 1, 0 and −1, respectively, for maternal or embryo additive effects, and 0, 0.5 and 0 for maternal or embryo dominance effects. The coefficients of the endosperm additive, first and second dominance effects are 1.5, 0 and 0 for the endosperm QTL genotype QQQ; 0.5, 1 and 0 for QQq; −0.5, 0 and 1 for Qqq; and −1.5, 0 and 0 for qqq. In practice, the QTL genotype is usually unknown, but the probabilities of the possible genotypes can be inferred from the flanking marker genotypes.
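The coefficient coding above can be illustrated with a minimal Python sketch. It only encodes the values stated in the text and takes the probability-weighted average of the coefficients when the QTL genotype is uncertain; the names `DIPLOID_CODING`, `ENDOSPERM_CODING` and `expected_coefficients` are hypothetical and are not part of QTLNetworkSeed.

```python
# Coefficient coding as stated in the text.
# Diploid (maternal or embryo): genotype -> (additive, dominance)
DIPLOID_CODING = {"QQ": (1.0, 0.0), "Qq": (0.0, 0.5), "qq": (-1.0, 0.0)}
# Triploid endosperm: genotype -> (additive, first dominance, second dominance)
ENDOSPERM_CODING = {
    "QQQ": (1.5, 0.0, 0.0),
    "QQq": (0.5, 1.0, 0.0),
    "Qqq": (-0.5, 0.0, 1.0),
    "qqq": (-1.5, 0.0, 0.0),
}


def expected_coefficients(genotype_probs, coding):
    """Probability-weighted average of the coefficients.

    genotype_probs maps each possible QTL genotype to its conditional
    probability (for example, given the flanking marker genotypes).
    """
    total = sum(genotype_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    n_effects = len(next(iter(coding.values())))
    return tuple(
        sum(p * coding[g][c] for g, p in genotype_probs.items())
        for c in range(n_effects)
    )


# Example: an F2-like diploid distribution and a 50:50 endosperm mixture.
print(expected_coefficients({"QQ": 0.25, "Qq": 0.5, "qq": 0.25}, DIPLOID_CODING))
print(expected_coefficients({"QQQ": 0.5, "Qqq": 0.5}, ENDOSPERM_CODING))
```

When the genotype is known with certainty, the function simply returns the coded values for that genotype.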
Mating designs
Several mapping populations are applicable for dissecting the various effect components of seed traits. We suggest using populations derived from an immortalized F_{2} (IF_{2}), in which genetically identical individuals can be regenerated for replication and phenotyped in different environments for the detection of gene–environment interactions.
Double backcross of IF_{2}
In this design, IF_{2} lines, as maternal parents, are crossed with their two homozygous parents (P_{1} and P_{2}) as paternal parents. Marker information of the IF_{2} lines and their offspring, together with the phenotypes of the offspring, is used to perform QTL mapping. There are eight combinations of the QTL genotype in the maternal plant and in the embryo (endosperm) of its offspring: QQ:QQ(QQQ), Qq:QQ(QQQ), Qq:Qq(Qqq), qq:Qq(Qqq), QQ:Qq(QQq), Qq:Qq(QQq), Qq:qq(qqq) and qq:qq(qqq); the first four combinations derive from the backcross of the IF_{2} lines with P_{1} and the last four from the backcross with P_{2}. We denote by p_{kij} the conditional probability of the kth QTL genotype combination given the ith marker type in the maternal plant and the jth marker type in the offspring generation; the expectations of the QTL effect coefficients can then be calculated from the formulae in Table 1, where k indexes the maternal and offspring QTL genotype combination (k=1, 2, …, 8), i indexes the maternal genotype at the flanking markers (i=1, 2, …, 9) and j indexes the offspring genotype at the flanking markers (j=1, 2, …, 9). The conditional probabilities of the eight QTL genotype combinations, p_{kij}, are summarized in Table 2 for an IF_{2} derived from random mating of double haploid (DH) lines. If the IF_{2} is derived from random mating of recombinant inbred lines (RILs), the proportions of recombinant zygotes should be used in Table 2 instead of the recombination rates. As shown in Table 2, p_{1ij}+p_{2ij}+p_{3ij}+p_{4ij}=1 and p_{5ij}=p_{6ij}=p_{7ij}=p_{8ij}=0 in BC_{1} (IF_{2} × P_{1}) lines, and p_{5ij}+p_{6ij}+p_{7ij}+p_{8ij}=1 and p_{1ij}=p_{2ij}=p_{3ij}=p_{4ij}=0 in BC_{2} (IF_{2} × P_{2}) lines.
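The eight genotype combinations listed above follow directly from Mendelian segregation, with the endosperm receiving two identical maternal copies and one paternal copy. The following Python sketch enumerates them for a given homozygous paternal parent; the function names are hypothetical and the sketch does not compute the conditional probabilities p_{kij} of Table 2.

```python
def ordered(alleles):
    """Write a genotype with Q alleles before q alleles, e.g. 'qqQ' -> 'Qqq'."""
    return "".join(sorted(alleles, key=lambda allele: allele != "Q"))


def backcross_combinations(paternal_genotype):
    """List maternal:embryo(endosperm) QTL genotype combinations.

    The maternal plant is an IF2 line (QQ, Qq or qq) and the paternal
    parent is one of the homozygous parents ('QQ' for P1, 'qq' for P2).
    """
    combinations = []
    for maternal in ("QQ", "Qq", "qq"):
        for maternal_allele in dict.fromkeys(maternal):      # distinct gametes
            for paternal_allele in dict.fromkeys(paternal_genotype):
                embryo = ordered(maternal_allele + paternal_allele)
                # Endosperm: two maternal copies plus one paternal copy.
                endosperm = ordered(maternal_allele * 2 + paternal_allele)
                label = f"{maternal}:{embryo}({endosperm})"
                if label not in combinations:
                    combinations.append(label)
    return combinations


print(backcross_combinations("QQ"))  # backcross to P1
print(backcross_combinations("qq"))  # backcross to P2
```

Each backcross yields four combinations, giving the eight combinations of the double backcross design.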
Selfing of IF_{2}
In this design, we use marker information from two generations, the parental generation (IF_{2}) and its offspring (F_{3}), together with the phenotype of the F_{3} seed, to perform QTL mapping. There are six combinations in total of the QTL genotypes in the maternal genome (IF_{2}) and in the embryo and endosperm genomes (F_{3}): QQ:QQ(QQQ), Qq:QQ(QQQ), Qq:Qq(QQq), Qq:Qq(Qqq), Qq:qq(qqq) and qq:qq(qqq). Following the formula for calculating conditional probabilities given by Wen (Wen et al., 2007), we denote by p_{kij} the conditional probability of a QTL genotype combination given the genotypes of the two flanking markers in the IF_{2} and F_{3} plants, where i indexes the marker genotype in the IF_{2} plant, j indexes the marker genotype in the F_{3} plant (i, j=1, 2, …, 9) and k indexes the QTL genotype combination (k=1, 2, 3, 6, 7, 8); the expectations of the coefficients for QTL effects in models (1) and (2) can then be obtained using Table 1.
Random mating of IF_{2}
In QTL mapping for seed traits, a random mating design is also a common choice. For random mating of IF_{2} lines, the mixed linear model approach can also be applied to analyze the QTL effects, although the coefficients are more complicated than those in the planned mating designs above. Consider a cross between two lines sampled at random from the IF_{2}, with the ith marker genotype in the maternal genome and the jth marker genotype in the paternal genome for one marker interval (Table 3; i, j=1, 2, …, 9). We denote by p_{ki} the conditional probability of the kth QTL genotype (k=1, 2, 3) given the ith marker genotype of the maternal genome and by p_{lj} the conditional probability of the lth QTL genotype (l=1, 2, 3) given the jth marker genotype of the paternal genome, and let p_{ki·lj}=p_{ki}p_{lj}. The expectations of the coefficients of QTL effects are then calculated as follows: (p_{1i}−p_{3i}) is the expectation of the coefficient for the maternal additive effect and p_{2i} that for the maternal dominance effect; the expectations for the embryo additive and dominance effects and for the endosperm additive, first-order dominance and second-order dominance effects are obtained analogously as weighted sums of the joint probabilities p_{ki·lj}.
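As an illustration of how such expectations can be assembled, the sketch below derives the offspring genotype probabilities from the parental QTL genotype probabilities under two assumptions that are ours rather than the paper's stated formulae: Mendelian segregation, and independence of the maternal and paternal QTL genotypes given their markers (as implied by p_{ki·lj}=p_{ki}p_{lj}). The function name and the `dominance_code` parameter (the coefficient assigned to the heterozygote Qq, 0.5 in the coding of the Genetic model section) are hypothetical.

```python
def random_mating_expectations(p_maternal, p_paternal, dominance_code=0.5):
    """Expected coefficients for one cross in a random mating design.

    p_maternal and p_paternal are (P(QQ), P(Qq), P(qq)) for each parent,
    conditional on that parent's flanking marker genotype.
    """
    p1i, p2i, p3i = p_maternal
    p1j, p2j, p3j = p_paternal
    g_m = p1i + 0.5 * p2i  # probability that the maternal gamete carries Q
    g_p = p1j + 0.5 * p2j  # probability that the paternal gamete carries Q

    # Embryo: one maternal and one paternal allele.
    embryo_QQ = g_m * g_p
    embryo_Qq = g_m * (1 - g_p) + (1 - g_m) * g_p
    embryo_qq = (1 - g_m) * (1 - g_p)

    # Endosperm: two identical maternal copies and one paternal copy.
    endo_QQQ = g_m * g_p
    endo_QQq = g_m * (1 - g_p)
    endo_Qqq = (1 - g_m) * g_p
    endo_qqq = (1 - g_m) * (1 - g_p)

    return {
        "maternal_additive": p1i - p3i,
        "embryo_additive": embryo_QQ - embryo_qq,
        "embryo_dominance": dominance_code * embryo_Qq,
        "endosperm_additive": 1.5 * endo_QQQ + 0.5 * endo_QQq
        - 0.5 * endo_Qqq - 1.5 * endo_qqq,
        "endosperm_dominance_1": endo_QQq,
        "endosperm_dominance_2": endo_Qqq,
    }


# Example: a QQ mother crossed with a qq father gives Qq embryos
# and QQq endosperm with certainty.
print(random_mating_expectations((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Because the gamete probabilities g_m and g_p are linear in p_{ki} and p_{lj}, every returned quantity is a weighted sum of the products p_{ki}p_{lj}.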
Mapping strategy
In QTL mapping, the QTL genotype at a given locus is not observed, so the exact coefficients for the QTL effects in the above models cannot be determined. A reasonable approach is to substitute them with their expectations based on the flanking marker genotypes. In addition, we need to know the number of QTLs associated with the seed trait under study and their positions on the chromosome(s). Therefore, we first scan for QTL positions across the whole genome using a walking step of 1 cM, with prescreened significant marker intervals as cofactors, and then construct the QTL full model to analyze the QTL effects. The procedure is outlined below; for more details, please refer to the literature on mapping QTLs of complex traits (Yang et al., 2007).
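To make the idea of inferring a QTL genotype from flanking markers concrete, the following Python sketch gives the textbook interval-mapping probabilities for a single gamete (or a double haploid line). It is a simplified stand-in and not the IF_{2} probabilities of Table 2. It assumes the Haldane mapping function (no interference) and that the marker alleles M are in coupling with Q; the function names are hypothetical.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def prob_q_allele(left_is_M, right_is_M, d_left_cm, d_right_cm):
    """P(QTL allele is Q | flanking marker alleles) for one gamete.

    d_left_cm and d_right_cm are the (positive) distances from the
    putative QTL to the left and right flanking markers.
    """
    r1 = haldane_recombination(d_left_cm)
    r2 = haldane_recombination(d_right_cm)
    r = r1 + r2 - 2.0 * r1 * r2  # recombination between the two markers
    if left_is_M and right_is_M:
        return (1 - r1) * (1 - r2) / (1 - r)
    if left_is_M and not right_is_M:
        return (1 - r1) * r2 / r
    if not left_is_M and right_is_M:
        return r1 * (1 - r2) / r
    return r1 * r2 / (1 - r)


# Walk a putative QTL through a 10 cM interval in 1 cM steps.
for position in range(1, 10):
    p = prob_q_allele(True, False, position, 10 - position)
    print(position, round(p, 3))
```

For a recombinant gamete (M on the left, m on the right), the probability of carrying Q falls as the putative QTL moves away from the left marker, which is what gives the scan its positional resolution.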
QTL position scanning
Similar to composite interval mapping (Zeng, 1994), we scan for QTL positions using marker information. First, we treat each interval consisting of a pair of adjacent markers as an integrated effect in the regression model and conduct a regression analysis of the phenotype on each marker interval. An F-statistic is calculated and used to screen potential candidate marker intervals by the permutation method. Based on the screened potential candidate intervals, multivariate regression is performed again and the significant candidate marker intervals harboring QTLs are finally selected via the conventional stepwise method (Yang et al., 2007). Second, with the selected candidate intervals as cofactors, QTL position scanning is performed with a walking step, such as 1 cM, across the whole genome. When one putative QTL is tested, all other previously detected significant QTLs are included in the model to control the genetic noise from background QTLs. The significance threshold is specified by permutation testing. Without loss of generality, we use model (2) to illustrate the analysis. The following model is used to identify the significant marker intervals across the whole genome,
In this model, referred to as model (3), y_{hij} is the phenotype of the ith line in the jth block under the hth environment; μ_{h} is the population mean under the hth environment; and t (t=1, …, T) indexes the marker interval being tested among the T intervals. For each of the two markers flanking the tth interval (right and left), the model contains the maternal additive and dominance effects and the endosperm additive, first-order dominance and second-order dominance effects of that marker in the hth environment, each multiplied by its own coefficient ζ; the other parameters have the same definitions as in model (2). The coefficients ζ for the different marker effects are determined in the same way as the coefficients of QTL effects, after replacing the Q and q alleles in the QTL genotype with the M and m marker alleles. An F-statistic based on the Henderson III method (Henderson, 1953; Searle, 1968) is calculated to test the significance of the marker interval effects, and the permutation method is used to specify the significance threshold.
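The permutation step can be sketched as follows. This is a deliberately simplified illustration: it uses the F-statistic of a single-predictor ordinary regression in place of the Henderson III F-statistic, and one coefficient column per interval in place of the full set of marker effects. The function names, the matrix `X` of interval coefficients and the default settings are assumptions made for the example.

```python
import numpy as np


def f_statistic(y, x):
    """F-statistic for the regression of y on one predictor x (with intercept)."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    ss_reg = (xc @ yc) ** 2 / (xc @ xc)
    ss_res = yc @ yc - ss_reg
    return ss_reg / (ss_res / (n - 2))


def permutation_threshold(y, X, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum F.

    Phenotypes are shuffled relative to the marker data, which breaks any
    marker-trait association while keeping the marker correlation structure.
    """
    rng = np.random.default_rng(seed)
    max_f = np.empty(n_perm)
    for p in range(n_perm):
        y_perm = rng.permutation(y)
        max_f[p] = max(f_statistic(y_perm, X[:, t]) for t in range(X.shape[1]))
    return float(np.quantile(max_f, 1.0 - alpha))


# Toy data: 200 lines, 20 intervals, one interval with a real effect.
rng = np.random.default_rng(42)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 20))
y = 1.0 * X[:, 3] + rng.normal(0.0, 1.0, size=200)
threshold = permutation_threshold(y, X, n_perm=200)
observed = [f_statistic(y, X[:, t]) for t in range(X.shape[1])]
print(threshold, int(np.argmax(observed)))
```

Taking the maximum F over all intervals within each permutation is what makes the resulting threshold genome-wide rather than per-interval.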
Within each marker interval, we scan QTL with a step size of 1 cM along the chromosome. The selected marker intervals are added into the mapping model to control the background genetic effects when scanning QTL position. To test QTL k with all s significant markers considered, we apply the following model,
In this model, referred to as model (4), the terms for the tested QTL k are its maternal additive and dominance effects in the hth environment and its endosperm additive, first-order dominance and second-order dominance effects; all the other parameters have the same definitions as in models (2) and (3). As in the scanning of marker intervals, an F-statistic is calculated for each putative QTL position; the peaks of the resulting F-statistic profile along the chromosome that reach the significance threshold determined by permutation are selected as candidate QTLs. Finally, we construct a QTL full model that consists of the detected candidate QTLs, and then conduct model selection to eliminate the false positive QTLs. The number of QTLs and their positions are determined in the final model for the estimation of QTL effects.
Genetic effect estimation
With the positions of all QTLs identified, we can estimate the maternal and endosperm QTL effects, or the maternal and embryo QTL effects, as well as their corresponding QE interaction effects, based on the final QTL full model. To fit the mixed linear model, we first obtain a set of initial estimates of the QTL parameters: the variances of random effects by minimum norm quadratic unbiased estimation (Rao, 1971), the fixed effects by ordinary least squares estimation and the random effects by adjusted unbiased prediction. These are then used as prior values of the parameters in the Markov chain Monte Carlo procedure for the mixed linear model, implemented by Gibbs sampling (Smith and Roberts, 1993). Once the distribution of each parameter in the mixed linear model has been obtained, its mean is taken as the final estimate and its significance is tested by a t-statistic.
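The Gibbs sampling step can be sketched for a generic Gaussian mixed linear model y = Xb + Zu + e with one random factor. This is a minimal illustration under assumptions that differ from the procedure described above: the chain starts from ordinary least squares values instead of MINQUE estimates, the fixed effects have a flat prior, and both variances have vague inverse-gamma priors. The function name and all arguments are hypothetical, and the sketch is not the QTLNetworkSeed implementation.

```python
import numpy as np


def gibbs_mixed_model(y, X, Z, n_iter=2000, burn_in=500, seed=0,
                      prior_shape=0.001, prior_scale=0.001):
    """Posterior means for y = X b + Z u + e, u ~ N(0, var_u I), e ~ N(0, var_e I)."""
    rng = np.random.default_rng(seed)
    n, q = len(y), Z.shape[1]
    xtx_inv = np.linalg.inv(X.T @ X)
    ztz = Z.T @ Z
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # starting values (OLS)
    u = np.zeros(q)
    var_u, var_e = 1.0, float(np.var(y - X @ b))
    draws = {"b": [], "u": [], "var_u": [], "var_e": []}
    for it in range(n_iter):
        # Fixed effects given everything else.
        mean_b = xtx_inv @ X.T @ (y - Z @ u)
        b = rng.multivariate_normal(mean_b, var_e * xtx_inv)
        # Random effects given everything else (ridge-type shrinkage).
        c_inv = np.linalg.inv(ztz + (var_e / var_u) * np.eye(q))
        mean_u = c_inv @ Z.T @ (y - X @ b)
        u = rng.multivariate_normal(mean_u, var_e * c_inv)
        # Variances: inverse-gamma draws as reciprocals of gamma draws.
        var_u = 1.0 / rng.gamma(prior_shape + q / 2.0,
                                1.0 / (prior_scale + u @ u / 2.0))
        resid = y - X @ b - Z @ u
        var_e = 1.0 / rng.gamma(prior_shape + n / 2.0,
                                1.0 / (prior_scale + resid @ resid / 2.0))
        if it >= burn_in:
            draws["b"].append(b)
            draws["u"].append(u)
            draws["var_u"].append(var_u)
            draws["var_e"].append(var_e)
    return {name: np.mean(values, axis=0) for name, values in draws.items()}


# Toy data: an intercept, one QTL-like covariate and 10 random groups.
rng = np.random.default_rng(1)
n = 200
Z = np.eye(10)[np.arange(n) % 10]
X = np.column_stack([np.ones(n), rng.choice([-1.0, 0.0, 1.0], size=n)])
y = X @ np.array([2.0, 1.5]) + Z @ rng.normal(0, 1, 10) + rng.normal(0, 1, n)
print(gibbs_mixed_model(y, X, Z, n_iter=600, burn_in=200)["b"])
```

The post-burn-in draws approximate the posterior distribution of each parameter, and their mean plays the role of the final estimate described above.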
Simulation scenarios
To investigate the efficiency and accuracy of the proposed methods, we performed Monte Carlo simulations to verify the unbiasedness and robustness of our model. Here, a double backcross of IF_{2} was simulated with 400 genotypes and two environments. The genetic map consists of 5 chromosomes, with 11 markers evenly distributed on each chromosome. The distance between two consecutive markers is set to 10 cM. Six QTLs (Q_{1}, Q_{2}, Q_{3}, Q_{4}, Q_{5} and Q_{6}) controlling a quantitative seed trait are scattered over the five chromosomes, with Q_{5} and Q_{6} on the same chromosome. One QTL (Q_{4}) is set with only maternal effects; two QTLs (Q_{1} and Q_{5}) are set with only endosperm effects; three QTLs (Q_{2}, Q_{3} and Q_{6}) are set with both maternal and endosperm effects. Different heritabilities and population structures (that is, the ratio of the number of IF_{2} lines to the number of offspring per line) were used in the simulations to investigate the influence of heritability and population structure on the estimation of QTL parameters. Each simulation scenario was repeated 500 times.
A worked example
To demonstrate the use of the proposed method, we analyzed a set of real data from a cotton (Gossypium hirsutum L.) breeding experiment. A set of 188 recombinant inbred lines was developed from an intraspecific hybrid cross between two upland cotton germplasms, HS46 and WARKCBUCAG8US188, which differ widely in yield, fiber quality, disease resistance and seed quality traits. In this study, pairs of lines among the 188 recombinant inbred lines were randomly crossed during flowering to produce 376 IF_{2} lines in 2009 and 2010, which were used for QTL analysis. Seeds of the IF_{2} population and the two parents were manually harvested at maturity. All the seeds were measured for fiber percentage. The genetic map includes 388 molecular markers residing on 30 linkage groups. It covers a total length of 1946.22 cM, accounting for 41.55% of the whole genome, with an average distance of 5.03 cM between adjacent markers.
As the cotton seed contains no triploid endosperm tissue, the cotton fiber percentage is mainly governed by maternal effects and embryo effects. Model (1), with inclusion of the QTL maternal additive effect, the embryo additive and dominance effects and their interaction effects with environments, and its reduced model with exclusion of the QE interaction effects were used in the statistical analysis. The critical F-values for prescreening candidate intervals were calculated by permutation testing at a genome-wide significance level of 0.05. The full model was fitted by the Markov chain Monte Carlo method.
Results
Estimation of QTL parameter
A summary of the results is presented in Tables 4, 5, 6 and 7. Under a heritability of 25% and a population structure of 200:2 (200 IF_{2} lines and 2 offspring per IF_{2} line), all QTL positions were estimated without appreciable bias: the largest bias (estimated value minus parameter value, Est.−Par.) was only 0.3 cM, and all six ratios of bias to true value were <1% (Table 4). The power of QTL detection reached 100% for most QTLs, and even the lowest power, for Q_{4}, was 94%. For the estimation of QTL effects, most estimates were close to the true parameter values and the estimation accuracy was reasonably good, especially for the QTL main effects (Tables 4 and 6). Most ratios of bias to true value were within 5% and all were <10%. The estimation accuracy of QE interaction effects appeared to be slightly worse than that of QTL main effects (Table 6). The interaction effects of the maternal and endosperm additive effects with environment (ae^{m} and ae^{e}) were estimated more accurately than those of the maternal and endosperm dominance effects with environment (de^{m} and de^{e}); all the ratios of bias to true value were <15% for ae^{m} and ae^{e}, but >15% for de^{m} and de^{e}.
Influence of heritability and population structure
The simulations under three other heritabilities, 75, 50 and 10%, with a population structure of 200:2 (200 IF_{2} lines and 2 offspring per line), showed that the estimates of QTL positions and QTL effects were very close to each other among the three levels of heritability (Table 6). All the estimates of QTL position were unbiased. The estimates of the maternal additive and endosperm additive effects were also robust across heritabilities. The s.d. of the parameter estimates became slightly larger and the power decreased as the heritability decreased from 75 to 10%; for Q_{4} in particular, the power decreased to 76.8% at 10% heritability.
The influence of population structure was also investigated with three population structures, 200:2, 100:4 and 50:8, under a heritability of 75%. As with the results for different heritabilities, the estimates of QTL positions and effects remained unbiased. The estimates of QTL positions and effects were also close to each other among the three population structures, whereas the s.d. tended to increase as the structure changed from I to III (Table 7). In terms of power, the QTL most sensitive to the change in population structure was the one with only maternal effects (Q_{4}). As the number of IF_{2} lines decreased from 200 to 50, the power of detecting Q_{4} dropped from 97.8 to 39.2%. This result is expected because the sample available for estimating maternal effects shrinks as the number of IF_{2} lines decreases.
Application to the real data
The genetic main effects and environment interaction effects of all the QTLs are presented in Table 8. Two significant QTLs were located, on chromosomes 19 and 21, respectively. Both QTLs are sensitive to environment. The estimated relative contribution of Q_{2} is 16.03%. In addition, we found that nearly 6.19% of the phenotypic variation is attributed to the maternal effects of QTL and ∼10.83% to the QE interaction. Furthermore, we used a reduced model without QE interaction effects to reanalyze the data. Although Q_{2} was still detected by the reduced model, its estimated relative contribution decreased to 8.19%, clearly showing the adequacy of the proposed analysis. Hence, in practical mapping, we suggest taking maternal effects and environment interactions into account when no knowledge is available about the real genetic architecture of the surveyed seed traits.
Discussion
In the present study, we have proposed a new statistical method that takes multiple QTLs and QE interactions into account for mapping QTLs underlying seed traits. It is well recognized that QE interaction is an important component of the total genetic effect and may cause the phenotype to vary across environments. QE interactions have been considered in QTL mapping for diploid plant traits (Jansen, 1992; Jansen et al., 1995; Jiang and Zeng, 1995; Cockerham and Zeng, 1996; Sari-Gorla et al., 1997; Yan et al., 1998), but have not been considered for diploid (maternal plus embryo) or triploid (maternal plus endosperm) seed traits. When QE interactions do play a significant role for the endosperm traits of interest, methods that ignore such interactions may lead to biased estimation. The new method integrates the effects of QTLs that are expressed in the maternal and/or offspring genome and the QE interactions into one mapping framework. The detected QE interactions can be further dissected into subfactors, such as the interaction effects of maternal additive by environment, maternal dominance by environment and so on, which provide valuable information for selecting or breeding environment-specific or environment-insensitive varieties by marker-assisted selection.
A major difficulty in developing a powerful statistical approach for mapping QTLs with QE interactions is handling the increased number of parameters in the statistical model, especially for the random effects. To tackle this problem, we adopted a mixed linear model instead of the more commonly used regression models. Despite its heavier computational burden, the mixed linear model approach is able to maintain or even increase statistical power, even when the number of parameters is nearly doubled by the inclusion of QE interactions.
Unlike the previous methods (Wu et al., 2002; Xu et al., 2003), most of which are based on one QTL model and thus may have a potential bias in parameter estimation when two or more linked QTLs exist in genome, the proposed method uses a QTL full model to estimate all possible effects of QTLs. False positive QTL(s) could be eliminated by model selection, while higher power and more accurate estimation of QTL effects could be achieved, benefiting from smaller and more accurate estimation of the residual effects because of background control. It should be noted that we have integrated the maternal genome into the model in addition to the offspring genome (Foolad and Jones, 1992; Cui et al., 2004; Hu and Xu, 2005; Wen and Wu, 2007), based on the view that development of embryo/endosperm traits partially depends on its own maternal plant.
It is well documented that epistasis is extensively involved in the genetic variation of complex traits. Epistatic effects and their interaction effects with environment have been included in models for mapping QTLs of general crop agronomic traits (Wang et al., 1999; Yang et al., 2007). For mapping epistatic effects of QTLs for seed traits, Cui and Wu (2005a) and Cui et al. (2006) proposed a model, based on the double backcross design, that includes the interaction effects of one pair of QTLs in the same genome (maternal, embryo or endosperm) or in two different genomes. Although our models (1) and (2) currently do not include epistatic interactions among genes in the different genomes, it is in principle feasible to extend the proposed method, within its current framework, to cover such epistatic effects and their interaction effects with environments. Such an extension involves a more complicated QTL full model and requires intensive work on the development of methodology and software for various experimental designs; this will be our follow-up study.
Monte Carlo simulations demonstrate that the method can precisely locate the positions of QTLs and estimate the genetic effects well, even in the presence of environment or gene–environment interaction effects. A satisfactory power of QTL detection is still obtained, despite the increased number of parameters in the model due to the inclusion of QE interactions as compared with currently available methods. According to the simulation results under different heritabilities (Table 6), even at a heritability of 10%, most QTLs (five of six) could be detected with a power of >98%. As the population structure changes, the power for all QTLs except Q_{4} remains >99% (Table 7). For different population structures with the same sample size, we can conclude that increasing the number of maternal plants improves the mapping precision in the estimation of QTL parameters. This is reasonable as the variation of the trait ascribed to maternal effects decreases with a reduction in the number of maternal plants.
Understanding the genetic architecture of seed development helps us to answer many fundamental questions in higher plants. Our model has a straightforward application in breeding practice and genetic studies. For example, if the estimated maternal effects surpass the offspring effects, more attention should be paid to selection on maternal effects for improving seed traits; if the offspring effects dominate, it is better to directly select the seed for reproduction. However, some other issues should be noted. Our method is based on a QTL full model that can control the false positive rate and improve the estimation precision of QTL parameters, but for the double backcross of IF_{2} design, linear dependency will occur between the endosperm genetic effects of any pair of QTLs in model (2), because x_{A^{e}}−2x_{D^{e1}}+2x_{D^{e2}}=1.5 or −1.5 for any individual QTL, where x_{A^{e}}, x_{D^{e1}} and x_{D^{e2}} denote the coefficients of the endosperm additive, first-order dominance and second-order dominance effects. The simplest way to tackle this problem is to reparameterize, for example, by merging the first-order and the second-order endosperm dominance effects for each QTL, as was done in our simulation study. Another potential strategy is to keep the first-order and the second-order endosperm dominance effects of one QTL but to merge the two dominance effects, as well as their interaction effects with environment, for the other endosperm QTL(s). However, this strategy will increase computational complexity, as the genetic effects of all QTLs cannot be estimated simultaneously and the QTL full model has to be adjusted for each QTL; moreover, the effectiveness of this strategy still needs to be investigated in further study. For the selfing of IF_{2} design, p_{3ij} and p_{4ij} (Table 1) are confounded because the combinations Qq:Qq(QQq) and Qq:Qq(Qqq) cannot be separated on the basis of marker information. Therefore, the first-order and the second-order dominance effects need to be merged in model (2).
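The linear dependency in the double backcross design can be checked numerically from the endosperm coefficient coding given in the Genetic model section. The sketch below evaluates the contrast (additive − 2 × first dominance + 2 × second dominance) for each endosperm genotype and computes the rank of the endosperm coefficient matrix for two QTLs; the names used are hypothetical and the check covers only the endosperm columns, not the full model.

```python
import itertools

import numpy as np

# Endosperm genotype -> (additive, first dominance, second dominance)
ENDOSPERM_CODING = {
    "QQQ": (1.5, 0.0, 0.0),
    "QQq": (0.5, 1.0, 0.0),
    "Qqq": (-0.5, 0.0, 1.0),
    "qqq": (-1.5, 0.0, 0.0),
}
# Endosperm genotypes that can occur in each backcross of the IF2 lines.
BACKCROSS_GENOTYPES = {"BC1": ("QQQ", "Qqq"), "BC2": ("QQq", "qqq")}


def contrast(genotype):
    """Additive minus twice the first plus twice the second dominance coefficient."""
    x_a, x_d1, x_d2 = ENDOSPERM_CODING[genotype]
    return x_a - 2.0 * x_d1 + 2.0 * x_d2


def two_qtl_design():
    """Endosperm coefficient rows for every two-QTL genotype pair.

    Both QTLs of one seed come from the same backcross, so genotype
    pairs are formed within BC1 and within BC2 only.
    """
    rows = []
    for genotypes in BACKCROSS_GENOTYPES.values():
        for g1, g2 in itertools.product(genotypes, repeat=2):
            rows.append(ENDOSPERM_CODING[g1] + ENDOSPERM_CODING[g2])
    return np.array(rows, dtype=float)


for family, genotypes in BACKCROSS_GENOTYPES.items():
    print(family, [contrast(g) for g in genotypes])
design = two_qtl_design()
print(design.shape, np.linalg.matrix_rank(design))
```

The contrast is constant within each backcross family, so the difference between the contrasts of any two QTLs is zero for every seed, and the six endosperm columns of two QTLs span only five dimensions.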
The QTL genotypes are unobserved and need to be inferred from the flanking markers. In traditional fixed models, there are two ways to infer the missing QTL genotypes: the mixture distribution method and the conditional expectation method, used in the interval mapping of Lander and Botstein (1989) and the interval regression approach of Haley and Knott (1992), respectively. Both methods are effective for QTL parameter estimation, and comparisons between them indicate only slight differences in most cases (Xu, 1995, 1996; Gessler and Xu, 1996; Kao, 2000). Unlike in a fixed model, the observations are no longer independent of each other in the mixed model considered in this study. Although feasible in theory, it is not easy to handle the joint mixture distribution of a set of correlated observations. Therefore, we choose the expectation method to infer the coefficients associated with the QTL genotypes. Our estimation method is, in essence, an extension of the regression approach of Haley and Knott (1992).
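The conditional expectation idea can be illustrated in the simplest setting. The sketch below assumes an ordinary diploid backcross with two genotype classes per locus (coded 1 and 0) and no crossover interference; it is not the seed-trait model of this study, whose conditional probabilities involve maternal, embryo and endosperm genotypes. The function names and the default coefficient values are hypothetical.

```python
def transmit(a, b, r):
    """P(second locus is in state b | first locus is in state a), recombination fraction r."""
    return 1.0 - r if a == b else r


def qtl_prob(m_left, m_right, r_left, r_right):
    """P(QTL state = 1 | flanking marker states), assuming no interference.

    r_left:  recombination fraction between the left marker and the QTL
    r_right: recombination fraction between the QTL and the right marker
    """
    num = transmit(m_left, 1, r_left) * transmit(1, m_right, r_right)
    den = num + transmit(m_left, 0, r_left) * transmit(0, m_right, r_right)
    return num / den


def expected_coefficient(m_left, m_right, r_left, r_right, x1=1.0, x0=-1.0):
    """Expected design coefficient of the QTL effect, used in place of the unknown genotype."""
    p = qtl_prob(m_left, m_right, r_left, r_right)
    return p * x1 + (1.0 - p) * x0


# Non-recombinant flanking markers make the QTL state nearly certain;
# recombinant markers with the QTL midway leave it undetermined.
print(qtl_prob(1, 1, 0.1, 0.1))
print(expected_coefficient(1, 0, 0.1, 0.1))
```

In a regression-type analysis, the expected coefficient replaces the unobserved genotype indicator in the design matrix, so the correlated observations of a mixed model never need to be described by a joint mixture distribution.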
Based on the models and methods proposed in the present study, a computer program named QTLNetworkSeed was developed in the C++ programming language. The software runs on the most commonly used operating systems. It can handle double backcross of IF_{2}, selfing of IF_{2} and random mating of IF_{2}, doubled haploid or recombinant inbred line designs for mapping QTLs underlying seed traits.
Data archiving
Data are available from our own public Internet website http://ibi.zju.edu.cn/software/ and from the Dryad Digital Repository doi:10.5061/dryad.8311n.
References
Araki E, Miura H, Sawada S. (1999). Identification of genetic loci affecting amylose content and agronomic traits on chromosome 4A of wheat. Theor Appl Genet 98: 977–984.
Bazzaz FA. (2001). Plant biology in the future. Proc Natl Acad Sci USA 98: 5441–5445.
Benner MS, Phillips RL, Kirihara JA, Messing JW. (1989). Genetic analysis of methionine rich storage protein accumulation in maize. Theor Appl Genet 78: 761–767.
Cockerham CC, Zeng ZB. (1996). Design III with marker loci. Genetics 143: 1437–1456.
Cui Y, Casella G, Wu R. (2004). Mapping quantitative trait loci interactions from the maternal and offspring genomes. Genetics 167: 1017–1026.
Cui Y, Wu R. (2005a). Mapping genome-genome epistasis: a high-dimensional model. Bioinformatics 21: 2447–2455.
Cui YH, Wu JG, Shi CH, Littell RC, Wu RL. (2006). Modelling epistatic effects of embryo and endosperm QTL on seed quality traits. Genet Res 87: 61–71.
Cui YH, Wu RL. (2005b). Statistical model for characterizing epistatic control of triploid endosperm triggered by maternal and offspring QTLs. Genet Res 86: 65–75.
Foolad M, Jones R. (1992). Models to estimate maternally controlled genetic variation in quantitative seed characters. Theor Appl Genet 83: 360–366.
Garcia D, Fitz Gerald JN, Berger F. (2005). Maternal control of integument cell elongation and zygotic control of endosperm growth are coordinated to determine seed size in Arabidopsis. Plant Cell 17: 52–60.
Gessler DD, Xu S. (1996). Using the expectation or the distribution of the identity by descent for mapping quantitative trait loci under the random model. Am J Hum Genet 59: 1382–1390.
Haley CS, Knott SA. (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.
Henderson CR. (1953). Estimation of variance and covariance components. Biometrics 9: 226–252.
Hu ZQ, Xu CW. (2005). A new statistical method for mapping QTLs underlying endosperm traits. Chinese Sci Bull 50: 1470–1476.
Jansen RC. (1992). A general mixture model for mapping quantitative trait loci by using molecular markers. Theor Appl Genet 85: 252–260.
Jansen RC, Van Ooijen JW, Stam P, Lister C, Dean C. (1995). Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theor Appl Genet 91: 33–37.
Jiang CJ, Zeng ZB. (1995). Multiple-trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127.
Kao CH. (2000). On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci. Genetics 156: 855–865.
Kao CH. (2004). Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics 167: 1987–2002.
Lander ES, Botstein D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Letchworth MB, Lambert RJ. (1998). Pollen parent effects on oil, protein, and starch concentration in maize kernels. Crop Sci 38: 363–367.
Mazur B, Krebbers E, Tingey S. (1999). Gene discovery and product development for grain quality traits. Science 285: 372–375.
Mo HD. (1995). Identification of genetic control for endosperm traits in cereals. Acta Genetica Sinica 22: 126–132.
Ohto M, Fischer RL, Goldberg RB, Nakamura K, Harada JJ. (2005). Control of seed mass by APETALA2. Proc Natl Acad Sci USA 102: 3123–3128.
Olsen OA. (1998). Endosperm developments. Plant Cell 10: 485–488.
Parker GD, Chalmers KJ, Rathjen AJ, Langridge P. (1998). Mapping loci associated with flour colour in wheat (Triticum aestivum L.). Theor Appl Genet 97: 238–245.
Rao CR. (1971). Minimum variance quadratic unbiased estimation of variance components. J Multivariate Anal 1: 445–456.
Sari-Gorla M, Calinski T, Kaczmarek Z, Krajewski P. (1997). Detection of QTL x environment interaction in maize by a least squares interval mapping method. Heredity 78: 146–157.
Searle SR. (1968). Another look at Henderson’s methods of estimating variance components. Biometrics 24: 749–787.
Sene M, Thevenot C, Hoffmann D, Benetrix F, Causse M, Prioul JL. (2001). QTLs for grain dry milling properties, composition and vitreousness in maize recombinant inbred lines. Theor Appl Genet 102: 591–599.
Shi CH, Zhu J, Wu JG, Fan LJ. (2000). Genetic and genotype x environment interaction effects from embryo, endosperm, cytoplasm and maternal plant for rice grain shape traits of indica rice. Field Crop Res 68: 191–198.
Smith AF, Roberts GO. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J Roy Stat Soc B 55: 3–23.
Sourdille P, Perretant MR, Charmet G, Leroy P, Gautier MF, Joudrier P et al. (1996). Linkage between RFLP markers and genes affecting kernel hardness in wheat. Theor Appl Genet 93: 580–586.
Sundaresan V. (2005). Control of seed size in plants. Proc Natl Acad Sci USA 102: 17887–17888.
Tan YF, Sun M, Xing YZ, Hua JP, Sun XL, Zhang QF et al. (2001). Mapping quantitative trait loci for milling quality, protein content and color characteristics of rice using a recombinant inbred line population derived from an elite rice hybrid. Theor Appl Genet 103: 1037–1045.
van der Meer IM, Bovy AG, Bosch D. (2001). Plant-based raw material: improved food quality for better nutrition via plant genomics. Curr Opin Biotech 12: 488–492.
Wang DL, Zhu J, Li ZK, Paterson AH. (1999). Mapping QTLs with epistatic effects and QTL × environment interactions by mixed linear model approaches. Theor Appl Genet 99: 1255–1264.
Wen Y, Wu W. (2007). Interval mapping of quantitative trait loci underlying triploid endosperm traits using F3 seeds. J Genet Genomics 34: 429–436.
Wu RL, Lou XY, Ma CX, Wang XL, Larkins BA, Casella G. (2002). An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. Proc Natl Acad Sci USA 99: 11281–11286.
Xu C, He X, Xu S. (2003). Mapping quantitative trait loci underlying triploid endosperm traits. Heredity 90: 228–235.
Xu S. (1995). A comment on the simple regression method for interval mapping. Genetics 141: 1657–1659.
Xu S. (1996). Computation of the full likelihood function for estimating variance at a quantitative trait locus. Genetics 144: 1951–1960.
Yan JQ, Zhu J, He CX, Benmoussa M, Wu P. (1998). Molecular dissection of developmental behavior of plant height in rice (Oryza sativa L.). Genetics 150: 1257–1265.
Yang J, Zhu J, Williams RW. (2007). Mapping the genetic architecture of complex traits in experimental populations. Bioinformatics 23: 1527–1536.
Zeng ZB. (1994). Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.
Zhang AH, Xu CW, Mo HD. (1999). Genetic expression of several quality traits in Indica-Japonic hybrids. Acta Agronomica Sinica 25: 401–407.
Zhu J, Weir BS. (1994). Analysis of cytoplasmic and maternal effects. 2. Genetic models for triploid endosperms. Theor Appl Genet 89: 160–166.
Acknowledgements
This work was supported in part by the National Basic Research Program of China (973 Program 2011CB109306, 2010CB126006), the National Natural Science Foundation 31271608, the National Institutes of Health Grant DA025095 and the Bill and Melinda Gates Foundation Project.
Author information
Affiliations
Institute of Bioinformatics, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, PR China
 T Qi, B Jiang, Z Zhu, C Wei, S Zhu & H Xu
Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, PR China
 Y Gao
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
 X Lou
Competing interests
The authors declare no conflict of interest.