Introduction

In some crop species such as maize, rice and wheat, germplasm collections that include accessions consisting of traditional landraces, modern cultivars and wild species have recently been established (Flint-Garcia et al., 2005; Kojima et al., 2005; Crossa et al., 2007) and used to evaluate the genetic diversity present in a species. Such collections are also useful as stocks of genes for breeding programs. To utilize these collections as breeding materials, the genetic variation in agronomic traits must be understood at the QTL level. Although association studies using accessions sampled from a collection are a straightforward way to evaluate QTL diversity within the collection, whole genome association analysis requires high-density markers that cover the whole genome and is generally precluded by the enormous cost of developing and genotyping a large number of markers.

One effective way to extract the QTL information in a crop collection would be to utilize the segregating multiple families derived from crossing some accessions sampled from the collection to a single reference line such as a standard commercial variety for QTL mapping. This mating design was recently adopted by Yu et al. (2008) to reinforce the association study in founder lines of maize, where the segregating multiple families of recombinant inbred lines (RILs) were derived from crosses between 25 diverse founders and a reference founder line. They showed that, by projecting high-density marker information from the founder lines to the RILs, more accurate association mapping was made possible in a cost-effective way using the RILs with a moderate number of the selected markers genotyped. They also showed that the effect of population structure present in the founder lines on association mapping, causing the frequent false positives, was minimized by the multiple RILs because of reshuffling of genomes between two parental lines.

This mating design is also useful for linkage-based QTL mapping to investigate the diversity of QTL affecting agricultural traits in the germplasm collections of crops for which whole genome association studies are currently unrealistic owing to the limited availability of SNP markers and/or high-throughput genotyping systems. For future association studies of such crops, the target regions can be specified by QTL mapping in the multiple families, so that the development and genotyping of SNPs can be confined to the specified regions rather than the whole genome. Linkage-based QTL mapping using the multiple families derived from the founder lines can identify QTL regions with a lower false-positive rate than association mapping using the founder lines, in which unknown population structure might be present, although the identified regions are relatively broad. Moreover, with this mating design, we can detect any QTL at which some accession possesses an allele different from that of the reference line, which would provide useful information for breeding of the crop. However, statistical methods of QTL mapping that effectively analyze a large number of families with a common parental line remain to be developed.

In this paper, we develop a Bayesian method for jointly analyzing QTL in such interconnected multiple families with a common parental reference line. The families are derived by crossing a set of inbred lines (referred to as the ‘tested lines’ hereafter) sampled from a collection with a common parent that serves as the reference line, and the aim is to detect QTL segregating between any of the tested lines and the reference line. It would be desirable to analyze as many tested lines as possible to understand the QTL diversity in the collections. Accordingly, a large number of families, each derived from one tested line crossed to the reference line, should be treated. As the number of families increases, however, each family is necessarily confined to a smaller size owing to limitations on available space and cost, and this might decrease the accuracy in estimating the effect of the specific QTL allele derived from each tested line. Therefore, we treated the effects of alleles from the tested lines as random effects, but treated the effect of the allele from the reference line as a fixed effect. Here we distinguish ‘random effect’ from ‘fixed effect’ from a frequentist standpoint, although all effects included in the model are random in a Bayesian framework. When a frequentist treats an effect as random, a probability distribution is assumed for the effect, and a Bayesian can regard this distribution as a prior. In this Bayesian study, an effect given such a probability distribution is termed a ‘random effect’, whereas an effect to which a frequentist would assign no probability distribution is termed a ‘fixed effect’.

Information about accessions that possess QTL alleles different from that of the reference line will be very useful in future breeding programs. We therefore incorporated a variable that indicates a segregation of each QTL in each family into the statistical model to infer which of the tested lines possess QTL alleles different from that of the reference line.

Our consideration was confined to multiple families of RILs derived from crosses between a considerable number of tested lines and a common reference line. However, the statistical model would easily be applicable to other families such as F2 or backcross with slight modification. The method was then evaluated for its efficiency in detecting QTL and identifying families that segregate for each QTL using simulated data sets.

Materials and methods

Analyzed families

We consider multiple families of RILs derived from crosses between a considerable number of tested lines and a common reference line, where the tested lines are crossed to the reference line, followed by selfing, to generate segregating F2 populations, from each of which RILs are derived through single-seed descent with repeated cycles of selfing. The number of families, equal to the number of tested lines used for crossing with the reference line, is m and the size of the ith RIL family derived from the ith tested line is ni.

Statistical model

We assume that phenotypic observations of a trait are available for individuals in the multiple families of RILs, as is marker information, including genotypic data at markers for the tested lines, the reference line and individuals in the multiple families of RILs, and a linkage map of the markers. All individuals in the RILs are assumed to be homozygous at all QTL and markers. We also assume that there is no epistatic interaction between QTL in this study, although this assumption can be relaxed without difficulty. The phenotypic value of the jth individual in the ith RIL family is denoted by yij (i=1, 2,…, m; j=1, 2,…, ni), for which we can apply the following linear model,

$$y_{ij} = \mu + \sum_{l=1}^{N} \left\{ u_{lij} a_l + (1 - u_{lij}) b_{li} \right\} + e_{ij} \tag{1}$$

In this model, μ is the intercept of the model, which is a mean of the genotypic values obtained by omitting segregating QTL in the multiple families and N is the number of QTL affecting the phenotypic value. The variable ulij indicates the genotype of the individual at the lth QTL, where the alleles at the QTL are denoted by Ql and qli for the reference line and the ith tested line, respectively, and ulij=1 for the genotype QlQl and 0 for qliqli. The genotypic contributions of the QTL corresponding to QlQl and qliqli are denoted by al and bli, respectively, and eij is the residual error following a normal distribution with mean 0 and variance σe2. In multiple families that share a single parental line (the reference line), the effects of QTL alleles derived from the reference line are well estimated by a large number of degrees of freedom allocated for the estimation, but the instability in the estimation of the effects of the alleles from each tested line might be caused by the limited size of each family. For the QTL effects, we thus treated al as a fixed effect and bli as a random effect sampled from a normal distribution with mean 0 and variance σbl2. It is noted that the variance of bli is indexed by ‘l’ because the QTL effect has a specific distribution for each QTL.

Moreover, we incorporate a variable, sli, that indicates whether each QTL is segregating in each family, where sli=1 if the lth QTL is segregating in the ith family and sli=0 otherwise. When sli=0, the ith tested line has the same allele at the lth QTL as the reference line; accordingly, the genotypic value at the lth QTL is al for all individuals in the ith family. Denoting the genotypic contribution of the lth QTL to the phenotypic value by Δlij, we can write Δlij=sli{ulijal+(1−ulij)bli}+(1−sli)al. Therefore, with the segregation variable sli, model (1) can be modified as

$$y_{ij} = \mu + \sum_{l=1}^{N} \left[ s_{li} \left\{ u_{lij} a_l + (1 - u_{lij}) b_{li} \right\} + (1 - s_{li}) a_l \right] + e_{ij} \tag{2}$$
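The genotypic contribution Δlij defined above can be illustrated with a short Python sketch. The function names and the numerical values are hypothetical and chosen only for illustration (they are not taken from Table 1). The sketch shows that, when sli=0, every individual in the family receives the reference-line effect al regardless of ulij.

```python
import numpy as np

def genotypic_contribution(s, u, a, b):
    """Delta_lij = s_li * {u_lij * a_l + (1 - u_lij) * b_li} + (1 - s_li) * a_l."""
    return s * (u * a + (1 - u) * b) + (1 - s) * a

def expected_phenotype(mu, s, u, a, b):
    """mu plus the sum over QTL of Delta_lij for one individual.

    s, u, a and b are arrays of length N (one entry per QTL) for a fixed
    family i and individual j.
    """
    return mu + np.sum(genotypic_contribution(s, u, a, b))

# Hypothetical values for two QTL (illustration only).
a = np.array([-0.8, -0.2])  # effects of the reference-line alleles
b = np.array([0.5, 0.3])    # effects of the tested-line alleles
u = np.array([0, 0])        # individual is homozygous for the tested-line alleles

segregating = expected_phenotype(0.0, np.array([1, 1]), u, a, b)      # uses b
non_segregating = expected_phenotype(0.0, np.array([0, 0]), u, a, b)  # uses a
```

In this toy case the segregating family yields the sum of the tested-line effects, whereas the non-segregating family yields the sum of the reference-line effects.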

Prior and posterior distributions of parameters and variables

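The parameters and variables included in model (2) and the locations of the N QTL, denoted as λ1, λ2, …, λN, are collectively written as θ and are referred to as unobservables. The observed phenotypic values are denoted by y={yij} for i=1, 2,…, m and j=1, 2,…, ni for each i. The likelihood is written as

$$f(y \mid \theta) = (2\pi\sigma_e^2)^{-n_T/2} \exp\left\{ -\frac{1}{2\sigma_e^2} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( y_{ij} - \mu - \sum_{l=1}^{N} \Delta_{lij} \right)^2 \right\}$$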

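where $n_T = \sum_{i=1}^{m} n_i$ is the total number of individuals over all families. Denoting the joint prior distribution and the joint posterior distribution of θ by p(θ) and p(θ | y), respectively, we can write

$$p(\theta \mid y) \propto f(y \mid \theta)\, p(\theta) = f(y \mid \theta)\, p(\mu)\, p(\sigma_e^2)\, p(N) \prod_{l=1}^{N} \left[ p(a_l)\, p(\lambda_l)\, p(\sigma_{bl}^2 \mid s_{l1}, s_{l2}, \ldots, s_{lm}) \prod_{i=1}^{m} \left\{ p(b_{li} \mid \sigma_{bl}^2, s_{li})\, p(s_{li}) \prod_{j=1}^{n_i} p(u_{lij} \mid \lambda_l) \right\} \right]$$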

where p(μ), p(σe2), p(al), p(bli | σbl2, sli), p(σbl2 | sl1, sl2,…, slm), p(sli), p(ulij | λl), p(λl) and p(N) are the priors of the components of θ. For μ, σe2 and al, we chose the following prior distributions: p(μ) ∝ 1, p(σe2) ∝ 1/σe2 and p(al) ∝ 1.

It should be noted that bli is not included in the likelihood f(y | θ) when sli=0, meaning that the lth QTL is not segregating in the ith family, whereas bli is included in the likelihood f(y | θ) when sli=1. Therefore, when sli=0, the full conditional posterior distribution of bli is independent of the data y and equal to the prior p(bli | σbl2, sli). Such priors as p(bli | σbl2, sli=0) were referred to as ‘pseudo-priors’ by Carlin and Chib (1995) in the context of Bayesian model choice. We assumed that p(bli | σbl2, sli)=φ(bli | 0, σbl2) for both sli=1 and sli=0, where φ(y | c, d) denotes the normal density function with mean c and variance d. For p(σbl2 | sli), which is also a pseudo-prior, we assumed p(σbl2 | sli) ∝ 1/σbl2 (Xu, 2003) for both sli=0 and sli=1, although this form of prior for σbl2 leads to improper posteriors of σbl2 and bli (ter Braak et al., 2005). We consider the problem of improper posteriors in the Discussion.
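To make the role of the pseudo-prior concrete, the following Python sketch evaluates a Gaussian log-likelihood for one family under model (2) and checks that its value does not depend on bli when sli=0. The function name, array layout and toy data are assumptions made for illustration; they are not part of the authors' program.

```python
import numpy as np

def log_likelihood(y, mu, sigma2_e, s, u, a, b):
    """Gaussian log-likelihood of one family under model (2).

    y : (n,) phenotypes of the family.
    u : (N, n) QTL genotype indicators (1 = reference-line homozygote).
    s, a, b : (N,) segregation indicators and allelic effects for this family.
    """
    s_col, a_col, b_col = s[:, None], a[:, None], b[:, None]
    delta = s_col * (u * a_col + (1 - u) * b_col) + (1 - s_col) * a_col
    resid = y - mu - delta.sum(axis=0)
    n = y.size
    return -0.5 * n * np.log(2 * np.pi * sigma2_e) - 0.5 * np.sum(resid ** 2) / sigma2_e

# Toy data: one QTL, five individuals (illustrative values only).
y = np.array([0.2, -0.5, 1.1, 0.0, -0.3])
u = np.array([[1, 0, 1, 0, 0]])
a = np.array([-0.8])

not_segregating = np.array([0])
ll_b_small = log_likelihood(y, 0.0, 1.0, not_segregating, u, a, np.array([0.3]))
ll_b_large = log_likelihood(y, 0.0, 1.0, not_segregating, u, a, np.array([-2.0]))
# ll_b_small equals ll_b_large: b_li carries no information from y when s_li = 0.
```

With sli=1 the two values of bli give different log-likelihoods, which is why the data inform bli only in segregating families.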

As the prior distribution of ulij, we adopted the conditional probability of a QTL genotype given linked marker genotypes near the QTL location as described by Jiang and Zeng (1997) for a biparental Ft population. The prior probabilities of sli=0 and 1 were given as 0.5 for QTL segregation. The prior distribution of λl is assumed uniform across the whole chromosomal region. The prior probability of N was a Poisson distribution with a pre-specified mean δ. In the following simulation experiments, we assumed that δ=2.
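As a rough illustration of a genotype prior conditional on flanking markers, the sketch below computes the probability that an individual carries the reference-line homozygote at a putative QTL. It is a simplified stand-in, not the Jiang and Zeng (1997) algorithm: it assumes the Haldane map function, RILs selfed to fixation (recombinant proportion R = 2r/(1+2r)), fully informative markers, and independence of the two marker intervals given the QTL genotype. All function names are hypothetical.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction per meiosis for a map distance in cM (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def ril_recombinant_fraction(d_cm):
    """Proportion of recombinant RILs after selfing to fixation: R = 2r / (1 + 2r)."""
    r = haldane_r(d_cm)
    return 2.0 * r / (1.0 + 2.0 * r)

def prob_reference_genotype(left, right, d_left, d_right):
    """Approximate P(u = 1 | flanking marker genotypes).

    left, right : marker genotypes coded 1 (reference line) or 0 (tested line).
    d_left, d_right : distances (cM) from the putative QTL to each marker.
    """
    def weight(q):
        w = 0.5  # the two homozygotes are equally likely a priori in a RIL family
        for marker, d in ((left, d_left), (right, d_right)):
            big_r = ril_recombinant_fraction(d)
            w *= (1.0 - big_r) if marker == q else big_r
        return w

    w1, w0 = weight(1), weight(0)
    return w1 / (w1 + w0)

# A QTL 2 cM and 3 cM from two markers that both carry the reference allele.
p_ref = prob_reference_genotype(1, 1, 2.0, 3.0)
```

For F8 families the recombinant proportion would be slightly smaller than the fixation limit used here, so the values should be read as approximations.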

We estimate θ by using a Markov chain Monte Carlo (MCMC) algorithm. After initial values are assigned to θ, MCMC cycles are repeated to update the values. A Gibbs sampling scheme is applied to update θ except for N and λl (l=1, 2,…, N), which are updated with the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), including reversible-jump MCMC (RJ-MCMC) sampling (Green, 1995) for N. Details of the updating process for θ are given in Appendix A.
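Appendix A is not reproduced here, so the following Python sketch is only one plausible form of the Gibbs update for the segregation indicator sli, derived from model (2) and the priors stated above. Because the pseudo-priors of bli and σbl2 are the same for sli=0 and sli=1, they cancel, and the full conditional probability of sli=1 depends only on the prior odds and the likelihood ratio. The function names and arguments are hypothetical.

```python
import numpy as np

def segregation_probability(partial_resid, u, a_l, b_li, sigma2_e, prior_s1=0.5):
    """P(s_li = 1 | all other unobservables, y) for QTL l in family i.

    partial_resid : (n,) phenotypes minus mu and minus the contributions of
                    all QTL other than l.
    u : (n,) genotype indicators u_lij (1 = reference-line homozygote).
    """
    mean_s1 = u * a_l + (1 - u) * b_li        # QTL segregating in the family
    mean_s0 = np.full(u.shape, float(a_l))    # all individuals carry the reference allele
    ll1 = -0.5 * np.sum((partial_resid - mean_s1) ** 2) / sigma2_e
    ll0 = -0.5 * np.sum((partial_resid - mean_s0) ** 2) / sigma2_e
    log_odds = np.log(prior_s1) - np.log(1.0 - prior_s1) + ll1 - ll0
    return 1.0 / (1.0 + np.exp(-log_odds))

def draw_segregation(rng, partial_resid, u, a_l, b_li, sigma2_e):
    """Draw s_li from its Bernoulli full conditional."""
    p = segregation_probability(partial_resid, u, a_l, b_li, sigma2_e)
    return int(rng.random() < p)

# Toy family of four RILs in which the tested-line homozygotes differ clearly.
u = np.array([1, 1, 0, 0])
partial_resid = np.array([-0.8, -0.8, 0.5, 0.5])
p_seg = segregation_probability(partial_resid, u, a_l=-0.8, b_li=0.5, sigma2_e=1.0)
```

When bli equals al the likelihood ratio is one and the probability reverts to the prior value of 0.5, which matches the intuition that such a family is uninformative about segregation.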

Simulation experiments

Simulation settings

We evaluated the proposed Bayesian method for the efficiency of detecting QTL segregating in any family and identifying families segregating at each QTL with the analyses of simulated data sets. We considered multiple F8 families, where a set of tested lines were crossed to a reference line to generate multiple F2 families from each of which F8 families were derived through single-seed descent with repeated cycles of selfing. In our simulation, we assumed that the family size, denoted by n, is equal for all families (that is, n=n1=n2=…=nm). We assumed three combinations for the number of families, m, and family size, n, as (m,n)=(50,40), (100,20) and (200,10) with total number of individuals in all families fixed as 2000.

The simulated genome consisted of four chromosomes, Chr1, Chr2, Chr3 and Chr4, each of length 100 cM, on which 21 markers per chromosome were located every 5 cM. We assumed that there were five alleles with equal frequencies at each marker in the founder generation. Accordingly, each allele was randomly allocated to each marker of the reference line and tested lines with probability 0.2 in our simulations. We generated three QTL, QTL1, QTL2 and QTL3, located at 23 cM on Chr1, 72 cM on Chr2 and 12 cM on Chr3; Chr4 harbored no QTL and was used to investigate the false-positive rate (FPR), which is described in more detail in the next section. The numbers of QTL alleles existing in all tested lines were three for QTL1, two for QTL2 and five for QTL3. We denoted the kth allele at the lth QTL as Alk. We assumed that the reference line had the first allele at each QTL (that is, A11, A21 and A31). The allele frequency of Alk was denoted by flk and the QTL effect of the homozygote for Alk was denoted by αlk, which is referred to hereafter as the allelic effect of Alk. These frequencies and effects were set to the values shown in Table 1 for our simulations. The proportions of families segregating at each QTL, that is, families derived from tested lines possessing alleles other than Al1 at the lth QTL (l=1,2,3), were 0.3, 0.2 and 0.8 for QTL1, QTL2 and QTL3, respectively, as shown in Table 1. When generating each data set, the QTL alleles were randomly allocated to the m tested lines such that the allele frequencies were those given in Table 1; the allele allocation in the tested lines was recorded and used for summarizing the results of the simulation analyses. In addition, we considered 10 unlinked biallelic additive QTL, each with the two homozygous genotypes at equal frequency in the founder lines and with effects of 0.1 and −0.1 for the two homozygotes, to include polygenic effects whose variances summed to 0.1.

The phenotypic values of individuals in the F8 generation were determined by the sum of the genotypic effects corresponding to the genotypes at the three QTL and the 10 unlinked QTL and an environmental effect sampled from a normal distribution with mean 0 and variance 1. The proportion of phenotypic variance explained by each QTL (referred to as PVQ) is also shown in Table 1, as this might affect the power of detecting each QTL.
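Two steps of this simulation design can be sketched in Python: allocating QTL alleles to the m tested lines so that the specified frequencies hold exactly, and checking the total polygenic variance. The QTL1 frequencies 0.7, 0.1 and 0.2 are those quoted later in the text; the function name and the use of NumPy's random generator are assumptions for illustration and do not describe the authors' Fortran program.

```python
import numpy as np

def allocate_alleles(freqs, m, rng):
    """Randomly assign allele labels 1..K to m tested lines with exact frequencies."""
    counts = np.rint(np.asarray(freqs) * m).astype(int)
    if counts.sum() != m:
        raise ValueError("allele frequencies do not give whole numbers of lines")
    alleles = np.repeat(np.arange(1, len(freqs) + 1), counts)
    rng.shuffle(alleles)  # random order over the tested lines
    return alleles

rng = np.random.default_rng(1)
qtl1_alleles = allocate_alleles((0.7, 0.1, 0.2), m=50, rng=rng)
# Lines carrying allele 1 share the reference allele, so their families do not segregate.
prop_segregating = np.mean(qtl1_alleles != 1)

# Polygenic variance: 10 unlinked QTL, homozygote effects +0.1 and -0.1 at equal frequency.
effects = np.array([0.1, -0.1])
freqs = np.array([0.5, 0.5])
mean_effect = np.sum(freqs * effects)
variance_per_qtl = np.sum(freqs * (effects - mean_effect) ** 2)
polygenic_variance = 10 * variance_per_qtl
```

Each unlinked QTL contributes a variance of 0.01, so the ten together account for the polygenic variance of 0.1 stated above.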

Table 1 Summary of QTL and parameter values used to generate simulated data sets of multiple families

We generated 100 data sets for each of the three settings for (m,n). The power of QTL detection and the accuracy in identifying the families that were segregating at each QTL were evaluated through analyses of the 100 data sets for each setting of (m,n). For comparison, the same data sets were also analyzed using a method based on interval mapping for multiple families proposed by Xu (1998), referred to as IM, treating the QTL effect as a random effect because of the large number of families (that is, m ≥ 50). Moreover, to evaluate the incremental efficiency obtained by incorporating the segregation variable, sli, we applied an additional Bayesian method based on model (1), without the segregation variable, to the simulated data sets. Hereafter, the Bayesian methods based on models (1) and (2) are referred to as Bayes1 and Bayes2, respectively. For each of the Bayesian methods, we performed 50 000 cycles of MCMC and sampled the values of the unobservables every 20 cycles during the last 40 000 cycles, with the first 10 000 cycles discarded as burn-in.

In our Bayesian methods, the posterior QTL intensity (Sillanpää and Arjas, 1998) for each 1-cM interval on the genome was calculated for QTL detection. We obtained a summed QTL intensity, referred to as SQI (Hayashi and Awata, 2008), by summing the posterior QTL intensity over all intervals on each chromosome, and used SQI as a test statistic for detecting QTL on a chromosome. Thresholds of SQI were determined, for each of Bayes1 and Bayes2, from the empirical null distribution of the maximum of SQI over all chromosomes, obtained by analyzing 100 null data sets generated on the assumption of no QTL in each setting of (m,n). The value of the maximized SQI corresponding to the 5% significance level of the empirical null distribution was taken as the threshold. When the SQI of any chromosome exceeded the threshold, detection of a QTL on that chromosome was declared. The Bayesian estimates of the positions and effects of the detected QTL were obtained in the analysis of each data set as described in Hayashi and Awata (2008), where the positions and effects of the QTL fitted in the model were averaged over the chromosome, with the QTL intensity of the intervals that harbored the QTL used as a weight. The same weighted average was applied to the posterior probabilities of QTL segregation in each family (that is, sli=1) to identify the families that were segregating at QTL in Bayes2.
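The detection statistic described above can be sketched in Python as follows. The sketch assumes that the posterior QTL intensity of an interval is estimated as the number of sampled QTL positions falling in that interval divided by the number of retained MCMC samples, and it simplifies the weighted position estimate to an intensity-weighted mean of interval midpoints. The function names are hypothetical, and the quantile interpolation is NumPy's default rather than anything specified in the paper.

```python
import numpy as np

def qtl_intensity(sampled_positions, n_samples, chrom_length_cm=100):
    """Posterior QTL intensity per 1-cM interval from pooled MCMC position draws."""
    edges = np.arange(0, chrom_length_cm + 1)
    counts, _ = np.histogram(sampled_positions, bins=edges)
    return counts / n_samples

def summed_qtl_intensity(intensity):
    """SQI: posterior QTL intensity summed over all intervals of one chromosome."""
    return float(np.sum(intensity))

def empirical_threshold(null_max_sqi, alpha=0.05):
    """(1 - alpha) quantile of the maximum SQI over chromosomes in null data sets."""
    return float(np.quantile(null_max_sqi, 1.0 - alpha))

def weighted_position(intensity):
    """Intensity-weighted mean of the 1-cM interval midpoints (in cM)."""
    midpoints = np.arange(intensity.size) + 0.5
    return float(np.average(midpoints, weights=intensity))

# Toy example: four retained samples, three of which place a QTL near 23 cM.
intensity = qtl_intensity([23.2, 22.8, 23.9], n_samples=4)
sqi = summed_qtl_intensity(intensity)
position = weighted_position(intensity)

# Hypothetical null maxima standing in for the 100 null-data-set analyses.
threshold = empirical_threshold(np.arange(1, 101) / 100.0)
detected = sqi > threshold
```

In practice the null maxima would come from running the same MCMC analysis on data sets simulated without QTL, once per method and per (m,n) setting.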

In IM, the likelihood-ratio test statistic (LRT) was adopted for QTL detection. Thresholds for LRT were determined similarly to the approach used for SQI. In IM, the position of the peak of LRT was regarded as the estimated QTL position.

Results of simulation experiments

Table 2 shows the powers of QTL detection and the estimates of the QTL position and of the effect of the reference-line allele at each QTL for Bayes1 and Bayes2, together with the results for IM. Estimates of QTL effects are not given for IM because IM, being based on a random effect model, deals with the variances of QTL effects rather than the effects themselves (Xu, 1998). The averages and s.d. of the estimated QTL positions and QTL effects were calculated over the repetitions that successfully detected the QTL. In the simulation, Chr4, which harbored no QTL, was used to evaluate the FPR for QTL detection, where FPR was defined as the number of repetitions that falsely detected a QTL on Chr4 in the analyses of 100 data sets. For (m,n)=(50,40), (100,20) and (200,10), the respective FPRs were 1, 2 and 2 in IM; 1, 2 and 3 in Bayes1; and 2, 2 and 2 in Bayes2. Therefore, the thresholds corresponding to the genome-wide 5% significance level, empirically determined by the analyses of 100 null data sets, appropriately controlled the FPR for all three methods, so that the powers of these methods could be fairly compared.

Table 2 Results of simulation experiments

The powers of QTL detection decreased as the number of families (m) increased and the family size (n) decreased in all three methods. At a given (m,n), the Bayesian methods showed higher powers of detecting QTL than IM, whereas the powers were comparable between Bayes1 and Bayes2. The powers of detection for QTL1 were much lower than those for QTL2 and QTL3: they were 38 and 41% for (m,n)=(50,40), 26% for (m,n)=(100,20), and 6 and 14% for (m,n)=(200,10) with Bayes1 and Bayes2, respectively. As shown in Table 1, the PVQ of QTL1 was considerably smaller than that of the other QTL, and this was responsible for the poor powers for QTL1. For QTL2 and QTL3, which had moderate PVQ values, both Bayesian methods showed higher powers than IM; the powers were higher than 80% at (m,n)=(200,10) and increased to about 95% at (m,n)=(50,40) with the Bayesian methods.

The estimates of the positions were slightly biased for QTL2 and QTL3, but were noticeably biased for QTL1 at (m,n)=(200,10) in the Bayesian methods. The estimates of the effects of the alleles from the reference line obtained with the Bayesian methods were considerably biased for QTL2 and QTL3. For example, the simulated effect of QTL2 was −0.8 (Table 1), but the estimates were shrunk towards zero (Table 2). This shrinkage was less in Bayes2 than in Bayes1; that is, in Bayes2, the estimated values were closer to the simulated values given in Table 1. Bayes2, however, provided biased estimates of the effects of QTL3 for (m,n)=(50,40) and (100,20), where the simulated effect was −0.2 but the estimates were inflated to −0.32 and −0.37, respectively (Table 2).

In Table 3, we have summarized the inferences about QTL segregation in the families for the analyses with Bayes2. In the analysis of each simulated data set, we obtained a posterior probability of QTL segregation (i.e., sli=1) for each family and averaged the probabilities over the families derived from tested lines that possessed identical alleles at each QTL. We further averaged the probabilities over the repetitions with successful detection of the QTL in each setting of (m,n) and the results are listed for each allele in the rows labeled ‘Probability of segregation’ of Table 3.

Table 3 Inference for QTL segregation in each family using Bayes2 in simulation experiments

In addition, we evaluated the ability of Bayes2 to identify tested lines with alleles that differ from that of the reference line, which results in QTL segregation in the families derived from those tested lines crossed with the reference line. To do so, we investigated the proportions of the tested lines with posterior probabilities of QTL segregation exceeding two pre-determined values, 0.6 and 0.9, for each QTL allele in the replications with successful QTL detection (Table 3). For example, consider QTL1 at the setting (m,n)=(50,40). There were 35 lines with allele A11, five lines with A12 and 10 lines with A13 in each replication, given the allele frequencies 0.7, 0.1 and 0.2 for A11, A12 and A13, respectively, as given in Table 1. Therefore, the total numbers of tested lines with A11, A12 and A13 investigated in the 41 replications that successfully detected QTL1 were 1435, 205 and 410, respectively, from which we obtained the numbers of tested lines with posterior probability of QTL segregation exceeding 0.6 as 215, 125 and 98, with proportions 15, 61 and 23%, respectively, as listed in Table 3. Similarly, the proportions of tested lines with posterior probability of QTL segregation exceeding 0.9 were 1, 20 and 3% for the lines with alleles A11, A12 and A13, respectively. As the first alleles at the three QTL (A11, A21 and A31) were allocated to the reference line in our simulations, the proportions of the tested lines possessing these QTL alleles with posterior probabilities of QTL segregation greater than 0.6 or 0.9 were regarded as the false discovery rates for QTL segregation in the non-segregating families derived from the tested lines. For the tested lines with QTL alleles that differed from those of the reference line, the proportions indicated how well the families that segregated for the QTL were correctly identified.

The accuracies of inference for segregating tested lines were enhanced as the effects of alleles or the family size increased (Table 3).
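The bookkeeping behind the QTL1 example at (m,n)=(50,40) can be reproduced with a few lines of Python. The sketch only recomputes the numbers of tested lines examined per allele from the allele frequencies and the 41 successful replications quoted above; the function names are hypothetical.

```python
def lines_per_replication(m, allele_freq):
    """Number of tested lines carrying an allele in one replication."""
    return round(m * allele_freq)

def lines_examined(m, allele_freq, n_detected):
    """Tested lines carrying an allele, summed over replications that detected the QTL."""
    return lines_per_replication(m, allele_freq) * n_detected

def proportion_exceeding(n_exceeding, n_examined):
    """Share of examined lines whose posterior segregation probability exceeds a cutoff."""
    return n_exceeding / n_examined

m, n_detected = 50, 41
qtl1_freqs = {"A11": 0.7, "A12": 0.1, "A13": 0.2}
totals = {allele: lines_examined(m, f, n_detected) for allele, f in qtl1_freqs.items()}

# Share of non-segregating (A11) lines flagged at the 0.6 cutoff, using the count from the text.
false_discovery_a11 = proportion_exceeding(215, totals["A11"])
```

The same proportion computed for lines carrying the reference allele gives the false discovery rate of segregation, and for the other alleles it gives the rate of correct identification.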

Discussion

The efficiencies of the Bayesian methods in analyzing simulated data sets

As shown in Table 2, the powers of QTL detection were greater for both Bayesian methods than for IM, indicating that information on QTL that distinguish the reference line from the tested lines might be effectively elucidated by the Bayesian methods using the experimental design adopted in this study. In particular, in the setting (m,n)=(200,10), Bayes2 showed noticeably higher powers (87%) for the detection of QTL2 and QTL3 than IM (35 and 50%).

However, the estimates of the effects of alleles from the reference line were biased in the Bayesian methods (Table 2). This might have been caused by inaccuracies in the inference about QTL segregation in each family. In Bayes1, as the inference about QTL segregation in each family was not incorporated into the analyses, the estimates of the effects of alleles from the reference line were shrunk towards zero. In Bayes2, which could infer QTL segregation in each family using a variable indicating QTL segregation, the accuracy of the inference varied depending on the effects of the QTL alleles and the combination of the number of families and the family size (Table 3), as did the accuracy of the estimates of QTL effects. For example, at QTL3, the QTL segregation in the families from the tested lines with allele A33 frequently went undetected, where the posterior probabilities of segregation were only 0.30, 0.33 and 0.44 for (m,n)=(50,40), (100,20) and (200,10), respectively. Accordingly, allele A33 was frequently misidentified as the allele from the reference line, A31, especially in (m,n)=(50,40) and (100,20). Therefore, the effect of A33 (α33=−0.6, Table 1) was confounded with the effect of A31 (α31=−0.2, Table 1), causing considerable downward bias in the estimates of α31 in (m,n)=(50,40) and (100,20), as shown in Table 2. The accuracy of the inference about QTL segregation decreased as the number of families increased and the family size decreased, owing to sampling error in segregation caused by small family size. Taking QTL2 as an example, the power of identifying QTL segregation in the families with allele A22 decreased as m increased; consequently, the estimate of α21 was increasingly biased with increasing m (Tables 2 and 3).

The posterior probability of segregation in each family at each QTL obtained with Bayes2 can be used to identify the tested lines that have QTL alleles different from that of the reference line. As shown in Table 3, at (m,n)=(50,40), tested lines that had QTL alleles with effects greatly different from those of the reference line were efficiently identified. For example, the power of correctly identifying the segregation at QTL3 in the families derived from the tested lines having allele A35 was 72% based on the criterion of a posterior probability of segregation greater than 0.6. Under this criterion, however, the false discovery of segregation at QTL3 in non-segregating families, which were derived from tested lines with allele A31, occurred at a rate of 14%. Increasing the threshold for the posterior probability of segregation to 0.9 decreased the power of correct identification of segregating families to 16% for allele A35 at QTL3, but the rate of false discovery of QTL segregation for non-segregating families was negligible (Table 3). Using the threshold of 0.9 for the posterior probability of QTL segregation in (m,n)=(50,40), tested lines with A22 at QTL2 were still correctly identified as having an allele different from that of the reference line at a rate of 55%. Therefore, Bayes2 showed a practical capability to identify tested lines with QTL alleles different from that of the reference line in (m,n)=(50,40), although the rates of successful identification of the segregating families were lower at the settings (m,n)=(100,20) and (200,10), as shown in Table 3.

MCMC algorithm in Bayesian model selection for multiple families

The dimensionality of the parameters in models for QTL mapping changes depending on the number of QTL included. Although effective sampling schemes based on Gibbs sampling, such as stochastic search variable selection (SSVS) (Yi et al., 2003) and Bayesian shrinkage estimation (Xu, 2003), have recently been proposed, we adopted RJ-MCMC for the inference of the number of QTL, N, in this study. A Gibbs sampling scheme for model selection can only be performed over a composite model that is a product space of candidate models and their parameters (Godsill, 2001). The model for multiple families considered in the present study is determined not only by the number of QTL but also by the configurations of QTL alleles in the tested lines, in contrast to a model for QTL analysis in a biparental cross family, where the model is simply determined by the QTL number. Accordingly, the number of possible models becomes intractably large in the analysis of multiple families designed in this study, and the composite model space is difficult to handle with Gibbs sampling schemes. Therefore, we chose RJ-MCMC sampling for estimation of the QTL number, and a random model approach was introduced to cope with the enormous number of possible configurations of alleles in the tested lines at each QTL in the Bayesian estimation.

In the Bayesian method proposed by Xu (2003), the effects of QTL assumed at each position in a genome were treated as random effects, the priors of which were normal distributions with mean zero and different variances for different QTL. In our study, the priors of the effects of QTL alleles from tested lines were also assumed as normal distributions with mean zero and different variances for different QTL. In Bayes2, in addition, we incorporated a binary variable indicating QTL segregation in each family at each QTL, which can be regarded as analogous to the indicator variable for the presence of a QTL at each genome position used by Yi et al. (2003) for SSVS. Although Jannink and Wu (2003) applied RJ-MCMC for the inference about allele configurations in multiple interconnected families, accurate estimation of the allele configurations in a large number of families with moderate to small family sizes would be difficult as the number of possible configurations becomes enormously large owing to the increase in the number of potential alleles. Moreover, the difference in alleles between the tested lines can only be indirectly inferred in the multiple families, considered in the present study, through a single reference line shared by the families, which would make suitable configuration of alleles in the families more difficult.

As shown in the simulation experiments (Tables 2 and 3), Bayes2 might be a practical method to detect QTL segregating between a reference line and tested lines and to infer QTL segregation in each family, provided that the family size is not too small, as in the setting (m,n)=(50,40) in the simulations. The slower convergence and poorer mixing of RJ-MCMC compared with Gibbs sampling can be compensated for to some extent by increasing the number of iterations, which is feasible on the high-performance computers now available without requiring excessive computational time.

For the prior of the variance σbl2 of the QTL effects bli of the alleles from the tested lines, we adopted p(σbl2 | sli) ∝ 1/σbl2. As shown by ter Braak et al. (2005), this form of prior for σbl2 yields improper posteriors for σbl2 and bli that have infinite mass near zero; thus, if the Markov chain truly converged, the values of σbl2 and bli would be fixed at zero. Hobert and Casella (1996) argued that an MCMC procedure with an improper posterior cannot converge. They found, however, that the posterior sample from the MCMC could appear well behaved despite the improper posterior. In our method (Bayes2), the main concern is to detect the segregation of QTL in each family, indicated by the binary variable sli, rather than to estimate σbl2 and bli. Posterior samples of sli might be robust to the impropriety of the posteriors of σbl2 and bli. Therefore, in the present analyses, we nevertheless used the prior p(σbl2 | sli) ∝ 1/σbl2, and the posterior samples of σbl2 and bli appeared to behave well along with those of sli, although this problem of the improper posterior requires further consideration.

One might be interested in the influence of the prior mean of the QTL number, N, on the power of QTL detection for the Bayesian methods. We therefore carried out additional analyses of the same simulated data sets with the prior mean of N set to 1 and to 10. These additional analyses gave almost the same powers as the original analyses with the prior mean of N equal to 2 (results not shown).

Utility of multiple families derived from germplasm collections

The germplasm collections that have been recently established for some crops are useful for association mapping of traits of economic importance. Some statistical methods, including mixed linear model and Bayesian method, have been devised for whole genome association studies in such collections (Yu et al., 2006, 2008; Iwata et al., 2007). A whole genome scan with association mapping requires a considerable number of markers that cover the entire genome at high density, making it both expensive and time-consuming. Therefore, multiple populations of segregating families derived by crossing some accessions in the collection with a reference line such as a popular commercial variety, as described here, would be valuable for obtaining preliminary QTL information, including the number of QTL and their positions for a subsequent association study, in which the target regions can be confined to the QTL regions estimated from the preliminary linkage QTL analysis. The population structure present in the original collections, which decreases the efficiencies in association mapping, is also minimized by reshuffling the genomes of two parents in each family to construct the multiple families (Yu et al., 2008). Therefore, adopting a linkage mapping strategy in the multiple families derived from germplasm collections will improve the power of QTL detection although the mapping resolution is inferior to association mapping approach. In addition, in the analysis of the multiple families described in the present study, we can select the tested lines that will be useful for the future breeding programs. The results of our study show that the Bayesian method developed for analyzing such families can play a practical role in QTL analysis in germplasm collections.

The program (written in Fortran 77) used in the simulation experiments of this study can be applied to actual data from multiple RIL families derived by crossing a reference line to several tested lines, and a Windows executable version of the program is available on request from the authors.