Introduction

Pairwise relatedness at a single locus and its correlation among linked loci along chromosomes give insights into patterns of genomic evolution in natural populations. Pairwise relatedness reveals the genetic similarity between individuals due to recently shared ancestors, and is affected by several evolutionary forces (Wright, 1969). The correlation of pairwise relatedness and its ‘spatial’ pattern along chromosomes may reflect differential processes of co-evolution that have occurred among different regions of chromosomes. Hitchhiking effects and background selection (Maynard Smith and Haigh, 1974; Charlesworth et al, 1993) can enhance a positive regional correlation of pairwise relatedness. Recombination separates linked alleles that have a common ancestor, and hence alters the coalescence process for linked loci (eg Hudson, 1991), resulting in a negative correlation in pairwise relatedness for the two loci. Random drift or founder effects can also reduce the correlation in pairwise relatedness. In general, the joint effects of different forces can bring about a nonuniform distribution of the correlation of relatedness along chromosomes. Thus, a combination of linkage maps and the correlation of pairwise relatedness among loci can help us understand the ‘patchy’ pattern of genomic evolution.

The significance of pairwise relatedness has long been appreciated in understanding population genetic structure and evolution (Wright, 1922, 1969; Cotterman, 1940; Jacquard, 1974). Estimation of the pairwise relatedness is now simplified by the advent of abundant polymorphic markers, such as microsatellite markers and single-nucleotide polymorphisms (SNPs) (eg Brookes, 1999). The populations studied can either be populations with known pedigree or populations with unknown pedigree. There have been numerous studies exploring the marker-based statistical methods for the purpose of estimating pairwise relatedness (eg Thompson, 1975; Pamilo and Crozier, 1982; Lynch, 1988; Queller and Goodnight, 1989; Ritland, 1996; Lynch and Ritland, 1999; Wang, 2002; Milligan, 2003). However, these previous analyses mainly focus on the average pairwise relatedness per locus and have not been coupled with assessment of the genome-wide diversity in relatedness. The heterogeneity of pairwise relatedness along chromosomes cannot be assessed using these methods.

Previous theories on kinship mapping, the relationship between the joint probability of identity-by-descent (IBD) of linked markers and the recombination fraction (Morton et al, 1971; Morton and Simpson, 1983) did not examine the interaction among linked loci in terms of the correlation of relatedness. A recent theoretical study evaluated the role of drift, gene flow, selfing, and mutation in affecting the association of gene identity, and shows that a small identity disequilibrium (ID) within subpopulations is present (Vitalis and Couvet, 2001a). ID is termed as the difference between the joint two-loci probability of identity-in-state (IIS) and the expected product of each component's IIS probability. Such an ID approach is essentially distinct from the correlation of relatedness, which assesses the difference in terms of IBD. Unlike Vitalis and Couvet (2001b), who apply ID to jointly estimate the local effective population size and migration rate, I use the joint probabilities of two- and three-loci IBD to address the correlation of pairwise relatedness among loci.

The purpose of this article is to develop a new method for jointly estimating the pairwise relatedness at a single locus and its correlation among loci. Unlike previous methods where pairwise relatedness is estimated at individual loci and then weighted to gain the average (eg Ritland, 1996; Lynch and Ritland, 1999; Wang, 2002), the present method relies on the estimation of the joint probabilities of IBD at two or three loci.

In my analyses of pairwise relatedness and its correlation among loci, data can be randomly sampled from either the haploids or diploids genotyped with codominant markers, each with an arbitrary number of alleles. The use of haploid and of diploid genotypic data is evaluated through extensive Monte Carlo simulations. The preexisting approaches are based on sampling diploids (eg Ritland, 1996; Lynch and Ritland, 1999). However, the approach of using the haploid genotypic data allows the method to be applicable to specific chromosomes, such as the sex chromosomes, and has an advantage of ignoring the effects of mating system.

Two statistical methods are extensively evaluated through simulation study: the weighted least squares (LS) and the maximum likelihood (ML) methods. Application of the simulation results to practical analysis is discussed.

Two-loci relatedness

Throughout the study, the concept of pairwise relatedness at a single locus refers to the probability that an allele randomly sampled from one individual is IBD with an allele at the same locus randomly sampled from another individual (eg Jacquard, 1974; Ritland, 1996; Lynch and Ritland, 1999). Rousset (2002) reviewed some properties of this relatedness definition.

Haploid data

Consider a pair of two linked co-dominant loci in a population, denoted by A and B, with numbers of nA and nB alleles, respectively. Let pu and qv be the frequencies of alleles Au (u=1, 2, …, nA; ∑upu=1) and Bv (v=1, 2, …, nB; ∑vqv=1), respectively. Following the classical definition on the probability of IBD at a single locus (Jacquard, 1974), four parameters are defined in the two-loci case: κ11 (0≤κ11≤1) is the probability that the two alleles at each of the two loci are IBD, κ10 (0≤κ10≤1) is the probability that the two alleles at the A locus are IBD but the two alleles at the B locus are not, κ01 (0≤κ01≤1) is the probability that the two alleles at the B locus are IBD but the two alleles at the A locus are not, and κ00 (=1−κ11−κ10−κ01) is the probability that the two alleles at each of the two loci are not IBD. Note that there are several definitions of the probability of two-loci descent (Whitlock et al, 1993; Vitalis and Couvet, 2001a 2001b; Laurie and Weir, 2003), but the present definition of κ11 is actually the same as the definition F11 of Whitlock et al (1993).

The basic approach for estimating the relatedness is to use the probabilities of IIS to infer the relatedness parameters, similar to previous studies (eg Ritland, 1996). Denote by Puvu′v′ the probability that a pair of two-loci gametes have genotypes AuBv and Au′Bv′ (u, u′=1, 2, …, nA; v, v′=1, 2, …, nB). When linkage disequilibrium (LD) is absent, Puvu′v′ can be decomposed as

where δuu′ is Kronecker delta variable, which is equal to unity when u=u′ and zero otherwise. The factor 2 - ( 1 - δ u u ′ ) ( 1 - δ v v ′ ) in the coefficient of κ00 on the right side of equation (1) is introduced so that the coupling and repulsion linkage phases can be separated, which is distinct from the previous four-gene case at a single locus (Lynch and Ritland, 1999). Note that the gamete and allele frequencies in equation (1) are assumed known beforehand with sufficient accuracy, as in previous studies (eg Ritland, 1996, 2000; Lynch and Ritland, 1999).

When LD is present, a more general expression of Puvu′v′ can be written as

where guv and gu′v′ are the frequencies of gametes AuBv and Au′Bv′ in the population, respectively.

In the case of two alleles per locus (say Ai, Aj, Bk, and Bl), there are four categories of haplotype pairs according to the number of shared alleles: IIS for both the two alleles of each locus (AiBk−AiBk, AiBl−AiBl, AjBk−AjBk, AjBl−AjBl), IIS for the two A alleles but not for the two B alleles (AiBk−AiBl, AjBk−AjBl) and the reverse case (AjBl−AiBl, AiBk−AiBl), and no shared alleles for both loci (AiBk−AjBl, AjBk−AiBl). There are 16 types of haplotype pairs, but only 10 of them are distinguishable with codominant markers (Table 1). For an arbitrary number of alleles at each locus, the number of distinguishable haplotype pairs, denoted by nAB, is shown to be equal to nAnB(nAnB+1)/2.

Table 1 Probabilities for the two-loci haplotype pairs (two alleles per locus: Ai, Aj; Bk, Bl)

Diploid data

When diploid genotypic data are used, the preceding method can be applied as long as the gamete frequencies are available. However, estimation of gamete frequencies requires the assumption of random association of gametes (Hill, 1974; Weir, 1996; Kalinowski and Hedrick, 2001); otherwise, the relative proportion of heterozygotes with repulsion versus coupling linkage phases is indistinguishable. When there are nonrandom associations between gametes in forming zygotes (Yang, 2002), expectation–maximization (EM) (Dempster et al, 1977) and other methods are not applicable. The following method is only valid under the assumption of random association of gametes.

Denote by HAB the probability that both A and B loci are heterozygous, HA the probability that the A locus is heterozygous but the B locus is not, HB the probability that the B locus is heterozygous but the A locus is not, and H01 (H01=1−HA−HB−HAB) the probability that both loci are homozygous. I obtain

Note that equation (3d) differs from the previous study where only one parameter is considered (Morton and Simpson, 1983). When only one locus is considered, equations (3a), (3b), (3c) and (3d) reduce to the classical results (Falconer and Mackay, 1996, p 66). In practice, the heterozygote frequencies (H01–HAB variables) can be estimated directly from the genotypic data. Thus, according to equations (3a), (3b), (3c) and (3d), the three unknown parameters (κ11, κ10, and κ01) can be estimated.

Correlation of pairwise relatedness

Denote by rA and rB the probabilities of pairwise relatedness at the A and B loci, respectively, and cr the covariance of the probabilities of pairwise relatedness between A and B. If the three unknown parameters (κ̂11, κ̂10, and κ̂01) are estimated, the probability of pairwise relatedness at a single locus (rA and rB) and its covariance among loci (cr) can be calculated from the following equations:

Solution to equations (4a), (4b) and (4c) is r̂A=κ̂11+κ̂10, r̂B=κ̂11+κ̂01, and ĉr=κ̂11κ̂00−κ̂10κ̂01

From equation (4a), cr can also be viewed as the kinship disequilibrium since it is expressed as the difference between the joint probability of two-loci IBD and the product of single-locus probability of IBD, analogous to the definition of ID (Vitalis and Couvet, 2001a). cr may be negative when κ11κ00<κ10κ01, or positive when κ11κ00>κ10κ01. Theoretically, cr is associated with the recombination fraction, or inversely proportional to the physical distance between the two linked loci. A smaller distance between two loci implies a stronger correlation of their relatedness. In order to make the correlation be comparable among different pairs of loci, the correlation coefficient of pairwise relatedness is defined as ĉr(r̂Ar̂B(1−r̂A)(1−r̂B))−1/2 so that its value ranges from −1 to 1. This formula can also be proven using the general definition of statistical correlation (see also Hartl and Clark, 1989, pp 53–54).

Three-loci relatedness

Compared with the two-loci analysis, the advantage of a three-loci analysis is that it considers the event of double crossovers during meiosis and hence can give more precise estimates.

Haploid data

The preceding two-loci method can be extended to the three-loci case. Suppose that an additional marker C with nC (≥2) alleles is linked to the A and B markers. The ordering of the three loci is unknown. Denote by κ111 (0≤κ111≤1) the joint probability that the two alleles at each of the three loci are IBD, κ110 (0≤κ110≤1) the joint probability that the two alleles at both A and B loci are IBD but the two alleles at the C locus are not IBD. The definitions of other parameters κ101–κ000 can be given in a similar way. The joint probability for a pair of three-loci gametes (AuBvCw–Au′Bv′Cw′; w, w′=1, …, nC), denoted by Puvwu′v′w′, can be written in a general formula,

where ow is the frequency of allele Cw in the population, and guvw and gu′v′w′ are the frequencies of gametes AuBvCw and Au′Bv′Cw′, respectively. For an arbitrary number of alleles at each locus, the number of distinguishable three-loci gamete pairs, denoted by nABC, is shown to be equal to nAnBnC(nAnBnC+1)/2.

Diploid data

As in the two-loci case, denote by HAC the probability that both A and C loci are heterozygous, HBC the probability that both B and C loci are heterozygous, HABC the probability that the three loci are heterozygous, and HO2 the probability that three loci are homozygous. With the assumption of random association of gametes, I obtain

The seven unknown parameters (κ111–κ001) can be solved using equations (6a), (6b), (6c), (6d), (6e), (6f), (6g) and (6h), provided that the frequencies of alleles and three-loci gametes are available with sufficient accuracy.

Correlation of pairwise relatedness

Denote by cr1, cr2, and cr3 the covariances of pairwise relatedness between the A and B loci, the B and C loci, and the A and C loci, respectively. Let e θ A θ B θ C (θA, θB, θC=0, 1) be the residual part of κ θ A θ B θ C ( ∑ θ A ∑ θ B ∑ θ C κ θ A θ B θ C = 1 ) after the deduction of individual components of the covariances of two-loci relatedness. The residual part comes from the effects of double crossover among the three loci. Thus, k θ A θ B θ C can be written in a general form,

There are eight configurations of the sequence of θAθBθC, that is, (111), (110), (101), (011), (100), (010), (001), and (000). The probabilities of relatedness at individual loci can be estimated by r ^ A = ∑ θ B ∑ θ C k ^ 1 θ B θ C , r ^ B = ∑ θ A ∑ θ C k ^ θ A 1 θ C , and r ^ C = ∑ θ A ∑ θ B k ^ θ A θ B 1 . According to equation (7), the coefficients for the three covariances ( ( - 1 ) θ A + θ B , ( - 1 ) θ B + θ C , and ( - 1 ) θ A + θ C ) are the same between the partitions of κ111 and κ000, κ110 and κ001, κ101 and κ010, and κ011 and κ100. Thus, I can only use the partitions of four three-loci relatedness values (κ111, κ110, κ1 01, and κ011) to estimate cr1, cr2, and cr3. The analytic solution from the least-square method is given by

Denote by R1, R2, and R3 the correlation coefficients of relatedness between the A and B loci, the B and C loci, and the A and C loci, respectively. Estimates of these three correlation coefficients are respectively given by

Monte Carlo simulation

Like previous studies of the pairwise relatedness at a single locus (eg Ritland, 1996; Lynch and Ritland, 1999; Wang, 2002), the aims of the simulations are to examine the effects of (i) sample size, (ii) allele frequency distribution, (iii) the type of data sets (haploid or diploid), and (iv) LD.

Statistical methods

The weighted LS and ML methods are used to estimate the correlation coefficient of pairwise relatedness. With the two-loci haploid genotypic data, estimates of pairwise relatedness with the weighted LS method can be written as

where κ̂ is the vector of (κ̂11,κ̂10,κ̂01)′, 1 is the vector of ( 1 , 1 , ... , 1 ) ′ n A B × 1 , X is the known coefficient matrix with nAB × 3 elements calculated from equation (2), Y is the known vector ( y u ′ v ′ u v ) n A B × 1 in which

W is the known diagonal matrix with the diagonal element being wuvu′v′=1/Puvu′v′(1− Puvu′v′), and is the estimate of the mean of yuvu′v′.

For the ML method, the likelihood function is set as , where Muvu′v′ is the observed number of gamete pairs (AuBv−Au′Bv′) in the random sample with N haploids . ML estimates are obtained through Newton–Raphson iteration. The estimate of κ̂ at the (t+1) step is iteratively calculated by

where s(κ) is the score vector, equal to , and is the Fisher information matrix. The above iterative calculation is continued until is sufficiently small (convergence).

When the estimates of pairwise relatedness are obtained, the correlation coefficients of pairwise relatedness are calculated according to ĉr(r̂Ar̂B(1−r̂A) (1−r̂B))−1/2 for the two-loci case and equations (9a), (9b) and (9c) for the three-loci case. The correlation coefficients of pairwise relatedness in the other cases (two-loci diploid, three-loci haploid and diploid) can be estimated in a way similar to the above two approaches.

Data generation

The simulated samples with the haploid or diploid data are generated in the following steps. Given a set of parameters, including LD, the number of alleles, allele frequencies and the distribution type, and pairwise relatedness (κ11–κ00 in the two-loci case and κ111–κ000 in the three-loci case) calculate the probabilities for each two-gamete pair according to equation (2) for the haploid case, and the probabilities of each type of heterozygote according to equations (3a), (3b), (3c), (3d) and equations (6a), (6b), (6c), (6d), (6e), (6f), (6g) and (6h) for the diploid case. Then, use this probability distribution (multinomial distribution) to create random samples. It can be shown that a sample of N/2 diploids (or N haploids) can generate a total of N(N−1) gamete pairs for either the two- or the three-loci case. Simulation programs in C are available upon request.

In all, 5000 independent data sets are created, and each is used for estimating the correlation coefficients of pairwise relatedness according to the theories described in the preceding two sections. Means and standard deviations of estimates are calculated from these replicated data sets.

Results

With the two-loci haploid data and the weighted LS method, average estimates of the correlation coefficients of pairwise relatedness gradually become consistent with their actual values as the sample size increases (Figure 1a). The standard deviations decrease with the number of haploids (Figure 1b). However, there are large differences in the sample sizes required for obtaining appropriate estimates: 50 haploids for the two-allele case, 120 haploids for the four-allele case, and more than 180 haploids for the eight-allele case (Figure 1a).

Figure 1
figure 1

Effects of sample size: (a) average correlation coefficients of pairwise relatedness; (b) standard deviations. The results are obtained from 5000 independent runs under the uniform distribution of allele frequencies, with the two-loci haploid data, the weighted LS method, and linkage equilibrium. Two-loci relatedness values are set as κ11=0.3, κ10=0.2, and κ01=0.2.

A difference between the weighted LS and ML methods is that the ML method can give better estimates when the sample size is small. For example, when the number of haploids is greater than 40 for the four- or eight-allele case, an appropriate estimate of the correlation coefficient of pairwise relatedness can be obtained (Figure 2a, b).

Figure 2
figure 2

Effects of sample size: (a) four-allele case; (b) eight-allele case. The results are obtained from 5000 independent runs under the uniform distribution of allele frequencies, with the two-loci haploid data, the ML method, and linkage equilibrium. Two-loci relatedness values are set at κ11=0.3, κ10=0.2, and κ01=0.2.

The effects of the distribution of allele frequency (uniform versus triangular distribution; see Lynch and Ritland, 1999) on the LS or ML methods are very small. Although the triangular distribution produces a slightly greater standard deviation than does the uniform distribution (Figure 3), there are no differences in obtaining the unbiased average of the correlation coefficients of pairwise relatedness.

Figure 3
figure 3

Effects of the uniform versus triangular distributions of allele frequencies. The results are obtained from 5000 independent runs with the two-loci haploid data (two alleles per locus) and linkage equilibrium. Two-loci relatedness values are set at κ11=0.3, κ10=0.2, and κ01=0.2.

Unbiased estimates of the correlation coefficient of pairwise relatedness can be obtained when LD is present and the sample size is appropriate. For example, there are no significant differences when LD is changed from 0 to 0.2 in the two-allele case (Figure 4a, b). Also, no significant differences are observed between the weighted LS and ML methods.

Figure 4
figure 4

Effects of LD: (a) average correlation coefficients; (b) standard deviations. The results are obtained from 5000 independent runs under the uniform distribution of allele frequencies, with the two-loci haploid data (two alleles per locus), and the weighted LS method. Two-loci relatedness values are set at κ11=0.3, κ10=0.2, and κ01=0.2.

With the two-loci diploid data, an unbiased estimate of the correlation coefficient of pairwise relatedness can be obtained with each of the two methods when an appropriate sample size is provided. Compared with the haploid case, the optimal sample size is smaller. For example, a good estimate can be obtained with sampling 40 diploids in the four-allele case (Figure 5). Both the weighted LS and ML methods have the same performance (Figure 5).

Figure 5
figure 5

Comparison of the weighted LS versus the ML methods in terms of the standard deviation of estimates. The results are obtained from 5000 independent runs under the uniform distribution of allele frequencies, with the two-loci diploid data (four alleles per locus), and linkage equilibrium. Two-loci relatedness values are set as κ11=0.3, κ10=0.2, and κ01=0.2.

With the three-loci diploid data, the unbiased average estimates of the three correlation coefficients of pairwise relatedness can be simultaneously obtained when the sample size is appropriate. For example, when the sample size is more than 80 individuals, the three unknown parameters (R1, R2, and R3) can be estimated with a good accuracy and precision in the case of four alleles per locus (Figure 6a–f). Both the weighted LS and the ML methods have a very similar performance.

Figure 6
figure 6

Effects of sample size on three-loci relatedness analysis: (a) average correlation coefficients between A and B loci (R1); (b) standard deviations (R1); (c) average correlation coefficients between B and C (R2); (d) standard deviations (R2); (e) average correlation coefficients between A and C loci (R3); and (f) standard deviations (R3). The results are obtained from 5000 independent runs under the uniform distribution of allele frequencies, with the three-loci diploid data (four alleles per locus), and linkage equilibrium. Three-loci relatedness values are set as κ111=0.15, κ110=0.1, κ101=0.05, κ011=0.08, κ100=0.08, κ010=0.05, and κ001=0.1.

Discussion

In this paper, I have shown that the correlation of pairwise relatedness among loci along chromosomes can be estimated from the approach of partitioning the joint probabilities of IIS into the probabilities of IBD at two or three loci. Such an approach of using two- or three-loci probabilities of IBD enables the estimation of individual pairwise relatedness and its correlation among loci simultaneously, and allows us to study the picture of ‘landscape’ relatedness along chromosomes and to infer the naturally occurring pattern of co-evolution. Unlike traditional kinship mapping, which pictures the ‘static’ relationships between the physical position of markers and IBD (eg Morton and Simpson, 1983), the map of the correlation of relatedness implies ‘dynamic’ relationships among linked loci.

When the linkage maps of all the markers assayed are available beforehand, the relatedness at each locus and the correlation of relatedness among linked loci can be readily mapped. Since the present method involves only one-generation data randomly sampled from the population with unknown pedigree, the recombination fraction between linked loci cannot be estimated. The physical linkage map of relatedness cannot be directly constructed. However, the following properties are likely applicable to constructing the map of pairwise relatedness and its correlation among loci. First, a significant difference of κ11 from zero indicates that the two loci are likely to be linked. Second, the joint probability of two-loci IBD (κ11) is negatively correlated with the mapping distance between the two loci, while the joint probabilities of one-locus IBD and one-locus non-IBD (κ10 and κ01) are positively correlated with mapping distance. Thus, a larger κ11 and a smaller κ10 or κ01 indicate a shorter distance between the two loci. Third, a larger positive correlation of relatedness indicates a shorter physical distance, while a larger negative correlation of relatedness indicates a larger physical distance. These properties can be combined for ordering markers. Among all possible linkage maps for a given set of markers, the optimal one should have the largest sum of all κ11's among adjacent markers.

The analytical formulae presented here are only suitable for the case of two- and three-loci relatedness, where each locus has an arbitrary number of alleles. When many loci on a chromosome (more than three loci) are analyzed, these loci can be analyzed in terms of two- or three-loci as a unit, similar to the procedure of classical linkage mapping analysis. The individual two-or three-loci results are then jointly analyzed to map the correlation of pairwise relatedness.

Statistically, the critical problem for the weighted LS method with haploid data is that the number of distinguishable pairs is substantially increased with the number of alleles. Only when the number of sampled haplotype pairs is much greater than the number of distinguishable haplotype pairs can an appropriate estimate be obtained. Thus, the optimal sample size varies with the number of alleles. The present simulations suggest sample sizes of 100–200 haploids for moderate numbers of alleles, the lower bound for the two-allele case, the upper bound for the eight-allele case. However, such situations are not often met in the single locus case (Ritland, 1996; Lynch and Ritland, 1999; Wang, 2002), where the sample size is much larger than the expected number of distinguishable two-gene pairs.

Compared with the haploid case, the weighted LS method with the diploid data has a better performance, and the optimal sample size is also smaller (eg 40 diploids for two loci, each with four to eight alleles). The reason for the better performance of the weighted LS method with the diploid than with the haploid data is a small number of ‘units of observations’ – the heterozygous types, that is, four in the two-loci case and eight in the three-loci case. These ‘condensed’ variables contain all possible sampled haplotype pairs, and display a ‘robust’ property even when the sample size is small. Clearly, the ML method for the haploid data is suggested when highly polymorphic markers (eg, ≥4 alleles per locus) and a small sample size are used. However, either the weighted LS or ML method can be applied with diploid data.

The advantage of the ML over the weighted LS method with the haploid data is that all information is utilized, including different sampling variances for individual pairs and the correlation between different pairs. This can be seen from the Fisher information matrix (F(κ)). One of the assumptions underlying the weighted LS method is the independence among different observations of distinguishable gamete-pairs, which is actually violated.

Another striking result is the same performance for the ML method with either the haploid or diploid data. The condensed H variable does not affect the accuracy and precision of estimation, compared with the analysis with the haploid data. The reason for such robust behavior is that the score function (s(κ)) and the Fisher information matrix (F(κ)) are essentially the same with either approach under the assumption of random association of gametes, and this can be shown algebraically. However, the advantage of using diploid over haploid data is significant in practice.

There are two distinctions between the present two-loci haplotype approach and the previous ‘four-gene’ pairs at a single locus in pairwise relatedness analysis. There are three parameters in the former (κ11, κ10, and κ01), but only two parameters in the latter (eg Ritland, 1996; Lynch and Ritland, 1999). There are 10 informative pairs in the former for a four-gene case (two alleles per locus), but six in the latter (four alleles per locus). Such distinctions imply that a larger sample size is required in the two-loci analysis, compared with the case of a single locus.

Finally, one must acknowledge the assumptions underlying the present method of estimating the correlation of pairwise relatedness. First, the method is based on the assumption of the availability of accurate and precise estimates of gamete and allele frequencies (eg Ritland, 1996; Lynch and Ritland, 1999; Ritland, 2000). In practice, gamete and allele frequencies will probably be estimated from the same data sets used for relatedness analysis. Clearly, biased estimates of these frequencies can bring about biased estimates of the correlation coefficients of pairwise relatedness. Second, the present diploid approach is based on the assumption of random association of gametes. Extension to partially selfing populations is clearly needed in future study, as selfing populations show greater linkage disequilibria (Wright, 1969) and likely enhance the correlation of pairwise relatedness along chromosomes.