Introduction

The aim of this paper is to demonstrate through two case studies that a careful evaluation of the likelihood polynomial yields a better result in a linkage analysis using pedigree data. A simple visualization of the likelihood on the whole region [0,1] × [0,1] of paternal and maternal recombination fractions helps us understand more accurately what the pedigree data tell us about the disease locus. Superimposing the Fisher information matrix on the contour plot also helps us to see the reliability of the estimates of the fractions. We will use the probability inheritance algorithm1 to evaluate the likelihood as a polynomial of recombination fractions:

$$L(\theta_0, \theta_1) = \sum_{i} \sum_{j} a_{ij}\, \theta_0^{i}\, \theta_1^{j},$$

where θ0 and θ1 are the paternal recombination fraction and the maternal recombination fraction, respectively, and the a_{ij} are coefficients determined by the pedigree data. The introduction of different recombination fractions for males and females plays an important role when seeking a more accurate mapping of the disease locus, as has already been pointed out.2, 3, 4, 5 The same is also demonstrated in this paper.
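
To make the notion of a likelihood polynomial concrete, the following minimal sketch expands a toy likelihood into coefficients of θ0^i θ1^j. It assumes a hypothetical phase-known setting (not the pedigree data or the algorithm of this paper) with k0 recombinants among n0 paternal meioses and k1 among n1 maternal meioses, for which the likelihood is proportional to θ0^k0 (1−θ0)^(n0−k0) θ1^k1 (1−θ1)^(n1−k1). All function names are illustrative.

```python
from math import comb

def binomial_poly(k, n):
    """Coefficients of t**k * (1 - t)**(n - k), indexed by the power of t."""
    coeffs = [0] * (n + 1)
    for m in range(n - k + 1):
        # (1 - t)**(n - k) = sum over m of C(n - k, m) * (-t)**m
        coeffs[k + m] = comb(n - k, m) * (-1) ** m
    return coeffs

def toy_likelihood_poly(k0, n0, k1, n1):
    """Map (i, j) -> coefficient of theta0**i * theta1**j (hypothetical toy model)."""
    p0 = binomial_poly(k0, n0)
    p1 = binomial_poly(k1, n1)
    return {(i, j): a * b
            for i, a in enumerate(p0)
            for j, b in enumerate(p1)
            if a * b != 0}

def evaluate(poly, theta0, theta1):
    """Evaluate the polynomial at (theta0, theta1)."""
    return sum(c * theta0 ** i * theta1 ** j for (i, j), c in poly.items())

# One paternal recombinant in four meioses, no maternal recombinant in three.
poly = toy_likelihood_poly(k0=1, n0=4, k1=0, n1=3)
print(evaluate(poly, 0.25, 0.1))
```

Storing only the coefficients means the likelihood can afterwards be evaluated, differentiated or plotted at any (θ0, θ1) without revisiting the data.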

Materials and methods

Pedigree data

We have analyzed two real pedigree data sets to show the importance of careful evaluation of the likelihood polynomial.

Primary open-angle glaucoma data. The data used in Case study 1 is the primary open-angle glaucoma pedigree data,6 in which the markers are placed on chromosome 5q. As in Pang et al.,6 we have used disease allele frequency 0.0001 and an autosomal dominant mode of inheritance with one liability class. The penetrances for a homozygote without the disease allele, a heterozygote and a homozygote with the disease allele were set at 0, 1 and 1, respectively. The marker allele frequencies were estimated from the given data.

Familial juvenile hyperuricemic nephropathy data. The data used in Case study 2 is the familial juvenile hyperuricemic nephropathy (FJHN) pedigree data.7 The use of the individual genotype data has been approved by the institutional ethics committees of Keio University and Tokyo Women's Medical University. Informed consent was obtained from each of the subjects. For FJHN, the disease gene has already been identified as uromodulin (UMOD; GenBank accession no. NM_003361) on chromosome 16p.8 We have analyzed the 81 markers on chromosome 16p that are used in the analysis by Kudo et al.9 The pedigree consists of 65 individuals, but only 58 descendants are analyzed because the seven ancestors, whose marker genotypes are not available (NA), have no effect on the maximization of the likelihood. We also changed the affected status of IV-21 to non-affected since Kudo et al.9 reported that it is a phenocopy. We assume that the mode of inheritance is autosomal dominant with one liability class, with penetrances 0, 0.95 and 0.95. The marker allele frequencies were estimated from the given data. The disease allele frequency was assumed to be 0.0001, the same as in Hart et al.8

Visual validation on a unit square

The likelihood function for pedigree data can be obtained as a polynomial by the probability inheritance algorithm.1 The idea behind the algorithm is that the probability of the affected status and marker genotypes of an ancestor is inherited by the descendants along with the inheritance of the haplotype. The likelihood is reduced generation by generation into the likelihood of the ancestor, starting from a terminal sibling, until it is reduced into the haplotype frequencies of the founder. The evaluation of the likelihood is then executed back to the terminal sibling. The likelihood of the ancestor is a polynomial in the recombination fractions; therefore, it is enough that the descendants inherit the coefficients of the polynomial. This is in contrast to the existing algorithms,10, 11, 12, 13, 14, 15 in which the likelihood has to be numerically evaluated for each value of the recombination fractions. More details of this algorithm and its implementation in R are available from http://stat.math.keio.ac.jp/~sugaya/PIA/index.html. The obtained polynomial is useful for drawing the two-dimensional contour on the unit square [0,1] × [0,1] with a superimposed ellipsoid for the Fisher information matrix for each marker, as well as for finding the maximum likelihood estimate of θ=(θ0, θ1) using a Newton–Raphson type algorithm. The Fisher information helps us assess the reliability of the maximum likelihood estimate even when it falls in the feasible region [0,0.5] × [0,0.5]. The Fisher information matrix

$$I(\theta) = -E\left[\frac{\partial^{2} \log L(\theta)}{\partial\theta\,\partial\theta^{T}}\right]$$

at the maximum likelihood estimate \hat{\theta} provides the amount of information that can be extracted from the given data using the maximum likelihood principle. We will use

$$-\left.\frac{\partial^{2} \log L(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta=\hat{\theta}} \qquad (1)$$

as an estimate of I(\hat{\theta}) since it is very complicated to exactly evaluate the functional I(θ). The evaluation of (1) is straightforward in our case because we have already obtained the functional form of L(θ) as a polynomial. The Fisher information is displayed together with the contour plot by an ellipsoid with axes proportional to the eigenvalues of I(θ) in the direction of the eigenvectors of I(θ). Thus, we can see that the maximum likelihood estimate is reliable if the size of the ellipsoid is relatively large in the direction of each coordinate. Thomas16 has drawn the two-dimensional contour on the half square [0,0.5] × [0,0.5] by using a contour-drawing package CONICON3,17 evaluating the likelihood values and the values of its derivatives at grid points, but the resulting contour is only an approximation of the surface. With such an approximation, it is impossible to calculate the Fisher information and to display it together with the obtained contour plot for visual validation of the likelihood.
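
The following minimal sketch illustrates how a likelihood stored as polynomial coefficients allows exact derivatives, a Newton–Raphson search for the maximum likelihood estimate, and the negative Hessian of log L at that estimate together with its eigenvalues and eigenvectors. It is not the authors' R implementation: the toy polynomial t0(1−t0)^3 · t1(1−t1)^4, the starting point and all function names are assumptions made for illustration.

```python
import math

def poly_eval(poly, t0, t1, d0=0, d1=0):
    """Partial derivative of order (d0, d1) of a polynomial stored as
    {(i, j): coefficient of t0**i * t1**j}, evaluated at (t0, t1)."""
    total = 0.0
    for (i, j), c in poly.items():
        if i < d0 or j < d1:
            continue  # this term vanishes after differentiation
        # math.perm gives the falling factorial produced by differentiation
        total += (c * math.perm(i, d0) * math.perm(j, d1)
                  * t0 ** (i - d0) * t1 ** (j - d1))
    return total

def loglik_derivatives(poly, t0, t1):
    """Gradient and Hessian (h00, h01, h11) of log L at (t0, t1)."""
    L = poly_eval(poly, t0, t1)
    g0 = poly_eval(poly, t0, t1, 1, 0) / L
    g1 = poly_eval(poly, t0, t1, 0, 1) / L
    h00 = poly_eval(poly, t0, t1, 2, 0) / L - g0 * g0
    h01 = poly_eval(poly, t0, t1, 1, 1) / L - g0 * g1
    h11 = poly_eval(poly, t0, t1, 0, 2) / L - g1 * g1
    return (g0, g1), (h00, h01, h11)

def newton_raphson(poly, start, tol=1e-10, max_iter=100):
    """Newton-Raphson iteration for a stationary point of log L."""
    t0, t1 = start
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = loglik_derivatives(poly, t0, t1)
        det = h00 * h11 - h01 * h01
        # step = H^{-1} g for the symmetric 2x2 Hessian H
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        t0, t1 = t0 - s0, t1 - s1
        if abs(s0) + abs(s1) < tol:
            break
    return t0, t1

def observed_information(poly, t0, t1):
    """Negative Hessian of log L, returned as (a, b, d) for [[a, b], [b, d]]."""
    _, (h00, h01, h11) = loglik_derivatives(poly, t0, t1)
    return (-h00, -h01, -h11)

def ellipse_axes(info):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]]."""
    a, b, d = info
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    angle = 0.5 * math.atan2(2 * b, a - d)  # direction of the larger eigenvalue
    major = (math.cos(angle), math.sin(angle))
    minor = (-math.sin(angle), math.cos(angle))
    return [(mean + radius, major), (mean - radius, minor)]

# Hypothetical toy polynomial: t0 * (1 - t0)**3 times t1 * (1 - t1)**4.
p0 = {1: 1, 2: -3, 3: 3, 4: -1}
p1 = {1: 1, 2: -4, 3: 6, 4: -4, 5: 1}
poly = {(i, j): a * b for i, a in p0.items() for j, b in p1.items()}

theta_hat = newton_raphson(poly, start=(0.3, 0.3))
axes = ellipse_axes(observed_information(poly, *theta_hat))
print(theta_hat, axes)
```

Because every derivative is computed from the coefficients, no finite-difference approximation on a grid is needed; a real analysis would additionally have to guard against a singular Hessian and against iterates leaving the unit square.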

Results

Case study 1: primary open-angle glaucoma data

First we show the result of conventional linkage analysis. Figure 1 shows the curves of the LOD scores log10 (L(θ)/L(0.5)) for each marker, where 0⩽θ=θ0=θ1⩽0.5 is assumed. Our estimates of θ are summarized in Table 1 and are more precise than those in Pang et al.6 because of our functional evaluation of the likelihood. A natural conclusion from these estimates is that the disease locus would be around D5S2098. In fact, the disease locus suggested by Pang et al. is around this marker.
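
As a minimal sketch of the LOD score computation described above, the code below sets θ0=θ1=θ in a likelihood polynomial and evaluates log10 (L(θ)/L(0.5)) on a grid over [0,0.5]. The toy polynomial, the grid search and the function names are assumptions for illustration only; they do not reproduce the markers of Figure 1.

```python
import math

def likelihood(poly, theta0, theta1):
    """Evaluate a polynomial stored as {(i, j): coefficient}."""
    return sum(c * theta0 ** i * theta1 ** j for (i, j), c in poly.items())

def lod(poly, theta):
    """LOD score log10(L(theta) / L(0.5)) under theta0 = theta1 = theta."""
    return math.log10(likelihood(poly, theta, theta)
                      / likelihood(poly, 0.5, 0.5))

def max_lod(poly, steps=5000):
    """Grid search for the largest LOD score on 0 <= theta <= 0.5."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    # skip points where the likelihood is zero and the logarithm is undefined
    scored = [(lod(poly, t), t) for t in grid if likelihood(poly, t, t) > 0]
    return max(scored)

# Hypothetical toy polynomial: t0 * (1 - t0)**4 times (1 - t1)**5,
# which reduces to theta * (1 - theta)**9 when theta0 = theta1 = theta.
p0 = {1: 1, 2: -4, 3: 6, 4: -4, 5: 1}
p1 = {0: 1, 1: -5, 2: 10, 3: -10, 4: 5, 5: -1}
poly = {(i, j): a * b for i, a in p0.items() for j, b in p1.items()}

best_lod, best_theta = max_lod(poly)
print(best_theta, best_lod)
```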

Figure 1 Curves of the LOD score for the primary open-angle glaucoma pedigree data.

Table 1 Maximum likelihood estimates of the common recombination fraction for the POAG pedigree data

However, a different picture emerges from the sex-specific linkage analysis, particularly when using the probability inheritance algorithm. The likelihood function is obtained as a polynomial of the paternal and maternal recombination fractions (θ0, θ1) between an unknown disease locus and a marker locus. The orders of each likelihood polynomial in terms of θ0 and θ1 are listed in Table 2 and vary with the number of homozygotes in the pedigree. The contour plots on the unit square {(θ0, θ1); 0⩽θ0, θ1⩽1} are given in Figure 2; these are arranged from top to bottom and from left to right in the order of the marker locations. We observe that such contour plots on the unit square are more informative than those on the region of feasible recombination fractions, {(θ0, θ1); 0⩽θ0, θ1⩽0.5}. In fact, the maximum likelihood estimates of θ=(θ0, θ1) lie outside the feasible region for the first three markers and the penultimate marker. Although a recombination fraction >0.5 is unrealistic, such a value can appear as an estimate, particularly when penetrance values such as 0, 1 and 1 are assumed. This assumption compels us to focus on the inheritances from the affected individuals who are heterozygous at the underlying marker. For example, only the two affected females are focused on for the first and penultimate markers, and as a result, the pattern of disease inheritance drives the estimate of θ1 toward 1. The same happens for the second marker, where the affected males and females are focused on, and for the third marker, where the affected males are focused on. Hence, it would be natural to set the estimates of both θ0 and θ1 as NA if one of them exceeds 0.5, since nothing definite can be said from such markers. It is worth noting that, if (θ0, θ1) is restricted to the half-square [0,0.5] × [0,0.5] as in Thomas,16 the likelihood may take its maximum at θ0=0.5 or θ1=0.5 for such markers, and the misleading answer may be obtained that the disease locus is far from the underlying marker.
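
The effect of restricting the search to the half-square can be sketched as follows. The toy likelihood, whose maternal maximum lies at θ1=0.75, the grid search and the function names are assumptions for illustration and are unrelated to the actual markers; the NA rule follows the criterion stated above.

```python
def likelihood(theta0, theta1):
    """Hypothetical toy likelihood with its maximum at (0.25, 0.75)."""
    return theta0 * (1 - theta0) ** 3 * theta1 ** 3 * (1 - theta1)

def grid_argmax(f, upper, steps=200):
    """Grid maximizer of f on the square [0, upper] x [0, upper]."""
    grid = [upper * k / steps for k in range(steps + 1)]
    return max((f(a, b), a, b) for a in grid for b in grid)[1:]

def sex_specific_estimate(theta0_hat, theta1_hat, bound=0.5):
    """Return the pair, or None (NA) when either component exceeds the bound."""
    if theta0_hat > bound or theta1_hat > bound:
        return None
    return (theta0_hat, theta1_hat)

full = grid_argmax(likelihood, upper=1.0)        # search on the unit square
restricted = grid_argmax(likelihood, upper=0.5)  # search on the half-square

print(full, sex_specific_estimate(*full))  # estimate outside the feasible region
print(restricted)                          # maximum pinned to the boundary
```

In this toy case the restricted search reports θ1=0.5, which would read as "unlinked" although the unrestricted surface shows that the marker is simply uninformative in the sense described above.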

Table 2 Orders of the likelihood polynomial for the POAG pedigree data

Figure 2 Two-dimensional contour plots of the log likelihood function for the primary open-angle glaucoma pedigree data in which the maximum likelihood estimate is shown by the center of a Fisher information ellipsoid on each plot.

Among the estimates of θ for non-NA markers 4–10, the estimate for marker 8 is unreliable from the viewpoint of Fisher information and hence it is also regarded as NA. Further, for the last marker, Fisher information is very small for the maternal recombination fraction, although the estimated recombination fractions are 0.08 and 0.49.

Table 3 summarizes the above information. We can see from Table 3 that the estimated recombination fractions are 0.20 or 0.17 except for the marker D5S638. This observation suggests that the disease locus would not be in the region between the given markers D5S2065 and D5S2011. This region seems to form a block that is inherited as a whole, generation by generation. In fact, the disease gene WDR36 (GenBank accession no. NM_139281) identified by Monemi et al.18 is outside the region, in the centromeric direction. This case study shows the importance of carefully looking at a two-dimensional picture of the likelihood function.

Table 3 Maximum likelihood estimate of the paternal and maternal recombination fractions for the POAG pedigree data

Case study 2: FJHN data

Table 4 lists the estimated recombination fractions θ with the LOD scores for all 81 markers. The markers with small θ and high LOD scores (>3) are scattered across the region. This shows that the conventional procedure does not provide a clear indication of the disease locus.

Table 4 Maximum LOD score for each marker in the FJHN pedigree data

Only 32 markers remained as ‘non-NA markers’ after the same criterion as in the previous case study was applied. The orders of all likelihood polynomials for such markers are listed in Table 5. The contour plots for these 32 markers are shown in Figure 3. The plots are arranged from top to bottom and from left to right in the order of the marker locations. Table 6 gives a numerical summary. UMOD is also listed in the first column as a reference. The second column in the table indicates the physical positions of the markers on chromosome 16p in kilobase pairs. It is easily seen from Table 6 that the estimates of both θ0 and θ1 are non-NA and that the LOD scores are high for a block from marker #238 to marker ac002302a4, with the exception of #129. This observation, and the fact that both estimated values are 0 except for marker #129, suggest that the disease locus exists in this block. In fact, the disease locus UMOD resides in the middle of this block. It was hard to identify this block using Table 4 obtained from the conventional procedure.

Table 5 Orders of the likelihood polynomial for the FJHN pedigree data

Figure 3 Two-dimensional contour plots of the likelihood function for the familial juvenile hyperuricemic nephropathy (FJHN) pedigree data.

Table 6 Summary of the result for the FJHN pedigree data

Discussion

We have demonstrated that a careful validation of the likelihood provides a more reliable result. For the validation to be effective, a functional evaluation of the likelihood is useful and the contour plot of the likelihood on the unit square [0,1] × [0,1] of paternal and maternal recombination fractions is helpful. An overplotted ellipsoid of the Fisher information matrix is also useful to rule out any unreliable estimate. Validation of this method on other pedigree data sets, together with further application to multipoint linkage analysis, will be reported elsewhere, even though the two-point linkage analysis is enough to localize the disease locus in our case studies.