Exploration of the disease locus by a careful evaluation of the likelihood polynomial for pedigree data

Sugaya, Yuki; Shibata, Ritei

doi:10.1038/jhg.2011.24

Original Article
Published: 17 March 2011

Journal of Human Genetics volume 56, pages 383–389 (2011)

Subjects

Disease genetics

Abstract

It is demonstrated through two case studies that a careful evaluation of the likelihood polynomial results in a more accurate localization of the disease locus. The evaluation of the likelihood function as a polynomial enables a more flexible exploration of the disease locus. Visualization by a contour plot of the function on the unit square of paternal and maternal recombination fractions, along with a superimposed ellipsoid of the Fisher information matrix, helps us to find a more accurate localization of the disease locus.

Introduction

The aim of this paper is to demonstrate through two case studies that a careful evaluation of the likelihood polynomial yields a better result in a linkage analysis using pedigree data. A simple visualization of the likelihood on the whole region [0,1] × [0,1] of paternal and maternal recombination fractions helps us understand more accurately what the pedigree data tell us about the disease locus. Superimposing the Fisher information matrix on the contour plot also helps us to see the reliability of the estimates of the fractions. We will use the probability inheritance algorithm¹ to evaluate the likelihood as a polynomial in the recombination fractions:

$$L(\theta_0, \theta_1) = \sum_{i}\sum_{j} a_{ij}\,\theta_0^{i}\,\theta_1^{j},$$

where θ₀ and θ₁ are the paternal recombination fraction and the maternal recombination fraction, respectively, and the a_{ij} are coefficients determined by the pedigree data. The introduction of different recombination fractions for males and females plays an important role when seeking a more accurate mapping of the disease locus, as has already been pointed out.^{2, 3, 4, 5} The same is also demonstrated in this paper.
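To make the idea of a likelihood polynomial concrete, the following minimal sketch stores a polynomial in (θ₀, θ₁) as a table of coefficients and evaluates it at any point of the unit square. It is not the probability inheritance algorithm: it assumes a toy, phase-known situation in which the likelihood is simply θ₀^r₀(1−θ₀)^(n₀−r₀) · θ₁^r₁(1−θ₁)^(n₁−r₁), and all function names are hypothetical.

```python
from math import comb

def expand_meioses(recombinants, meioses):
    """Coefficients of t^r (1 - t)^(n - r), indexed by the power of t."""
    coeffs = [0] * (meioses + 1)
    non_recombinants = meioses - recombinants
    for k in range(non_recombinants + 1):
        # Binomial expansion of (1 - t)^(n - r), shifted by t^r.
        coeffs[recombinants + k] = (-1) ** k * comb(non_recombinants, k)
    return coeffs

def likelihood_polynomial(r0, n0, r1, n1):
    """Toy likelihood as {(i, j): coefficient of theta0^i * theta1^j}."""
    paternal = expand_meioses(r0, n0)
    maternal = expand_meioses(r1, n1)
    return {(i, j): a * b
            for i, a in enumerate(paternal)
            for j, b in enumerate(maternal)
            if a * b != 0}

def evaluate(poly, theta0, theta1):
    """Value of the polynomial at (theta0, theta1)."""
    return sum(c * theta0 ** i * theta1 ** j for (i, j), c in poly.items())

def orders(poly):
    """Highest powers of theta0 and theta1 appearing in the polynomial."""
    return max(i for i, _ in poly), max(j for _, j in poly)

# Hypothetical counts: 1 recombinant in 3 paternal meioses,
# 0 recombinants in 2 maternal meioses.
poly = likelihood_polynomial(1, 3, 0, 2)
print(orders(poly), evaluate(poly, 0.2, 0.1))
```

Once the coefficients are stored, evaluating the likelihood on a fine grid for a contour plot only requires repeated calls to `evaluate`, with no further pass through the pedigree.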

Materials and methods

Pedigree data

We have analyzed two real pedigree data sets to show the importance of a careful evaluation of the likelihood polynomial.

Primary open-angle glaucoma data. The data used in Case study 1 is the primary open-angle glaucoma pedigree data,⁶ in which the markers are placed on chromosome 5q. As in Pang et al.,⁶ we have used disease allele frequency 0.0001 and an autosomal dominant mode of inheritance with one liability class. The penetrances for a homozygote without the disease allele, a heterozygote and a homozygote with the disease allele were set at 0, 1 and 1, respectively. The marker allele frequencies were estimated from the given data.

Familial juvenile hyperuricemic nephropathy data. The data used in Case study 2 is the familial juvenile hyperuricemic nephropathy (FJHN) pedigree data.⁷ The use of the individual genotype data has been approved by the institutional ethics committees of Keio University and Tokyo Women's Medical University. Informed consent was obtained from each of the subjects. For FJHN, the disease gene has already been identified as uromodulin (UMOD; GenBank accession no. NM_003361) on chromosome 16p.⁸ We have analyzed the 81 markers on chromosome 16p that were used in the analysis by Kudo et al.⁹ The pedigree consists of 65 individuals, but only 58 descendants are analyzed: the marker genotypes of the seven ancestors are not available (NA), so these ancestors have no effect on the maximization of the likelihood. We also changed the affected status of IV-21 to non-affected, since Kudo et al.⁹ reported that this individual is a phenocopy. We assume that the mode of inheritance is autosomal dominant with one liability class and penetrances 0, 0.95 and 0.95. The marker allele frequencies were estimated from the given data. The disease allele frequency was assumed to be 0.0001, the same as in Hart et al.⁸
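The role of the disease allele frequency and the penetrances can be illustrated by a short calculation. The sketch below assumes Hardy–Weinberg proportions for the disease genotypes (an assumption made here for illustration, not stated in the text) and computes the probability of each genotype given that an individual is affected; the function names are hypothetical.

```python
def genotype_priors(q):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the disease allele."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)

def genotype_given_affected(q, penetrances):
    """Posterior genotype probabilities for an affected individual (Bayes rule)."""
    joint = [f * w for f, w in zip(genotype_priors(q), penetrances)]
    total = sum(joint)
    return [x / total for x in joint]

# Settings quoted in the text: allele frequency 0.0001 with
# penetrances (0, 1, 1) for Case study 1 and (0, 0.95, 0.95) for Case study 2.
print(genotype_given_affected(0.0001, (0.0, 1.0, 1.0)))
print(genotype_given_affected(0.0001, (0.0, 0.95, 0.95)))
```

Under these assumptions an affected individual is almost certainly a heterozygote in both settings, because the common penetrance of the two carrier genotypes cancels in the ratio. The two penetrance settings differ in how unaffected individuals are treated: with penetrance 0.95 an unaffected carrier is possible, whereas with penetrance 1 it is not.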

Visual validation on a unit square

The likelihood function for pedigree data can be obtained as a polynomial by the probability inheritance algorithm.¹ The idea behind the algorithm is that the probability of the affected status and marker genotypes of an ancestor is passed on to the descendants along with the inheritance of the haplotype. The likelihood is reduced generation by generation into the likelihood of the ancestor, starting from a terminal sibling, until it is reduced into the haplotype frequencies of the founder. The evaluation of the likelihood is then executed back to the terminal sibling. The likelihood of the ancestor is a polynomial in the recombination fractions; therefore, it is enough that the descendants inherit the coefficients of the polynomial. This is in contrast to the existing algorithms,^{10, 11, 12, 13, 14, 15} in which the likelihood has to be numerically evaluated for each value of the recombination fractions. More details of this algorithm and its implementation in R are available from http://stat.math.keio.ac.jp/~sugaya/PIA/index.html.

The obtained polynomial is useful for drawing the two-dimensional contour on the unit square [0,1] × [0,1] with a superimposed ellipsoid for the Fisher information matrix for each marker, as well as for finding the maximum likelihood estimate of θ=(θ₀, θ₁) using a Newton–Raphson type algorithm. Fisher information helps us to judge the reliability of the maximum likelihood estimate even when it falls in the feasible region [0,0.5] × [0,0.5]. The Fisher information matrix

$$I(\theta) = -E\left[\frac{\partial^{2} \log L(\theta)}{\partial\theta\,\partial\theta^{T}}\right]$$

at the maximum likelihood estimate $\theta = \hat{\theta}$ provides the amount of information that can be extracted from the given data using the maximum likelihood principle. We will use

$$\hat{I}(\hat{\theta}) = -\left.\frac{\partial^{2} \log L(\theta)}{\partial\theta\,\partial\theta^{T}}\right|_{\theta=\hat{\theta}} \qquad (1)$$

as an estimate of $I(\hat{\theta})$, since it is very complicated to evaluate the functional I(θ) exactly. The evaluation of (1) is straightforward in our case because we have already obtained the functional form of L(θ) as a polynomial. The Fisher information is displayed together with the contour plot by an ellipsoid with axes proportional to the eigenvalues of I(θ) in the direction of the eigenvectors of I(θ). Thus, we can see that the maximum likelihood estimate is reliable if the size of the ellipsoid is relatively large in the direction of each coordinate. Thomas¹⁶ has drawn the two-dimensional contour on the half square [0,0.5] × [0,0.5] by using the contour-drawing package CONICON3,¹⁷ evaluating the values of the likelihood and of its derivatives on grid points, but the contour is only an approximation of the surface. With such a grid-based approach it is impossible to calculate the Fisher information and to display it together with the obtained contour plot for visual validation of the likelihood.
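Because the likelihood is available as a polynomial, its derivatives are exact, and both a Newton–Raphson search and the matrix of second derivatives of −log L at the estimate follow directly. The sketch below illustrates this on a toy polynomial of the product form θ₀^r₀(1−θ₀)^(n₀−r₀) · θ₁^r₁(1−θ₁)^(n₁−r₁); the toy likelihood, the counts and all function names are assumptions made for illustration and do not reproduce the authors' R implementation.

```python
from math import comb, hypot, atan2

def toy_polynomial(r0, n0, r1, n1):
    """Expand the toy likelihood into {(i, j): coeff of theta0^i theta1^j}."""
    def one(r, n):
        return {r + k: (-1) ** k * comb(n - r, k) for k in range(n - r + 1)}
    return {(i, j): a * b
            for i, a in one(r0, n0).items() for j, b in one(r1, n1).items()}

def evaluate(poly, t0, t1):
    return sum(c * t0 ** i * t1 ** j for (i, j), c in poly.items())

def diff(poly, var):
    """Exact partial derivative with respect to theta0 (var=0) or theta1 (var=1)."""
    out = {}
    for key, c in poly.items():
        if key[var] > 0:
            new = (key[0] - 1, key[1]) if var == 0 else (key[0], key[1] - 1)
            out[new] = out.get(new, 0) + c * key[var]
    return out

def score_and_information(poly, t0, t1):
    """Gradient of log L and the matrix of minus its second derivatives."""
    like = evaluate(poly, t0, t1)
    first = [diff(poly, v) for v in (0, 1)]
    g = [evaluate(f, t0, t1) / like for f in first]
    info = [[g[a] * g[b] - evaluate(diff(first[a], b), t0, t1) / like
             for b in (0, 1)] for a in (0, 1)]
    return g, info

def newton_mle(poly, start, steps=50, tol=1e-12):
    """Newton-Raphson iteration theta <- theta + info^{-1} * score."""
    t0, t1 = start
    for _ in range(steps):
        g, m = score_and_information(poly, t0, t1)
        det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
        d0 = (m[1][1] * g[0] - m[0][1] * g[1]) / det
        d1 = (m[0][0] * g[1] - m[1][0] * g[0]) / det
        t0, t1 = t0 + d0, t1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return t0, t1

def ellipse_axes(info):
    """Eigenvalues (largest first) and angle of the leading eigenvector."""
    a, b, c = info[0][0], info[0][1], info[1][1]
    mean, radius = (a + c) / 2, hypot((a - c) / 2, b)
    return (mean + radius, mean - radius), 0.5 * atan2(2 * b, a - c)

# Hypothetical counts: 1 of 4 paternal and 2 of 10 maternal meioses recombinant.
poly = toy_polynomial(1, 4, 2, 10)
theta_hat = newton_mle(poly, (0.3, 0.3))
_, info = score_and_information(poly, *theta_hat)
print(theta_hat, ellipse_axes(info))
```

In this toy case the larger eigenvalue belongs to the maternal direction, which has more meioses, so an ellipse drawn with axes proportional to the eigenvalues would be longer along θ₁, matching the reading that a larger extent indicates a more reliable estimate.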

Results

Case study 1: primary open-angle glaucoma data

First we show the result of the conventional linkage analysis. Figure 1 shows the curves of the LOD scores log₁₀ (L(θ)/L(0.5)) for each marker, where 0⩽θ=θ₀=θ₁⩽0.5 is assumed. Our estimates of θ are summarized in Table 1 and are more precise than those in Pang et al.⁶ because of our functional evaluation of the likelihood. A natural conclusion from these estimates is that the disease locus is around D5S2098. In fact, the disease locus suggested by Pang et al.⁶ is around this marker.
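As a small worked example of the LOD score log₁₀ (L(θ)/L(0.5)) with a common recombination fraction, the sketch below uses a toy phase-known likelihood θ^r(1−θ)^(n−r) and a grid search over [0, 0.5]. The likelihood form, the counts and the function names are illustrative assumptions; the paper's likelihoods come from the full pedigree.

```python
import math

def lod(theta, recombinants, meioses):
    """LOD score log10(L(theta) / L(0.5)) for the toy likelihood."""
    like = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if like == 0.0:
        return float("-inf")  # theta = 0 is impossible once a recombinant is seen
    return math.log10(like / 0.5 ** meioses)

def max_lod(recombinants, meioses, grid=500):
    """Grid search for the maximum LOD score on 0 <= theta <= 0.5."""
    thetas = [0.5 * k / grid for k in range(grid + 1)]
    best = max(thetas, key=lambda t: lod(t, recombinants, meioses))
    return best, lod(best, recombinants, meioses)

# Hypothetical counts: no recombinants in 10 meioses, then 1 in 10.
print(max_lod(0, 10))
print(max_lod(1, 10))
```

With no recombinants in 10 informative meioses the maximum is attained at θ=0 and equals 10·log₁₀2, just above the conventional threshold of 3.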

Table 1 Maximum likelihood estimates of the common recombination fraction for the POAG pedigree data

However, a different picture emerges from the sex-specific linkage analysis, particularly when using the probability inheritance algorithm. The likelihood function is obtained as a polynomial in the paternal and maternal recombination fractions (θ₀, θ₁) between an unknown disease locus and a marker locus. The orders of each likelihood polynomial in θ₀ and θ₁ are listed in Table 2 and vary with the number of homozygotes in the pedigree. The contour plots on the unit square {(θ₀, θ₁); 0⩽θ₀, θ₁⩽1} are given in Figure 2; these are arranged from top to bottom and from left to right in the order of the marker locations. We observe that such contour plots on the unit square are more informative than those on the region of feasible recombination fractions, {(θ₀, θ₁); 0⩽θ₀, θ₁⩽0.5}. In fact, the maximum likelihood estimates of θ=(θ₀, θ₁) lie outside the feasible region for the first three markers and the penultimate marker. Although a recombination fraction >0.5 is unrealistic, such a value can appear as an estimate, particularly when penetrance values such as 0, 1 and 1 are assumed. Such penetrances compel us to focus on the inheritances from affected individuals who are heterozygous at the underlying marker. For example, only the two affected females are focused on for the first and penultimate markers, and as a result, the pattern of disease inheritance pushes the estimate of θ₁ toward 1. The same happens for the second marker, where the affected males and females are focused on, and for the third marker, where the affected males are focused on. Hence, it would be natural to set the estimates of both θ₀ and θ₁ as NA if one of them exceeds 0.5, since nothing definite can be said from such markers. It is worth noting that, if (θ₀, θ₁) is restricted to the half-square [0,0.5] × [0,0.5] as in Thomas,¹⁶ the likelihood may take its maximum at θ₀=0.5 or θ₁=0.5 for such markers, which may give the misleading answer that the disease locus is far from the underlying marker.
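The difference between maximizing on the unit square and on the half-square can be seen in a toy example. The sketch below again assumes a phase-known likelihood θ^r(1−θ)^(n−r) for each sex, whose unrestricted maximum is r/n; the counts and function names are hypothetical, and `None` stands for NA.

```python
def restricted_mle(recombinants, meioses):
    """Maximizer on [0, 0.5]; the toy likelihood is unimodal, so clip at 0.5."""
    return min(recombinants / meioses, 0.5)

def sex_specific_estimate(r0, n0, r1, n1):
    """Unrestricted estimates, set to NA (None) if either exceeds 0.5."""
    estimate = (r0 / n0, r1 / n1)
    if max(estimate) > 0.5:
        return None  # nothing definite can be said from such a marker
    return estimate

# Hypothetical marker: 1 of 5 paternal and 4 of 5 maternal meioses recombinant.
print(restricted_mle(4, 5))                # boundary value on the half-square
print(sex_specific_estimate(1, 5, 4, 5))   # NA on the unit square
print(sex_specific_estimate(1, 5, 1, 4))   # both estimates feasible
```

On the half-square the maternal estimate for the first marker is reported as 0.5, which reads as "unlinked", whereas the unit-square view shows that the maximum actually lies at 0.8 and the marker is better treated as uninformative.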

Table 2 Orders of the likelihood polynomial for the POAG pedigree data

Among the estimates of θ for non-NA markers 4–10, the estimate for marker 8 is unreliable from the viewpoint of Fisher information and hence it is also regarded as NA. Further, for the last marker, Fisher information is very small for the maternal recombination fraction, although the estimated recombination fractions are 0.08 and 0.49.

Table 3 summarizes the above information. We can see from Table 3 that the estimated recombination fractions are 0.20 or 0.17 except for the marker D5S638. This observation suggests that the disease locus would not be in the region between the given markers D5S2065 and D5S2011. This region seems to be forming a block that inherits as a whole, generation by generation. In fact, the disease gene WDR36 (GenBank accession no. NM_139281) identified by Monemi et al.¹⁸ is outside of the region in the centromeric direction. This case study shows the importance of carefully looking at a two-dimensional picture of the likelihood function.

Table 3 Maximum likelihood estimate of the paternal and maternal recombination fractions for the POAG pedigree data

Case study 2: FJHN data

Table 4 lists the estimated recombination fractions θ with the LOD scores for all 81 markers. The markers with small θ and high LOD scores (>3) are scattered among the other markers rather than forming a single cluster, so the conventional procedure does not provide a clear indication of the disease locus.

Table 4 Maximum LOD score for each marker in the FJHN pedigree data

Only 32 markers remained as ‘non-NA markers’ after applying the same criterion as in the previous case study. The orders of all likelihood polynomials for such markers are listed in Table 5. The contour plots for these 32 markers are shown in Figure 3. The plots are arranged from top to bottom and from left to right in the order of the marker locations. Table 6 gives a numerical summary. UMOD is also listed in the first column as a reference. The second column in the table indicates the physical positions of the markers on chromosome 16p in kilobase pairs. It is easily seen from Table 6 that the estimates of both θ₀ and θ₁ are non-NA and that the LOD scores are high for a block from marker #238 to marker ac002302a4, with the exception of #129. This observation, together with the fact that both estimated values are 0 except for marker #129, suggests that the disease locus lies in this block. In fact, the disease locus UMOD resides in the middle of this block. It was hard to identify this block using Table 4 obtained from the conventional procedure.
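The search for such a block can be sketched as a scan over markers in map order. The code below looks for the longest run of consecutive markers whose paternal and maternal estimates are both 0 and whose LOD score exceeds 3. The marker names and values are invented for illustration and are not taken from Table 6, and the strict rule used here would split a block at an exceptional marker such as #129, which the text instead tolerates.

```python
def longest_linked_block(markers, lod_threshold=3.0):
    """Longest run of markers with theta0 = theta1 = 0 and LOD above the threshold.

    markers: list of (name, theta0, theta1, lod) in map order;
    theta0 and theta1 are None for NA markers.
    """
    best, current = [], []
    for name, theta0, theta1, lod in markers:
        if theta0 == 0 and theta1 == 0 and lod > lod_threshold:
            current.append(name)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []  # an NA or weakly linked marker ends the run
    return best

# Hypothetical summary table (names and values are made up).
table = [
    ("M1", 0.10, 0.00, 2.1),
    ("M2", 0.00, 0.00, 3.4),
    ("M3", None, None, 0.0),
    ("M4", 0.00, 0.00, 4.2),
    ("M5", 0.00, 0.00, 3.9),
    ("M6", 0.05, 0.20, 1.0),
]
print(longest_linked_block(table))
```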

Table 5 Orders of the likelihood polynomial for the FJHN pedigree data

Table 6 Summary of the result for the FJHN pedigree data

Discussion

We have demonstrated that a careful validation of the likelihood provides a more reliable result. For the validation to be effective, a functional evaluation of the likelihood is useful and the contour plot of the likelihood on the unit square [0,1] × [0,1] of paternal and maternal recombination fractions is helpful. An overplotted ellipsoid of the Fisher information matrix is also useful to rule out any unreliable estimate. Validation of this method with other sets of pedigree data will be reported elsewhere, together with a further application to multipoint linkage analysis, even though two-point linkage analysis is enough to localize the disease locus in our case studies.

References

1. Sugaya, Y. & Shibata, R. Probability inheritance algorithm and its application. The 52nd Annual Meeting of the Japan Society of Human Genetics, The Japan Society of Human Genetics, Tokyo, Japan, 115 (2007).
2. Daw, E. W., Thompson, E. A. & Wijsman, E. M. Bias in multipoint linkage analysis arising from map misspecification. Genet. Epidemiol. 19, 366–380 (2000).
3. Wu, R., Xing, M. C., Wu, S. S. & Zeng, Z. B. Linkage mapping of sex-specific differences. Genet. Res. 79, 85–96 (2002).
4. Feenstra, B., Greenberg, D. A. & Hodge, S. E. Using LOD scores to detect sex differences in male-female recombination fractions. Hum. Hered. 57, 100–108 (2004).
5. Fingerlin, T. E., Abecasis, G. R. & Boehnke, M. Using sex-average genetic maps in multipoint linkage analysis when identity-by-descent status is incompletely known. Genet. Epidemiol. 30, 384–396 (2006).
6. Pang, C. P., Fan, B. J., Canlas, O., Wang, D. Y., Dubois, S., Tam, P. O. et al. A genome-wide scan maps a novel juvenile-onset primary open angle glaucoma locus to chromosome 5q. Mol. Vis. 12, 85–92 (2006).
7. Kamatani, N., Moritani, M., Yamanaka, H., Takeuchi, F., Hosoya, T. & Itakura, M. Localization of a gene for familial juvenile hyperuricemic nephropathy causing underexcretion-type gout to 16p12 by genome-wide linkage analysis of a large family. Arthritis Rheum. 43, 925–929 (2000).
8. Hart, T. C., Gorry, M. C., Hart, P. S., Woodard, A. S., Shihabi, Z., Sandhu, J. et al. Mutations of the UMOD gene are responsible for medullary cystic kidney disease 2 and familial juvenile hyperuricaemic nephropathy. J. Med. Genet. 39, 882–892 (2002).
9. Kudo, E., Kamatani, N., Tezuka, O., Taniguchi, A., Yamanaka, H., Yabe, S. et al. Familial juvenile hyperuricemic nephropathy: detection of mutations in the uromodulin gene in five Japanese families. Kidney Int. 65, 1589–1597 (2004).
10. Elston, R. C. & Stewart, J. A general model for the genetic analysis of pedigree data. Hum. Hered. 21, 523–542 (1971).
11. Ott, J. Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage study. Am. J. Hum. Genet. 26, 588–597 (1974).
12. Lange, K. & Elston, R. C. Extensions to pedigree analysis. Hum. Hered. 25, 95–105 (1975).
13. Cannings, C., Thompson, E. A. & Skolnick, M. H. Probability functions on complex pedigrees. Adv. Appl. Prob. 10, 26–61 (1978).
14. Lander, E. S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl Acad. Sci. USA 84, 2363–2367 (1987).
15. Fishelson, M. & Geiger, D. Exact genetic linkage computations for general pedigrees. Bioinformatics 18, 189–198 (2002).
16. Thomas, A. Gene hunting with gradients of likelihoods. J. R. Statist. Soc. B 53, 3–26 (1991).
17. Sibson, R. CONICON 3 handbook (University of Bath, 1987).
18. Monemi, S., Spaeth, G., Dasilva, A., Popinchalk, S., Illitchev, E., Liebmann, J. et al. Identification of a novel adult-onset primary open-angle glaucoma (POAG) gene on 5q22.1. Hum. Mol. Genet. 14, 725–733 (2005).

Author information

Authors and Affiliations

School of Fundamental Science and Technology, Keio University, Yokohama, Japan
Yuki Sugaya
Department of Mathematics, Keio University, Yokohama, Japan
Ritei Shibata

Corresponding author

Correspondence to Yuki Sugaya.

About this article

Cite this article

Sugaya, Y., Shibata, R. Exploration of the disease locus by a careful evaluation of the likelihood polynomial for pedigree data. J Hum Genet 56, 383–389 (2011). https://doi.org/10.1038/jhg.2011.24

Received: 06 October 2010
Revised: 12 January 2011
Accepted: 07 February 2011
Published: 17 March 2011
Issue Date: May 2011
DOI: https://doi.org/10.1038/jhg.2011.24
