Estimating Recessive Disease Allele Frequency Based on Genetic Maps

Haghighi, Fatemeh; Ott, Jurg

doi:10.1007/BF03405918

Original Paper
Published: July 1997


European Journal of Human Genetics, volume 5, pages 203–205 (1997)


Abstract

For a recessive disease whose gene has been localized on the human gene map, a new method is described for estimating the population frequency of the disease allele. The method focuses on affected individuals whose parents are first cousins, where parents and grandparents are genotyped for highly polymorphic markers at the disease gene. The primary statistic is the proportion of such probands who are autozygous (homozygous due to identity by descent of the two disease alleles), where this proportion is a function of the disease allele frequency. Our map-based method is compared to Dahlberg’s method of estimating recessive disease allele frequencies, which is based on the proportion of affected individuals whose parents are first cousins; this proportion is also a function of the disease allele frequency. For small to moderate sample sizes and traits that are not too common, our new method is more efficient (for some parameter values dramatically more efficient) than Dahlberg’s method.


Introduction

For a common trait, prevalence is easily estimated from a random sample of the population. For a rare disease, however, such sampling is prohibitively expensive, and cases are often ascertained through probands [1]. Population prevalence must then be estimated with a correction for the ascertainment probability.

Specialized indirect methods have been devised that do not rely on complete enumeration of all cases; they work with probands but use additional information that obviates the need for ascertainment corrections. In this paper, we are concerned with recessively inherited traits. For these, when q denotes the population frequency of the disease allele, q² is the disease incidence under random mating (Hardy-Weinberg proportions). Assuming equal life expectancy for affected and unaffected individuals, q² is also the disease prevalence. A common method of estimating q is due to Dahlberg [2]; it relies on the observation that recessive traits tend to occur more frequently among offspring of consanguineous matings than among offspring of unrelated parents. This method and its extensions have been covered by Li [3] and applied, for example, to cystic fibrosis in Italy [4]. Below, we propose a new map-based method for estimating q and compare its efficiency with that of the Dahlberg method and of simple population sampling.

For the two methods, probands are defined as follows. In our map-based method, a proband is an affected individual whose parents are first cousins, whereas in the Dahlberg method, a proband is any affected individual. Thus, as outlined in the Discussion, the cost of ascertaining probands differs between the two methods. For each method, N denotes the number of probands.

Methods

Map-Based Method

Consider a recessive trait with disease allele frequency q. For a random individual, the probability of being affected (homozygous for the disease allele) is Fq + (1 - F)q², where F is the individual’s inbreeding coefficient [3]. The first term is the probability of being autozygous, that is, homozygous because the two disease alleles are copies of the same ancestral allele (identical by descent); the second term is the probability of being allozygous. Therefore, for an affected individual whose parents are first cousins (F = 1/16), the conditional probability of being autozygous is

$$p = \frac{Fq}{Fq + (1 - F)\,q^2} = \frac{1}{1 + 15q} \tag{1}$$
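The second equality in (1) follows by cancelling the common factor q and substituting F = 1/16. For example, with q = 0.01 about 87% (1/1.15) of affected children of first cousins are autozygous, and p approaches 1 as q approaches 0.

$$\begin{aligned}
p &= \frac{Fq}{Fq + (1 - F)\,q^2} = \frac{F}{F + (1 - F)\,q} \\
  &= \frac{1/16}{1/16 + (15/16)\,q} = \frac{1}{1 + 15q}.
\end{aligned}$$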

If the disease gene has a known genomic position within a dense map of marker loci, marker typing of the proband’s parents and grandparents allows one to determine whether a proband is autozygous or allozygous. In other words, the inheritance of alleles at markers tightly linked to the disease locus shows whether the two disease alleles in a proband are copies of a single disease allele carried by one of the great-grandparents (autozygosity) or whether they entered the pedigree separately (allozygosity). Consider N such probands, of whom an observed proportion, p̂, are autozygous. Inverting (1) gives q = (1 - p)/(15p), so p̂ can be translated into a maximum likelihood estimate, q̂, of the allele frequency. Because values of p̂ ≤ 1/16 would yield q̂ ≥ 1 (and p̂ = 0 an undefined value), we define

$$\hat{q} = \begin{cases} (1 - \hat{p})/(15\hat{p}) & \text{if } \hat{p} > 1/16 \\ 1 & \text{if } \hat{p} \le 1/16 \end{cases} \tag{2}$$
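A minimal Python sketch of (1) and (2) lets a reader check the inversion numerically. The function names are illustrative, not from the paper; the general form of (1) is kept so that the role of F = 1/16 is explicit.

```python
def autozygous_fraction(q: float, F: float = 1 / 16) -> float:
    """Equation (1): probability that an affected child with inbreeding
    coefficient F is autozygous, Fq / [Fq + (1 - F) q^2] = F / [F + (1 - F) q].
    For offspring of first cousins (F = 1/16) this equals 1 / (1 + 15 q)."""
    return F / (F + (1 - F) * q)


def map_based_estimate(p_hat: float) -> float:
    """Equation (2): allele-frequency estimate from the observed proportion
    p_hat of autozygous probands, truncated at 1 when p_hat <= 1/16."""
    if p_hat > 1 / 16:
        return (1 - p_hat) / (15 * p_hat)
    return 1.0


if __name__ == "__main__":
    p = autozygous_fraction(0.05)       # 1 / 1.75, about 0.571
    print(p, map_based_estimate(p))     # the estimate recovers q = 0.05 up to rounding
    print(map_based_estimate(45 / 60))  # 45 of 60 autozygous: (1 - 0.75) / 11.25, about 0.0222
```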

The variance of the estimate, V(q̂), is computed numerically as follows. For a given sample size, N, and population allele frequency, q, each possible outcome i = 0, …, N (the number of autozygous probands) occurs with binomial probability C(N, i) p^i (1 - p)^(N - i), where p is given by (1). Each i yields an estimate p̂ = i/N and a corresponding q̂ from (2). The variance of the allele frequency estimate is then V(q̂) = E(q̂²) - E²(q̂), where E denotes expectation (mean).
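The enumeration just described can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated model (independent probands, exact binomial sampling), not the authors’ code; the function names are hypothetical.

```python
from math import comb


def map_based_estimate(p_hat: float) -> float:
    """Equation (2), truncated at 1."""
    return (1 - p_hat) / (15 * p_hat) if p_hat > 1 / 16 else 1.0


def variance_map_based(q: float, n: int) -> float:
    """Exact V(q_hat) for n probands by summing over all binomial outcomes."""
    p = 1 / (1 + 15 * q)                    # equation (1), first-cousin parents
    mean = mean_sq = 0.0
    for i in range(n + 1):                  # i = number of autozygous probands
        prob = comb(n, i) * p**i * (1 - p) ** (n - i)
        est = map_based_estimate(i / n)     # p_hat = i / n mapped through (2)
        mean += prob * est
        mean_sq += prob * est**2
    return mean_sq - mean**2                # V = E(q_hat^2) - E(q_hat)^2


if __name__ == "__main__":
    print(variance_map_based(0.05, 100) ** 0.5)   # standard error, roughly 0.01
```

For q = 0.05 and N = 100 the standard error comes out at roughly 0.01, in line with the confidence interval quoted in the Results.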

Dahlberg’s Method

To compare the variance of our estimate for q to that of the conventional (Dahlberg’s) estimate, we applied Dahlberg’s method as follows. Assume that among all matings in a population, a known proportion c is between first cousins while all other matings are between unrelated individuals (in practice, the latter category includes the rare matings between individuals of other relationships). Then, the proportion of recessive cases born to cousin marriages among all recessive cases in the population is

$$k = \frac{c\,(1 + 15q)}{c\,(1 - q) + 16q} \tag{3}$$

This expression, due to Dahlberg [2], may be viewed as analogous to (1). Consider a sample of N probands, of whom an observed proportion, k̂, have parents who are first cousins. Based on (3), k̂ can be translated into a maximum likelihood estimate, q̂_D, of the allele frequency. Because values of k̂ ≤ c would yield a q̂_D that is at least 1, negative, or undefined, we define

$$\hat{q}_D = \begin{cases} c\,(1 - \hat{k})/[\hat{k}\,(16 - c) - 15c] & \text{if } \hat{k} > c \\ 1 & \text{if } \hat{k} \le c \end{cases} \tag{4}$$
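Equation (3) follows from weighting the probability of being affected, Fq + (1 - F)q², by the mating proportions (F = 1/16 for the fraction c of cousin matings and F = 0 otherwise). Solving (3) for q gives the first branch of (4), and substituting q = 1 into (3) shows why k̂ = c is the truncation point:

$$\begin{aligned}
k &= \frac{c\,q\,(1 + 15q)/16}{c\,q\,(1 + 15q)/16 + (1 - c)\,q^2}
   = \frac{c\,(1 + 15q)}{c\,(1 + 15q) + 16\,(1 - c)\,q}
   = \frac{c\,(1 + 15q)}{c\,(1 - q) + 16q}, \\
q &= \frac{c\,(1 - k)}{k\,(16 - c) - 15c}, \qquad q = 1 \iff k = c.
\end{aligned}$$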

The variance, V(q̂_D), is calculated analogously to V(q̂), with the binomial probabilities based on k from (3) and the estimates q̂_D from (4).
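The Dahlberg variance can be sketched the same way. Again this is an illustrative reimplementation with hypothetical function names, assuming binomial sampling of N probands as described above. It also shows why the Dahlberg estimate is so variable when c is small: with q = 0.05 and c = 0.001, k ≈ 0.0022, so in roughly 80% of samples of 100 probands none has cousin parents and q̂_D is truncated at 1.

```python
from math import comb


def cousin_case_proportion(q: float, c: float) -> float:
    """Equation (3): expected proportion of cases whose parents are first cousins."""
    return c * (1 + 15 * q) / (c * (1 - q) + 16 * q)


def dahlberg_estimate(k_hat: float, c: float) -> float:
    """Equation (4): Dahlberg's estimate, truncated at 1 when k_hat <= c."""
    if k_hat > c:
        return c * (1 - k_hat) / (k_hat * (16 - c) - 15 * c)
    return 1.0


def variance_dahlberg(q: float, c: float, n: int) -> float:
    """Exact V(q_hat_D) for n probands, enumerated as for the map-based estimate."""
    k = cousin_case_proportion(q, c)
    mean = mean_sq = 0.0
    for i in range(n + 1):                  # i = probands with first-cousin parents
        prob = comb(n, i) * k**i * (1 - k) ** (n - i)
        est = dahlberg_estimate(i / n, c)
        mean += prob * est
        mean_sq += prob * est**2
    return mean_sq - mean**2


if __name__ == "__main__":
    k = cousin_case_proportion(0.05, 0.001)
    print(k, (1 - k) ** 100)                # P(no cousin-parent proband among 100), about 0.80
    print(variance_dahlberg(0.05, 0.001, 100))
```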

Results

One of the best-studied recessive traits, phenylketonuria, has a population frequency in the US of approximately 1 in 12,000 [5], corresponding to a disease allele frequency of about 0.01; other recessive traits appear to be more common. For a range of population allele frequencies, Table 1 shows the standard error (square root of the variance) of our allele frequency estimate, q̂, as a function of the number N of probands. Standard errors are high, and considerable sample sizes are needed for reasonably accurate allele frequency estimates. For example, with q = 0.05 and N = 100, the approximate 95% confidence interval ranges from 0.03 to 0.07.
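Two of the numbers above can be checked quickly. Under Hardy-Weinberg proportions q = √(prevalence), so a phenylketonuria frequency of 1 in 12,000 gives q ≈ 0.0091. For the confidence interval, the sketch swaps in the delta method (a normal approximation, not the exact enumeration behind Table 1): since q̂ = (1 - p̂)/(15p̂), the standard error of p̂ is scaled by |dq/dp| = 1/(15p²). The function names are hypothetical.

```python
from math import sqrt
from typing import Tuple


def allele_freq_from_prevalence(prevalence: float) -> float:
    """q = sqrt(q^2), assuming Hardy-Weinberg proportions (random mating)."""
    return sqrt(prevalence)


def delta_method_ci(q: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for the map-based estimate via the delta method
    (an assumption here, not the paper's exact-enumeration approach)."""
    p = 1 / (1 + 15 * q)                # equation (1)
    se_p = sqrt(p * (1 - p) / n)        # binomial standard error of p_hat
    se_q = se_p / (15 * p**2)           # propagated through |dq/dp| = 1 / (15 p^2)
    return q - z * se_q, q + z * se_q


if __name__ == "__main__":
    print(allele_freq_from_prevalence(1 / 12_000))   # about 0.0091
    print(delta_method_ci(0.05, 100))                # about (0.030, 0.070)
```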

Table 1. Standard error of map-based estimate


For Dahlberg’s method, the population frequency of first-cousin matings among all marriages must be known. In western societies, this rate is on the order of c = 0.001 [6], while in some eastern countries it can be 0.200 or higher [7]. For a range of these values, Table 2 shows the efficiency of our map-based estimate relative to the Dahlberg estimate, where efficiency is defined in the customary manner as the variance ratio V(q̂_D)/V(q̂). This ratio expresses the relative accuracy of the two estimates but, as outlined in the Discussion, does not take into account the costs of sampling probands. For rare diseases (q ≤ 0.02) and small samples (N ≤ 20), map-based estimation is always more efficient than the Dahlberg method (i.e., for every c value considered). For cousin marriage rates of c ≤ 0.10, the map-based estimate is generally more efficient except for small N and very large q. The differences in accuracy can be dramatic. For example, for a relatively common allele frequency of 0.05 and c = 0.001, the efficiency is 7.1 for N = 10 and 1,427 for N = 100. Thus, in most situations, the map-based estimate is clearly superior to Dahlberg’s estimate.
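The efficiency ratio can be computed by combining the two enumerations. The sketch below is an illustrative reimplementation (hypothetical names, binomial sampling as in the Methods), not the authors’ program; for q = 0.05 and c = 0.001 it gives efficiencies close to the 7.1 (N = 10) and 1,427 (N = 100) quoted above.

```python
from math import comb
from typing import Callable


def binomial_variance(success_prob: float, n: int,
                      estimator: Callable[[float], float]) -> float:
    """Variance of estimator(i / n) when i ~ Binomial(n, success_prob), by full enumeration."""
    mean = mean_sq = 0.0
    for i in range(n + 1):
        prob = comb(n, i) * success_prob**i * (1 - success_prob) ** (n - i)
        est = estimator(i / n)
        mean += prob * est
        mean_sq += prob * est**2
    return mean_sq - mean**2


def efficiency(q: float, c: float, n: int) -> float:
    """Relative efficiency V(q_hat_D) / V(q_hat) for n probands under each design."""
    p = 1 / (1 + 15 * q)                                # equation (1)
    k = c * (1 + 15 * q) / (c * (1 - q) + 16 * q)       # equation (3)

    def map_based(p_hat: float) -> float:               # equation (2)
        return (1 - p_hat) / (15 * p_hat) if p_hat > 1 / 16 else 1.0

    def dahlberg(k_hat: float) -> float:                # equation (4)
        if k_hat > c:
            return c * (1 - k_hat) / (k_hat * (16 - c) - 15 * c)
        return 1.0

    return binomial_variance(k, n, dahlberg) / binomial_variance(p, n, map_based)


if __name__ == "__main__":
    for n in (10, 100):
        print(n, round(efficiency(0.05, 0.001, n), 1))
```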

Table 2. Relative efficiency (accuracy) of map-based versus Dahlberg’s method of allele frequency estimation


Discussion

As shown above, for small to moderate sample sizes and traits that are not too common, our new method is more efficient than Dahlberg’s method. In practice, grandparents are often not typed, and the number of probands in linkage studies of recessive traits tends to be rather small. It is therefore fortunate that small samples are precisely the situation in which our method performs best.

Clearly, obtaining probands suitable for the map-based method requires more resources than obtaining probands for Dahlberg’s method. In principle, all relevant ancestors of the map-based probands must be genotyped, but such data are difficult or costly to obtain. With very highly polymorphic markers, identity by state is almost equivalent to identity by descent, so it may be possible to obtain approximate solutions based on homozygosity versus heterozygosity at closely linked marker loci. Such approaches are under investigation.

For Dahlberg’s method, the population proportion of cousin marriages must be known. If it is unknown or inaccurate, this method presumably furnishes biased results. There is no such requirement for the map-based method.
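A hypothetical illustration of this bias (the numbers are invented, not from the paper): suppose the true cousin-mating rate is c = 0.002 but the analyst assumes c = 0.001. Feeding the expected proportion k from (3) into (4) with the wrong c cuts the estimate of q by more than half. Sampling noise is ignored so that only the effect of the misspecified c shows; the function names are hypothetical.

```python
def cousin_case_proportion(q: float, c: float) -> float:
    """Equation (3)."""
    return c * (1 + 15 * q) / (c * (1 - q) + 16 * q)


def dahlberg_estimate(k_hat: float, c: float) -> float:
    """Equation (4), truncated at 1."""
    if k_hat > c:
        return c * (1 - k_hat) / (k_hat * (16 - c) - 15 * c)
    return 1.0


# Hypothetical scenario: true q = 0.01 and true cousin-mating rate c = 0.002,
# but the analyst assumes c = 0.001. The expected k is used, so there is no sampling noise.
TRUE_Q, TRUE_C, ASSUMED_C = 0.01, 0.002, 0.001
k = cousin_case_proportion(TRUE_Q, TRUE_C)
print(dahlberg_estimate(k, TRUE_C))     # recovers 0.01
print(dahlberg_estimate(k, ASSUMED_C))  # about 0.0046, less than half the true value
```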

Our approach focuses on one proband per family. In linkage studies of recessive traits, there is often more than one affected individual per sibship. We have not yet investigated how information from multiplex sibships could be used in the map-based method; this does not appear to be a simple problem.

The method introduced in this paper is tailored to a specific relationship between the parents. This relationship, first cousins, is the most common one among related parents in western populations. In other populations, for example in some eastern countries, other relationships may be more common, and our method is not directly applicable to them.

References

[1] Morton NE: Outline of Genetic Epidemiology. Basel, Karger, 1982.
[2] Dahlberg G: Mathematical Methods for Population Genetics. Basel, Karger, 1947.
[3] Li CC: First Course in Population Genetics. Pacific Grove, Boxwood Press, 1978.
[4] Romeo G, Bianco M, Devoto M, Menozzi P, Mastella G, Giunta AM, Micalizzi C, Antonelli M, Battistini A, Santamaria F, et al: Incidence in Italy, genetic heterogeneity, and segregation analysis of cystic fibrosis. Am J Hum Genet 1985;37:338–349.
[5] Vogel F, Motulsky AG: Human Genetics. New York, Springer, 1986.
[6] Lebel RR: Consanguinity studies in Wisconsin I: Secular trends in consanguineous marriage, 1843–1981. Am J Med Genet 1983;15:543–560.
[7] Khlat M, Khoury M: Inbreeding and diseases: Demographic, genetic, and epidemiologic perspectives. Epidemiol Rev 1991;13:28–41.


Acknowledgment

This work was supported by grant HG00008 from the National Human Genome Research Institute.

Author information

Authors and Affiliations

Department of Genetics and Development, Columbia University, New York, N.Y., USA
Fatemeh Haghighi & Jurg Ott
Laboratory of Statistical Genetics, Rockefeller University, Box 192, 1230 York Avenue, New York, NY, 10021-6399, USA
Jurg Ott


Corresponding author

Correspondence to Jurg Ott.


About this article

Cite this article

Haghighi, F., Ott, J. Estimating Recessive Disease Allele Frequency Based on Genetic Maps. Eur J Hum Genet 5, 203–205 (1997). https://doi.org/10.1007/BF03405918


Received: 22 November 1996
Revised: 18 April 1997
Accepted: 10 May 1997
Issue Date: July 1997
