Abstract
We generalize an approach suggested by Hill (Heredity, 33, 229–239, 1974) for testing for significant association among alleles at two loci when only genotype and not haplotype frequencies are available. The principle is to use the Expectation-Maximization (EM) algorithm to resolve double heterozygotes into haplotypes and then apply a likelihood ratio test in order to determine whether the resolutions of haplotypes are significantly nonrandom, which is equivalent to testing whether there is statistically significant linkage disequilibrium between loci. The EM algorithm in this case relies on the assumption that genotype frequencies at each locus are in Hardy-Weinberg proportions. This method can accommodate X-linked loci and samples from haplodiploid species. We use three methods for testing the significance of the likelihood ratio: the empirical distribution in a large number of randomized data sets, the χ² approximation for the distribution of likelihood ratios, and the Z² test. The performance of each method is evaluated by applying it to simulated data sets and comparing the tail probability with the tail probability from Fisher's exact test applied to the actual haplotype data. For realistic sample sizes (50–150 individuals) all three methods perform well with two or three alleles per locus, but only the empirical distribution is adequate when there are five to eight alleles per locus, as is typical of hypervariable loci such as microsatellites. The method is applied to a data set of 32 microsatellite loci in a Finnish population and the results confirm the theoretical predictions. We conclude that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
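The procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes autosomal diploid data, Hardy-Weinberg proportions, and a sample stored as a list of unphased two-locus genotypes `((a1, a2), (b1, b2))`. The function names (`phases`, `em`, `log_lik`, `lr_stat`, `permutation_p`), the toy data, the choice to randomize by shuffling locus-B genotypes among individuals, and the `(hits + 1) / (n_perm + 1)` tail-probability convention are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def phases(ga, gb):
    """Haplotype pairs compatible with one unphased two-locus genotype.
    Only double heterozygotes yield two distinct pairs."""
    (a1, a2), (b1, b2) = ga, gb
    cis = tuple(sorted([(a1, b1), (a2, b2)]))
    trans = tuple(sorted([(a1, b2), (a2, b1)]))
    return sorted({cis, trans})

def log_lik(data, h):
    """Log-likelihood of the genotypes given haplotype frequencies h,
    assuming random union of haplotypes (Hardy-Weinberg)."""
    total = 0.0
    for ga, gb in data:
        p = sum((1 if x == y else 2) * h[x] * h[y] for x, y in phases(ga, gb))
        total += math.log(p)
    return total

def em(data, n_iter=200, tol=1e-10):
    """Return (h, h0): EM haplotype frequencies and the linkage-equilibrium
    frequencies (products of allele frequencies), which also start the EM."""
    n_hap = 2 * len(data)
    pa = Counter(a for ga, _ in data for a in ga)
    pb = Counter(b for _, gb in data for b in gb)
    h0 = {(a, b): pa[a] * pb[b] / n_hap ** 2 for a in pa for b in pb}
    h = dict(h0)
    for _ in range(n_iter):
        counts = Counter()
        for ga, gb in data:
            ph = phases(ga, gb)
            # E-step: posterior weight of each compatible phase.
            w = [h[x] * h[y] for x, y in ph]
            s = sum(w)
            for (x, y), wi in zip(ph, w):
                counts[x] += wi / s
                counts[y] += wi / s
        # M-step: expected haplotype counts become new frequencies.
        new = {k: counts[k] / n_hap for k in h}
        delta = max(abs(new[k] - h[k]) for k in h)
        h = new
        if delta < tol:
            break
    return h, h0

def lr_stat(data):
    """Likelihood ratio statistic: 2 * (logL under EM fit - logL under equilibrium)."""
    h, h0 = em(data)
    return 2.0 * (log_lik(data, h) - log_lik(data, h0))

def permutation_p(data, n_perm=200, seed=0):
    """Empirical tail probability from randomized data sets, built here by
    shuffling locus-B genotypes among individuals (an assumed scheme)."""
    rng = random.Random(seed)
    obs = lr_stat(data)
    ga = [g[0] for g in data]
    gb = [g[1] for g in data]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(gb)
        if lr_stat(list(zip(ga, gb))) >= obs - 1e-12:
            hits += 1
    return obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy sample: two alleles per locus, strong association.
toy = ([((0, 0), (0, 0))] * 10 + [((0, 1), (0, 1))] * 10
       + [((1, 1), (1, 1))] * 10)
stat, p = permutation_p(toy)
print(stat, p)
```

With many alleles per locus, the abstract's conclusion suggests relying on the permutation-based tail probability rather than on a χ² approximation for `lr_stat`.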
References
Dempster, A P, Laird, N M, and Rubin, D B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B, 39, 1–38.
Elandt-Johnson, R C. 1971. Probability Models and Statistical Methods in Genetics. Wiley & Sons, New York.
Excoffier, L, and Slatkin, M. 1995. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol, 12, 921–927.
Guo, S W, and Thompson, E A. 1992. Performing the exact test of Hardy–Weinberg proportion for multiple alleles. Biometrics, 48, 361–372.
Hill, W G. 1974. Estimation of linkage disequilibrium in randomly mating populations. Heredity, 33, 229–239.
Hill, W G. 1975. Tests for association of gene frequencies at several loci in random mating diploid populations. Biometrics, 31, 881–888.
Jorde, L B. 1995. Linkage disequilibrium as a gene mapping tool. Am J Hum Genet, 56, 11–14.
Lewontin, R C, and Kojima, K. 1960. The evolutionary dynamics of complex polymorphisms. Evolution, 14, 458–472.
Long, J C, Williams, R C, and Urbanek, M. 1995. An E-M algorithm and testing strategy for multiple-locus haplotypes. Am J Hum Genet, 56, 799–810.
Maiste, P J. 1993. Comparisons of Statistical Tests for Independence at Genetic Loci with Many Alleles. Ph.D. Thesis, North Carolina State University.
Peterson, A C, Lehesjoki, A E, De La Chapelle, A, Di Rienzo, A, Slatkin, M, and Freimer, N B. 1995. The distribution of linkage disequilibrium over anonymous genome regions. Hum Mol Genet, 4, 887–894.
Slatkin, M. 1994. Linkage disequilibrium in growing and stable populations. Genetics, 137, 331–336.
Watkins, W S, Zenger, R, O'Brien, E, et al. 1994. Linkage disequilibrium patterns vary with chromosomal location: a case study from the von Willebrand factor region. Am J Hum Genet, 55, 348–355.
Weir, B S. 1979. Inferences about linkage disequilibrium. Biometrics, 35, 235–254.
Weir, B S. 1990. Genetic Data Analysis. Sinauer Associates, Sunderland, MA.
Cite this article
Slatkin, M., Excoffier, L. Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm. Heredity 76, 377–383 (1996). https://doi.org/10.1038/hdy.1996.55