Haplotype blocks and linkage disequilibrium in the human genome

Key Points

  • Linkage disequilibrium (LD) is the nonrandom association of alleles at different sites.

  • Recent studies have proposed that patterns of LD in the human genome can be summarized by a series of discrete haplotype blocks: regions of high LD that are separated from other haplotype blocks by many historical recombination events.

  • Patterns of LD and the fit of the haplotype-block model vary tremendously from region to region: some show extensive well-defined haplotype blocks, while others contain essentially no haplotype blocks.

  • This variability across regions is probably the result of several factors, which include large-scale variation in recombination rates (apparent from genetic maps), fine-scale variation in recombination rates (for example, hotspots) and the inherent stochasticity of LD.

  • Simulations indicate that although recombination hotspots generally create haplotype-block boundaries, the converse is not true: most haplotype-block boundaries do not occur at hotspots.

  • The identification of haplotype blocks will be of some use for future association studies, but a substantial fraction of the genome is not covered by large haplotype blocks, and other approaches will be useful there.
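The pairwise LD measures used throughout (for example, in the pairwise |D′| plots of Figure 1) can be computed directly from phased haplotypes at two biallelic sites. The sketch below uses the standard definitions D = p_AB − p_A·p_B, D′ = D/D_max and r² = D²/(p_A(1 − p_A)p_B(1 − p_B)); the function name and the two-letter string encoding of haplotypes are illustrative assumptions, not taken from any of the studies reviewed here.

```python
def pairwise_ld(haplotypes):
    """Return (D, D_prime, r2) for two biallelic sites from phased haplotypes.

    Assumed (hypothetical) encoding: each haplotype is a two-character string
    such as "AB", "Ab", "aB" or "ab"; the first character is the allele at
    site 1 and the second the allele at site 2. p_A and p_B are the
    frequencies of the upper-case alleles, an arbitrary labelling that only
    affects the sign of D and D'.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n

    # D: departure of the AB haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the allele
    # frequencies, so |D'| = 1 whenever one of the four haplotypes is absent.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the allelic states at the two sites.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(pairwise_ld(["AB"] * 4 + ["ab"] * 4))   # (0.25, 1.0, 1.0)
    print(pairwise_ld(["AB", "AB", "Ab", "ab"]))  # D' = 1.0 but r2 = 1/3
```

The second example shows why |D′| and r² can disagree: |D′| reaches 1 whenever one of the four possible haplotypes is missing, while r² also depends on how closely the two allele frequencies match.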


There is great interest in the patterns and extent of linkage disequilibrium (LD) in humans and other species. Characterizing LD is of central importance for gene-mapping studies and can provide insights into the biology of recombination and human demographic history. Here, we review recent developments in this field, including the newly proposed 'haplotype-block' model of LD. We describe some of the new data in detail and compare the observed patterns with those seen in simulations.
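To make the haplotype-block idea concrete, the sketch below partitions consecutive markers into blocks using a deliberately simple, hypothetical rule: a marker joins the current block only if its |D′| with every marker already in the block reaches a threshold (0.8 here, an arbitrary choice). It is not one of the published block definitions, and in particular it ignores sampling uncertainty in the |D′| estimates; the function name and the example matrix are illustrative only.

```python
def greedy_blocks(dprime, threshold=0.8):
    """Partition markers 0..m-1 into consecutive blocks (toy rule, not a published method).

    dprime[i][j] is |D'| between markers i and j (a symmetric m x m matrix).
    Marker j extends the current block only if |D'| >= threshold with every
    marker already in the block; otherwise the block is closed and a new one
    starts at j. Returns (first_marker, last_marker) index pairs.
    """
    m = len(dprime)
    blocks = []
    start = 0
    for j in range(1, m + 1):
        # j == m closes the final block; otherwise test j against the block.
        if j == m or any(dprime[i][j] < threshold for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    return blocks


if __name__ == "__main__":
    # Hypothetical |D'| values for four markers (illustrative, not real data).
    example = [
        [1.00, 0.90, 0.20, 0.10],
        [0.90, 1.00, 0.30, 0.20],
        [0.20, 0.30, 1.00, 0.95],
        [0.10, 0.20, 0.95, 1.00],
    ]
    print(greedy_blocks(example))  # [(0, 1), (2, 3)]
```

Markers that fit no multi-marker block simply end up as single-marker blocks under this rule.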

Figure 1: Pairwise |D′| plots for representative regions from different studies.
Figure 2: The proportion of sequence contained in haplotype blocks of various sizes.
Figure 3: Schematic of the haplotype blocks identified in five genomic regions (ref. 32).
Figure 4: Schematic of the haplotype blocks found in simulations.
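A summary such as Figure 2 reduces a block partition to the fraction of sequence lying in blocks above a given size. A minimal sketch, assuming blocks are given as non-overlapping (start, end) base-pair coordinates and that a block's size runs from its first to its last marker; both the convention and the example coordinates are assumptions for illustration.

```python
def fraction_in_blocks(blocks, region_length, min_sizes):
    """Return {min_size: fraction of the region covered by blocks >= min_size}.

    Assumptions: blocks are non-overlapping (start_bp, end_bp) spans, and a
    block's size is end_bp - start_bp (first to last marker). Sequence
    between blocks counts as uncovered.
    """
    fractions = {}
    for min_size in min_sizes:
        covered = sum(end - start for start, end in blocks if end - start >= min_size)
        fractions[min_size] = covered / region_length
    return fractions


if __name__ == "__main__":
    # Hypothetical block coordinates in a 100-kb region (not real data).
    blocks = [(0, 10_000), (20_000, 25_000), (30_000, 31_000)]
    print(fraction_in_blocks(blocks, 100_000, [0, 5_000, 10_000]))
```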


We thank D. Nickerson, S. Gabriel, M. Daly, D. Altshuler and S. Schaffner for help in accessing and interpreting their data, and A. DiRienzo and S. Zoellner for discussions. We also thank M. Przeworski and the anonymous reviewers for comments on an earlier version of this manuscript. This work was supported by a National Institutes of Health grant to J.K.P.

Glossary

Bottleneck: A temporary reduction in population size that causes the loss of genetic variation.


Admixture: The mixture of two or more genetically distinct populations.


Pairwise LD: The strength of association between alleles at two different markers.


Pre-ascertained SNPs: SNPs that have already been detected in previous studies, usually in an extremely small sample of chromosomes.


Unphased data: Sequence data in which the phase of double heterozygotes was not determined.


A statistical approach that, given a set of assumptions about the underlying model, can provide a rigorous assessment of uncertainty.


Coalescent simulation: A method of simulating data under a population genetic model.


Ascertainment bias: The bias in patterns of variation that results from using pre-ascertained SNPs.


Gene conversion: Recombination that involves the nonreciprocal transfer of information from one chromosome to its homologue.
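For sequence data in which phase was not determined, a double heterozygote is compatible with two different pairs of haplotypes, because the genotype alone does not say which alleles lie on the same chromosome. The sketch below, using a hypothetical encoding of each site as 0, 1 or 2 copies of allele '1', enumerates every haplotype pair consistent with an unphased multi-site genotype. With h heterozygous sites there are 2^(h−1) distinct pairs (one when h = 0), which is why haplotypes must be inferred statistically or determined experimentally from such data.

```python
from itertools import product


def phase_resolutions(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    Assumed encoding: genotype is a sequence of 0, 1 or 2, the number of
    copies of allele '1' at each site. Homozygous sites (0 or 2) fix both
    haplotypes; each heterozygous site (1) can put allele '1' on either one.
    Haplotypes are returned as strings of '0'/'1'.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # 0 -> 0, 2 -> 1; het sites overwritten below
        h2 = list(h1)
        for site, allele in zip(het_sites, choice):
            h1[site], h2[site] = allele, 1 - allele
        # Sort the pair so that (x, y) and (y, x) count as one resolution.
        pair = tuple(sorted(("".join(map(str, h1)), "".join(map(str, h2)))))
        pairs.add(pair)
    return sorted(pairs)


if __name__ == "__main__":
    # A double heterozygote: two equally valid phasings.
    print(phase_resolutions([1, 1]))  # [('00', '11'), ('01', '10')]
```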

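Ascertainment bias arises because a SNP is more likely to be discovered in a small panel of chromosomes when its alleles are at intermediate frequency. A minimal sketch, assuming the standard neutral-model expectation that the number of sites with i derived copies in a sample of n chromosomes is proportional to 1/i, and approximating the chance that a site is polymorphic in a discovery panel of m chromosomes by 1 − p^m − (1 − p)^m with p = i/n. The function name and parameters are hypothetical.

```python
def maf_spectrum(n, panel_size=None):
    """Distribution of minor-allele counts among SNPs in a sample of n chromosomes.

    Assumptions: without ascertainment, the expected number of sites with i
    derived copies is proportional to 1/i (standard neutral model). With
    ascertainment, each site is also weighted by the probability that it is
    polymorphic in a discovery panel of `panel_size` chromosomes, approximated
    by sampling with replacement at frequency p = i/n.
    Returns {minor allele count: proportion of SNPs}.
    """
    weights = {}
    for i in range(1, n):
        w = 1.0 / i
        if panel_size is not None:
            p = i / n
            w *= 1 - p ** panel_size - (1 - p) ** panel_size
        minor = min(i, n - i)
        weights[minor] = weights.get(minor, 0.0) + w
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    n = 100
    for label, spec in [("all SNPs", maf_spectrum(n)),
                        ("2-chromosome panel", maf_spectrum(n, panel_size=2))]:
        rare = sum(w for k, w in spec.items() if k <= 5)  # minor allele frequency <= 5%
        print(label, round(rare, 3))
```

With a two-chromosome discovery panel, rare variants are strongly under-represented among the ascertained SNPs relative to all SNPs.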

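The standard coalescent is a common basis for simulating data under a population genetic model: it builds the sample's genealogy backwards in time. The sketch below draws only the waiting times of the standard (Kingman) coalescent without recombination, with time in units of 2N generations. While k lineages remain, the wait until the next coalescence is exponential with rate k(k − 1)/2, so the expected time to the most recent common ancestor (TMRCA) of n lineages is 2(1 − 1/n). Simulating LD patterns additionally requires recombination (the ancestral recombination graph), as implemented in programs such as Hudson's ms; that is beyond this sketch, and the function names are hypothetical.

```python
import random


def coalescent_waiting_times(n, rng=random):
    """Waiting times between coalescences for n sampled lineages.

    Standard (Kingman) coalescent with no recombination; time is measured in
    units of 2N generations for a diploid population of size N. While k
    lineages remain, the next coalescence occurs after an exponential time
    with rate k*(k-1)/2. Returns the waits for k = n, n-1, ..., 2.
    """
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def simulate_tmrca(n, rng=random):
    """Time to the most recent common ancestor: the sum of all waiting times."""
    return sum(coalescent_waiting_times(n, rng))


if __name__ == "__main__":
    rng = random.Random(1)
    reps, n = 20_000, 10
    mean_tmrca = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
    print(f"simulated mean TMRCA: {mean_tmrca:.3f}; expected: {2 * (1 - 1 / n):.3f}")
```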