Key Points
- Haplotypes link together (that is, 'phase') groups of genetic variants that co-occur on single chromosomes. Although haplotypes have an important role in clinical genetics and association studies, they are not typically obtained by contemporary genotyping or sequencing technologies and must be determined separately.
- Inferential methods for haplotype determination perform fairly poorly for the rare and private variants implicated in many genetic diseases. To phase this class of variants accurately and comprehensively, direct experimental methods are needed.
- Dense haplotyping methods comprehensively phase variants into haplotype blocks at the scale of a single gene or a small number of genes and corresponding regulatory regions. Contiguity is defined within each block but not between adjacent or distant haplotype blocks.
- Sparse haplotyping methods phase a more modest number of distant variants distributed along an entire chromosome or a chromosome arm. Resulting haplotypes are not comprehensive but have long-range contiguity that is currently unattainable using dense methods.
- Reference panels of previously ascertained haplotypes can be used to correct errors in, or increase the density or contiguity of, directly obtained haplotypes. Such hybrid approaches yield improved haplotypes at low costs.
- Although contiguity metrics are typically used to compare haplotype assemblies, comprehensive comparisons should also include measures of the accuracy, density and allele frequency spectrum of the phased variants.
Abstract
Human genomes are diploid and, for their complete description and interpretation, it is necessary not only to discover the variation they contain but also to arrange it onto chromosomal haplotypes. Although whole-genome sequencing is becoming increasingly routine, nearly all such individual genomes are mostly unresolved with respect to haplotype, particularly for rare alleles, which remain poorly resolved by inferential methods. Here, we review emerging technologies for experimentally resolving (that is, 'phasing') haplotypes across individual whole-genome sequences. We also discuss computational methods relevant to their implementation, metrics for assessing their accuracy and completeness, and the relevance of haplotype information to applications of genome sequencing in research and clinical medicine.
Acknowledgements
The authors thank B. Browning, B. Vernot, A. Gordon and members of the Shendure Lab for discussions.
Ethics declarations
Competing interests
The authors have patents or patent applications related to the subject matter reviewed here.
Glossary
- Low-frequency variants: Single-nucleotide variants, insertions and deletions (indels) or copy-number variants that have minor allele frequency in a population <1%; that is, variants found on <1 out of every 100 haplotypes.
- Private variants: Variants that are found in a single individual or pedigree and are thus recalcitrant to phasing by population-based methods owing to their absence from reference panels.
- Linkage disequilibrium: A measure of the probability that two polymorphic loci do not segregate independently within a population.
- Padlock probes: Single-stranded DNA oligonucleotides that have a constant region flanked by two targeting 'arms' that are complementary to the sequence of a genomic target. After highly specific hybridization to the target, the probes can be circularized and analysed for genotyping.
- Personal genome: A substantially complete genome sequence of a single individual, typically obtained to attempt to describe or predict medical or other traits of that individual.
- Mate-paired: Pertaining to a type of sequencing library preparation in which portions of a haplotype separated by 3–5 kb are brought into proximity by fragmentation and in vitro circularization. By sequencing across the junction of these circles, variants that are separated in genomic space but that appear on the same fragments can be jointly phased.
- High-molecular-weight (HMW) genomic DNA: Genomic DNA isolated in such a way as to preserve long intact DNA fragments, ideally exceeding 100 kb on average. The ideal length may differ depending on the application.
- Fosmids: DNA cloning vectors containing up to 40 kb of insert, typically packaged in bulk into phage and transfected into Escherichia coli, in which a library can be propagated.
- Multiple displacement amplification (MDA): A method for high-gain whole-genome amplification in which a low input mass of high-molecular-weight DNA is exponentially copied by random priming with short oligonucleotides, followed by primer extension with a strand-displacing polymerase at a constant temperature. Resulting amplicons are typically several kilobases in length.
- Complete Genomics sequencing platform: A form of high-throughput short-read sequencing technology and a suite of analysis tools offered as a commercial service.
- Illumina sequencing platform: The most commonly used form of high-throughput short-read sequencing that offers a low cost per base.
- Moleculo system: A commercial library preparation and in silico method for reconstructing the sequence of a 6–10-kb fragment of DNA using short-read sequencing instruments.
- Long-read sequencing methods: Sequencing technologies in which either raw or computationally assembled reads exceed 1 kb, such that each read has a greater probability of capturing two or more variants on a single haplotype. They are typically associated with a higher cost per base and a lower throughput than short-read technologies.
- Subassembly: An in silico method for reconstructing the sequence of a DNA fragment that exceeds the maximum read length of the sequencing instrument. Molecules of ~500 bp are uniquely tagged, amplified, concatemerized and randomly fragmented. Short reads capturing the tag and a random portion of the original fragment can be jointly assembled to recover the full-length sequence.
- Single-molecule real-time (SMRT) sequencing: A form of sequencing technology that directly interrogates individual molecules of DNA and thus does not require library amplification before sequencing.
- Nanopore sequencing: A method for DNA sequencing in which small changes in electrical current are detected as sequential bases of a DNA polymer pass through a 1 nm transmembrane protein or solid-state pore. As single molecules of DNA can be sequenced directly, no library amplification step is required.
- Chromatin interaction maps: Sets of measurements of the pairwise 3D spatial proximity of many non-adjacent regions of genomic DNA in a nucleus, as ascertained experimentally by crosslinking chromatin, ligating together fragments of DNA that are associated with the crosslinked proteins, and sequencing.
- Phred-scale quality scores: A scoring system, originally developed for assigning confidence to individual base calls from sequencing instruments, in which an estimated error probability (P) is converted to a quality score (Q) by the transformation Q = −10 log10(P).
- Runs of homozygosity: Regions of the genome above a given distance threshold at which both haplotypes are identical.
- Compound heterozygosity: The presence of two different recessive alleles, one on each haplotype, in a specific gene in a single individual. It is particularly relevant for autosomal recessive genetic diseases, which are frequently caused by compound heterozygosity in non-consanguineous pedigrees.
- Variant imputation: A statistically grounded method for 'filling in' missing alleles in sparsely genotyped individuals to increase the power of association studies on the basis of similarity to reference panels of previously ascertained haplotypes.
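
Three of the glossary definitions above are quantitative enough to sketch in code: the Phred-scale transformation Q = −10 log10(P), the <1% minor allele frequency cut-off for low-frequency variants, and the 'one on each haplotype' condition for compound heterozygosity. The Python below is a minimal illustration rather than anything taken from the article. All function names are hypothetical, and the `0|1` phased-genotype notation (reference allele 0, alternate allele 1, haplotypes separated by `|`) is an assumed VCF-style encoding.

```python
import math


def phred_from_error(p: float) -> float:
    """Convert an error probability P to a quality score Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Invert the Phred transformation: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def is_low_frequency(alt_count: int, n_haplotypes: int,
                     threshold: float = 0.01) -> bool:
    """True if the minor allele is seen on fewer than 1 in 100 haplotypes."""
    if n_haplotypes <= 0 or not 0 <= alt_count <= n_haplotypes:
        raise ValueError("counts must satisfy 0 <= alt_count <= n_haplotypes")
    # The minor allele is whichever of the two alleles is rarer.
    minor = min(alt_count, n_haplotypes - alt_count)
    return minor / n_haplotypes < threshold


def in_trans(gt_a: str, gt_b: str) -> bool:
    """True if two heterozygous variants carry their alternate alleles on
    opposite haplotypes (assumed notation: '0|1' or '1|0').

    Only meaningful when both genotypes come from the same haplotype block,
    because phase is not defined between separate blocks.
    """
    a, b = gt_a.split("|"), gt_b.split("|")
    if len(a) != 2 or len(b) != 2:
        raise ValueError("expected phased diploid genotypes such as '0|1'")
    both_het = sorted(a) == ["0", "1"] and sorted(b) == ["0", "1"]
    return both_het and a != b
```

Under these assumptions, `in_trans("0|1", "1|0")` reports the compound-heterozygous configuration, whereas `in_trans("0|1", "0|1")` reports two variants in cis that leave one haplotype unaffected. Unphased genotypes (`0/1`) cannot distinguish the two cases, which is why phase information matters here.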
Cite this article
Snyder, M., Adey, A., Kitzman, J. et al. Haplotype-resolved genome sequencing: experimental methods and applications. Nat Rev Genet 16, 344–358 (2015). https://doi.org/10.1038/nrg3903
DOI: https://doi.org/10.1038/nrg3903