The importance of phase information for human genomics

Abstract

Contemporary sequencing studies often ignore the diploid nature of the human genome because they do not routinely separate or 'phase' maternally and paternally derived sequence information. However, many findings — both from recent studies and in the more established medical genetics literature — indicate that relationships between human DNA sequence and phenotype, including disease, can be more fully understood with phase information. Thus, the existing technological impediments to obtaining phase information must be overcome if human genomics is to reach its full potential.
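As a minimal illustration of why phase matters (a hypothetical sketch, not an analysis from the article), consider a gene carrying two heterozygous variants that each fully abolish function, with the rest of the gene assumed intact. An unphased genotype call reports the same two heterozygous sites either way, but the consequence depends on how the variants are distributed between the maternal and paternal homologues. In cis, one homologue still carries an intact copy of the gene. In trans (compound heterozygosity), neither does. The Python sketch below enumerates the possible phasings and counts intact copies for each. The function names and variant labels are illustrative only.

```python
from itertools import product


def possible_phasings(het_variants):
    """Enumerate distinct ways to split heterozygous variants between two homologues.

    Each heterozygous variant lies on exactly one of the two haplotypes. Swapping
    the haplotype labels gives the same diplotype, so the first variant is pinned
    to haplotype A, leaving 2**(n - 1) distinct phasings for n variants.
    """
    het = sorted(het_variants)
    if not het:
        return [(frozenset(), frozenset())]
    phasings = []
    for bits in product((0, 1), repeat=len(het) - 1):
        hap_a = {het[0]} | {v for v, b in zip(het[1:], bits) if b == 0}
        hap_b = {v for v, b in zip(het[1:], bits) if b == 1}
        phasings.append((frozenset(hap_a), frozenset(hap_b)))
    return phasings


def functional_copies(hap_a, hap_b, lof_variants):
    """Count haplotypes that carry none of the loss-of-function variants."""
    return sum(1 for hap in (hap_a, hap_b) if not (hap & lof_variants))


if __name__ == "__main__":
    lof = {"var1", "var2"}  # hypothetical variant labels
    for hap_a, hap_b in possible_phasings(lof):
        print(sorted(hap_a), sorted(hap_b), functional_copies(hap_a, hap_b, lof))
    # prints:
    # ['var1', 'var2'] [] 1   <- cis: one intact copy remains
    # ['var1'] ['var2'] 0     <- trans: compound heterozygote, no intact copy
```

The two phasings are indistinguishable from unphased genotype calls alone, yet they predict different gene dosage. The number of candidate phasings also grows as 2^(n-1) with the number n of heterozygous sites, which is why phase must be measured or inferred rather than enumerated once many variants are involved.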

Figure 1: The distribution of variants between homologous chromosomes can affect gene function.
Figure 2: Strategies for empirical haplotype reconstruction.
Figure 3: Phase reconstruction using mate-pair information.
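The idea behind reading phase directly from sequence data, as in Figure 3, is that a single read, or the two ends of a mate pair, derives from one DNA fragment and therefore from one homologue. Any such read that spans two heterozygous sites reports which alleles travel together. The following Python sketch shows only that core logic for two sites. It assumes biallelic calls coded 0 (reference) and 1 (alternate) and resolves conflicting reads with a simple majority vote. The vote, the coding and the function name are assumptions for illustration, not the article's method. Practical haplotype-assembly methods must link many sites at once and model sequencing errors more carefully.

```python
from collections import Counter


def phase_two_sites(observations):
    """Infer the relative phase of two heterozygous sites from linking reads.

    observations: iterable of (allele_at_site1, allele_at_site2) pairs, one per
    read or mate pair spanning both sites; 0 = reference, 1 = alternate.
    Returns (call, support_for_cis, support_for_trans), where call is
    "cis", "trans" or "ambiguous".
    """
    votes = Counter()
    for a1, a2 in observations:
        if a1 not in (0, 1) or a2 not in (0, 1):
            continue  # skip calls that are not one of the two expected alleles
        # Matching alleles, (0, 0) or (1, 1), support haplotypes 00/11: both
        # alternate alleles sit on the same homologue (cis). Mismatched alleles
        # support haplotypes 01/10 (trans).
        votes["cis" if a1 == a2 else "trans"] += 1
    cis, trans = votes["cis"], votes["trans"]
    if cis == trans:
        return "ambiguous", cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans


if __name__ == "__main__":
    # Hypothetical mate pairs; one (0, 1) pair is treated as a likely error.
    reads = [(1, 1), (0, 0), (1, 1), (0, 1), (0, 0)]
    print(phase_two_sites(reads))  # prints ('cis', 4, 1)
```

A majority vote tolerates occasional miscalled bases. With no linking reads, or with evenly split support, the sketch reports the phase as ambiguous rather than guessing.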

Acknowledgements

This work was supported, in part, by the following research grants: U19 AG023122-01, R01 MH078151-01A1, N01 MH22005, U01 DA024417-01, P50 MH081755-01 and UL1 RR025774, as well as the Price Foundation and Scripps Genomic Medicine. This work is the authors' sole responsibility and does not necessarily represent the views of the funding agencies.

Author information

Corresponding author

Correspondence to Nicholas J. Schork.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Nicholas J. Schork's homepage

Polymorphism Research Laboratory

About this article

Cite this article

Tewhey, R., Bansal, V., Torkamani, A. et al. The importance of phase information for human genomics. Nat Rev Genet 12, 215–223 (2011). https://doi.org/10.1038/nrg2950
