Whole-genome molecular haplotyping of single cells

Journal name: Nature Biotechnology
Volume: 29
Pages: 51–57
DOI: 10.1038/nbt.1739

Abstract

Conventional experimental methods of studying the human genome are limited by the inability to independently study the combination of alleles, or haplotype, on each of the homologous copies of the chromosomes. We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell. Single-nucleotide polymorphism (SNP) array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal haplotypes of four individuals, including a HapMap trio with European ancestry (CEU) and an unrelated European individual. The phases of alleles were determined at ~99.8% accuracy for up to ~96% of all assayed SNPs. We demonstrate several practical applications, including direct observation of recombination events in a family trio, deterministic phasing of deletions in individuals and direct measurement of the human leukocyte antigen haplotypes of an individual. Our approach has potential applications in personal genomics, single-cell genomics and statistical genetics.
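
The core idea of the abstract, that genotyping each homologous copy separately yields the phase directly, can be illustrated with a minimal sketch. This is not the authors' analysis pipeline. The input format (one allele call per SNP per homolog pool, with None for a missing call) and the function name phase_from_homologs are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input format: SNP id -> single allele called in one homolog pool.
# None means the array produced no call for that SNP in that pool.
HomologCalls = Dict[str, Optional[str]]


def phase_from_homologs(
    pool_a: HomologCalls, pool_b: HomologCalls
) -> Dict[str, Tuple[str, str]]:
    """Return {snp: (allele on homolog A, allele on homolog B)}.

    Only SNPs called in both pools and carrying different alleles
    (heterozygous sites) are reported; everything else stays unphased.
    Assumes each pool holds exactly one homolog of the chromosome.
    """
    phased: Dict[str, Tuple[str, str]] = {}
    for snp, allele_a in pool_a.items():
        allele_b = pool_b.get(snp)
        if allele_a is None or allele_b is None:
            continue  # not observed on both homologs
        if allele_a != allele_b:
            phased[snp] = (allele_a, allele_b)
    return phased


if __name__ == "__main__":
    # Made-up calls for illustration, not data from the paper.
    homolog_a = {"rs1": "A", "rs2": "C", "rs3": None, "rs4": "G"}
    homolog_b = {"rs1": "G", "rs2": "C", "rs3": "T"}
    print(phase_from_homologs(homolog_a, homolog_b))
```

Because each pool is assumed to contain a single homolog, no statistical inference is needed: the alleles seen together in one pool are, by construction, on the same haplotype.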

Figures

  1. Microfluidic device designed for the amplification of metaphase chromosomes from a single cell.

    A single metaphase cell is recognized microscopically and captured in region 1. Protease (pepsin at low pH) is introduced to generate a chromosome suspension in region 2. The chromosome suspension is partitioned into 48 units (region 3). The content of each partition is individually amplified (region 4). Specifically, chromosomes at low pH are first neutralized and treated with trypsin to digest chromosomal proteins. Chromosomes are then denatured with alkali and subsequently neutralized so that multiple strand displacement amplification can take place. As reagents are introduced sequentially into each air-filled chamber, which is made possible by the gas permeability of the device's material, chromosomes are pushed from one chamber into the next and finally arrive in the amplification chamber. Amplified materials are retrieved at the collection ports (region 5). In the overview image of the device, control channels are filled with green dye. Flow channels in the cell-sorting region and amplification region are filled with red and blue dyes, respectively.

  2. Whole-genome haplotyping.

    (a) Determining the chromosomal origin of amplification products in a microfluidic device using 46-loci PCR. This table represents results from an experiment using a single metaphase cell of P0's cultured whole blood. A row represents the content inside a chamber on the microfluidic device, and a column represents a locus, with specified chromosome and coordinate (NCBI Build 36.1). Each locus, except those on chromosomes 17 and 20, was found in two chambers. The two alleles of a SNP are highlighted in red and green. Heterozygous loci are labeled in blue. Chamber numbers labeled yellow were pooled together and genotyped on one HumanOmni1-Quad array, and chamber numbers labeled orange were pooled together and genotyped on another array. Genomic DNA extracted from cultured whole blood was also tested with the same 46-loci PCR. (b) Statistics of whole-genome haplotyping. The fraction of SNPs present on the array phased for each chromosome of each individual (GM12891, GM12892, GM12878 and a European individual 'P0') is shown as a colored bar. (c) Fraction of SNPs phased as a function of the number of pairs of homologous chromosomes assayed. This is based on the results from four single-cell experiments of P0. Each point represents the coverage of an autosome. The error bars represent s.e.m.

  3. Comparison of statistically determined phases with experimentally determined phases.

    (a) Comparison of experimentally determined phases of ~160,000 heterozygous SNPs of GM12878 (child of the trio) and those determined by phase III of the HapMap project. Unambiguous SNPs refer to those that are homozygous for at least one parent and are deterministically phased using family data in HapMap. This comparison shows the accuracy of direct deterministic phasing (DDP), the experimental approach used here. Ambiguous SNPs refer to those that are heterozygous for all members of the trio, for which statistical phasing is used in HapMap. This comparison provides an evaluation of statistical phasing. (b) Comparison of experimentally determined phases of P0 and those determined by PHASE. Seventy-six regions on the autosomal chromosomes were randomly selected and statistically phased three times. Each region carried 100 heterozygous SNPs and spanned an average of ~2 Mb. Switch error rate was calculated as the proportion of heterozygous SNPs with different phases relative to the SNP immediately upstream. Single-site error rate was calculated as the proportion of heterozygous SNPs with incorrect phase. A SNP was considered correctly phased if it had the dominant phase. For each region, the average values from the three runs were reported. Presented here are the average switch error and single-site error per region. The deterministic phases measured by DDP are taken as the ground truth.

  4. Direct observation of recombination events and deterministic phasing of heterozygous deletions in the family trio.

    Each allele with DDP data available for the child and the parent is represented by a colored line (blue, alleles transmitted to the child from the father; red, alleles transmitted to the child from the mother; black, untransmitted alleles). Centromeres and regions of heterochromatin are not assayed by genotyping arrays and are thus in white. Heterozygous deletions in the parents are represented as triangles along each homologous chromosome. A solid triangle represents one copy and a hollow triangle represents a null copy. The phases of deletions are determined for each parent independently. The triangles are color coded according to the state of transmittance as determined by the location of the deletion relative to spots of recombination. The phases of the deletions in the child are determined independent of the parents and are shown on top of the parental chromosomes. The integers on the left are the IDs of each region given by HapMap phase III. The numbers on the right are the copy number of a region in the child as determined by HapMap. Chromosomes are plotted with the same length.

  5. HLA haplotypes of P0 determined using DDP.

    At each of the six classical HLA loci, the experimentally phased SNP haplotypes of P0 and 176 phased SNP haplotypes of CEU trios available from HapMap phase III were placed on a neighbor-joining tree. The two haplotypes of P0 are labeled in red and blue. For haplotypes in the CEU panel with HLA typing data, the four-digit HLA allele is presented next to the sample label. Most of each tree is compressed. Each compressed subtree is labeled with the HLA allele associated with members inside the subtree, if HLA allele information is available. The allelic identities of HLA-B and HLA-C on haplotype 1 were not determined with DDP because CEU individuals with SNP haplotypes similar to P0's did not have HLA typing data at these loci; they could, however, be inferred from the results of direct HLA typing of genomic DNA (first row of table). HLA-DQA1 was not directly typed.
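
The two error measures defined in the caption of the third figure (switch error and single-site error) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code. The function name phase_error_rates is hypothetical, and the denominators are assumptions: the switch error is divided by the number of adjacent SNP pairs, and the single-site error by the number of SNPs in the region.

```python
from typing import Sequence, Tuple


def phase_error_rates(
    truth: Sequence[str], inferred: Sequence[str]
) -> Tuple[float, float]:
    """Compare one inferred haplotype with the ground-truth haplotype.

    Both inputs list the allele carried by "haplotype 1" at each
    heterozygous SNP of a region, in genomic order.
    Returns (switch_error_rate, single_site_error_rate).
    """
    if len(truth) != len(inferred) or len(truth) < 2:
        raise ValueError("need two equal-length haplotypes with >= 2 SNPs")

    # True where the inferred haplotype carries the same allele as the truth.
    agree = [t == i for t, i in zip(truth, inferred)]

    # Switch error: the phase relative to the SNP immediately upstream flips.
    # Assumption: denominator is the number of adjacent pairs (n - 1).
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    switch_rate = switches / (len(agree) - 1)

    # Single-site error: SNPs that do not have the dominant (majority) phase.
    # Haplotype labels are arbitrary, so the majority orientation is "correct".
    n_agree = sum(agree)
    minority = min(n_agree, len(agree) - n_agree)
    single_site_rate = minority / len(agree)

    return switch_rate, single_site_rate


if __name__ == "__main__":
    # Made-up six-SNP region with one isolated phasing mistake.
    truth = ["A", "C", "G", "T", "A", "C"]
    inferred = ["A", "C", "T", "T", "A", "C"]
    print(phase_error_rates(truth, inferred))
```

Note that a single isolated mistake produces two switches (into and out of the wrong phase) but only one single-site error, which is why the two measures are reported separately.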

Accession codes

Referenced accessions

Sequence Read Archive

Author information

Affiliations

  1. Department of Bioengineering, Stanford University, Stanford, California, USA.

    • H Christina Fan,
    • Jianbin Wang &
    • Stephen R Quake
  2. Howard Hughes Medical Institute, Stanford University, Stanford, California, USA.

    • Anastasia Potanina &
    • Stephen R Quake
  3. Department of Applied Physics, Stanford University, Stanford, California, USA.

    • Stephen R Quake

Contributions

H.C.F. and S.R.Q. conceived the experiments. H.C.F. designed the microfluidic device. A.P. developed protocols for device fabrication. H.C.F. and J.W. performed the experiments. H.C.F., J.W. and S.R.Q. analyzed the data and wrote the manuscript.

Competing financial interests

S.R.Q. is a founder, consultant and shareholder of Fluidigm Corporation and Helicos Biosciences Corporation, and a consultant and shareholder of Artemis Health. H.C.F. was previously employed at Fluidigm Corporation. All other authors declare no conflict of interest.

Supplementary information

PDF files

  1. Supplementary Text and Figures (980K)

    Supplementary Tables 1–6 and Supplementary Figs. 1–4

Text files

  1. Supplementary Data Set 1 (21M)

    Haplotypes 1 GM12891 haplotypes.txt

  2. Supplementary Data Set 2 (20M)

    Haplotypes 2 GM12892 haplotypes.txt

  3. Supplementary Data Set 3 (21M)

    Haplotypes 3 GM12878 haplotypes.txt

  4. Supplementary Data Set 4 (31M)

    P0 Omni1S

  5. Supplementary Data Set 5 (16M)

    P0 Omni1Quad
