  • Letter

Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube

Abstract

Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using “on-bead” barcoded tagmentation. The key to the method that we call “contiguity preserving transposition” sequencing on beads (CPTv2-seq) is transposon-mediated transfer of homogenous populations of barcodes from beads to individual long DNA molecules that get fragmented at the same time (tagmentation). These are then processed to sequencing libraries wherein all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with 150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.

Figure 1: Summary of the bead-based indexing workflow and the intra- vs.
Figure 2: Detection of deletion and interchromosomal translocation using linked read information.

References

  1. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).

  2. Snyder, M.W., Adey, A., Kitzman, J.O. & Shendure, J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat. Rev. Genet. 16, 344–358 (2015).

  3. Bansal, V., Tewhey, R., Topol, E.J. & Schork, N.J. The next phase in human genetics. Nat. Biotechnol. 29, 38–39 (2011).

  4. Browning, S.R. & Browning, B.L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).

  5. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).

  6. Nalls, M.A. et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease. Nat. Genet. 46, 989–993 (2014).

  7. Peters, B.A. et al. Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing. Genome Res. 25, 426–434 (2015).

  8. Zheng, G.X.Y. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).

  9. Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).

  10. Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014).

  11. Medvedev, P., Stanciu, M. & Brudno, M. Computational methods for discovering structural variation with next-generation sequencing. Nat. Methods 6 (Suppl. 1), S13–S20 (2009).

  12. Fan, H.C., Wang, J., Potanina, A. & Quake, S.R. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29, 51–57 (2011).

  13. Kaper, F. et al. Whole-genome haplotyping by dilution, amplification, and sequencing. Proc. Natl. Acad. Sci. USA 110, 5552–5557 (2013).

  14. Kitzman, J.O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29, 59–63 (2011).

  15. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  16. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).

  17. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  18. Putnam, N.H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).

  19. Kuleshov, V. et al. Whole-genome haplotyping using long reads and statistical methods. Nat. Biotechnol. 32, 261–266 (2014).

  20. Cao, H. et al. De novo assembly of a haplotype-resolved human genome. Nat. Biotechnol. 33, 617–622 (2015).

  21. Peters, B.A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487, 190–195 (2012).

  22. Burton, J.N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).

  23. Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).

  24. Goodwin, S. et al. Oxford Nanopore Sequencing and de novo assembly of a eukaryotic genome. Preprint at bioRxiv https://doi.org/10.1101/013490 (2015).

  25. Dear, P.H. & Cook, P.R. Happy mapping: a proposal for linkage mapping the human genome. Nucleic Acids Res. 17, 6795–6807 (1989).

  26. Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).

  27. Peters, B.A., Liu, J. & Drmanac, R. Co-barcoded sequence reads from long DNA fragments: a cost-effective solution for “perfect genome” sequencing. Front. Genet. 5, 466 (2015).

  28. Lan, F., Haliburton, J.R., Yuan, A. & Abate, A.R. Droplet barcoding for massively parallel single-molecule deep sequencing. Nat. Commun. 7, 11784 (2016).

  29. Loman, N.J., Quick, J. & Simpson, J.T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).

  30. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).

  31. Cusanovich, D.A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).

  32. Xie, M., Wang, J. & Jiang, T. A fast and accurate algorithm for single individual haplotyping. BMC Syst. Biol. 6 (Suppl. 2), S8 (2012).

  33. Zong, C., Lu, S., Chapman, A.R. & Xie, X.S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626 (2012).

  34. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  35. Eberle, M.A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).

  36. Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).

  37. Adey, A. et al. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500, 207–211 (2013).

  38. Steemers, F.J. et al. Whole-genome genotyping with the single-base extension assay. Nat. Methods 3, 31–33 (2006).

  39. Furka, A., Sebestyén, F., Asgedom, M. & Dibó, G. General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 37, 487–493 (1991).

  40. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  41. Sos, B.C. et al. Characterization of chromatin accessibility with a transposome hypersensitive sites sequencing (THS-seq) assay. Genome Biol. 17, 20 (2016).

  42. Ason, B. & Reznikoff, W.S. DNA sequence bias during Tn5 transposition. J. Mol. Biol. 335, 1213–1225 (2004).

  43. Lebl, M. et al. Automatic oligonucleotide synthesizer utilizing the concept of parallel processing. Collect. Symp. Ser. 12, 264–267 (2011).

  44. Kremsky, J.N. et al. Immobilization of DNA via oligonucleotides containing an aldehyde or carboxylic acid group at the 5′ terminus. Nucleic Acids Res. 15, 2891–2909 (1987).

  45. Steinberg, G., Stromsborg, K., Thomas, L., Barker, D. & Zhao, C. Strategies for covalent attachment of DNA to beads. Biopolymers 73, 597–605 (2004).

  46. Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).

  47. Tarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).

Acknowledgements

We would like to thank the research, development, and software engineering departments at Illumina for sequencing technology development. Specifically, we would like to thank C.L. Pan at Illumina for the sequencing technology development and G. Bean, J. Leng, and S. Swamy at Illumina for data analysis software development. We would like to thank R. Daza for providing HeLa and NA12878 Gentra DNA preparations. The genome sequence described in this paper was derived from a HeLa cell line. Henrietta Lacks, and the HeLa cell line that was established from her tumor cells in 1951, have made significant contributions to scientific progress and advances in human health. We are grateful to Henrietta Lacks, now deceased, and to her surviving family members for their contributions to biomedical research.

Author information

Contributions

F.J.S., K.L.G., and N.G. conceived the study; F.J.S. and M.C.R. oversaw the technology development. F.Z. and D.P. led the assay development; L.C., J.T., S.J.N., and D.P. performed the experiments and analyzed the data. F.Z. performed the phasing analysis and wrote custom analysis software. A.G., E.J., R.J., and N.M. helped with the assay development. Y.Z., M.W., and E.W. prepared the bead pool. A.H. developed the data analysis pipeline. M.R. led the project coordination. F.J.S., F.Z., L.C., K.L.G., D.P., and J.S. co-wrote the paper. All authors contributed to the revision and review of the manuscript.

Corresponding author

Correspondence to Frank J Steemers.

Ethics declarations

Competing interests

The authors declare competing financial interests in the form of stock ownership, patents, or employment through Illumina, Inc.

Integrated supplementary information

Supplementary Figure 1 Sequencing overview.

(a) Example intensity versus cycle (IVC) plot for a library from the hybrid method. (b) Example IVC plot for the single-tube method. The first 15 cycles of read 1 and read 2 show the typical Tn5 insertion bias (ref. 42).

Supplementary Figure 2 Typical example of a Bioanalyzer™ library size distribution.

Supplementary Figure 3 Island structure visualization.

An 868 kb genomic region is shown in the top panel for reads sharing the same barcode. The reads group into two clusters, called islands, which originate from two long DNA molecules that interacted with beads carrying barcode 1. The zoomed-in snapshot shows the read distribution within island 2, with a total genomic coverage of around 25%.

Supplementary Figure 4 Proximal versus distal read distribution from the hybrid method using Gentra™ DNA.

Reads with the same barcode are aligned to the genome. The genomic distance between adjacent reads sharing the same barcode shows a bimodal distribution. The short distances between adjacent reads within an island contribute to the proximal peak, centered around 2 kb. On average, about 500–1,000 long DNA molecules are transposed by the multiple beads carrying the same barcode. These long DNA molecules are randomly distributed across the genome, generating a distal peak for the distance between adjacent islands at around 3 Gbp / (500 to 1,000) ≈ 3–6 Mbp.
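The distal-peak estimate is simple arithmetic: if N molecules carrying one barcode fall uniformly at random on a genome of size G, adjacent islands sit roughly G/N apart. The following is a minimal Python sketch of that calculation and of the adjacent-read distances that make up the histogram. The function names and the toy read positions are illustrative assumptions, not part of the published analysis.

```python
def expected_island_spacing(genome_bp, molecules_per_barcode):
    """Mean gap between adjacent islands if molecules fall uniformly on the genome."""
    return genome_bp / molecules_per_barcode


def adjacent_distances(positions):
    """Distances between neighbouring read positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


GENOME_BP = 3e9  # approximate human genome size used in the caption

for n in (500, 1000):
    print(n, "molecules:", expected_island_spacing(GENOME_BP, n) / 1e6, "Mbp")

# Toy barcode (made-up positions): two tight read clusters and one large gap.
toy_positions = [1_000, 3_200, 5_100, 4_000_000, 4_001_900]
print(adjacent_distances(toy_positions))
```

In the toy example the kilobase-scale gaps would fall under the proximal peak and the single multi-megabase gap under the distal peak.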

Supplementary Figure 5 The workflow for phasing and structural variant analysis.

The input unphased VCF files of the genomes used in this study were obtained from a previous publication. Raw sequencing reads were demultiplexed by their unique barcode and partitioned into individual fastq files. Reads for each partition were then aligned to the reference genome and mapped to discrete islands. The islands that covered at least two heterozygous variants were retained for phasing. Islands from all partitions were subsequently combined. The genome was then split into partitions such that no informative islands (islands covering more than one heterozygous SNP) overlapped with the partition boundaries. For each partition, we used H-BOP to generate the phasing blocks. The phased SNPs, combined with the island structure generated earlier, were used as input for the structural variant analysis.
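The island-calling and filtering steps of this workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the 20 kb gap threshold (`max_gap`), the function names, and the toy reads are hypothetical, and a variant counts as covered when it lies within an island's span, whereas a production pipeline would check which variants are actually overlapped by reads.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def call_islands(reads, max_gap=20_000):
    """Cluster reads that share a barcode into islands.

    reads: iterable of (barcode, chrom, pos).
    Returns a list of (barcode, chrom, start, end, n_reads).
    max_gap is an assumed threshold, not a value from the paper.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    islands = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:  # large gap: close the current island
                islands.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        islands.append((barcode, chrom, start, prev, count))
    return islands


def informative_islands(islands, het_sites, min_hets=2):
    """Keep islands whose span contains at least min_hets heterozygous sites.

    het_sites: dict mapping chrom -> sorted list of variant positions.
    """
    kept = []
    for island in islands:
        _, chrom, start, end, _ = island
        sites = het_sites.get(chrom, [])
        n_hets = bisect_right(sites, end) - bisect_left(sites, start)
        if n_hets >= min_hets:
            kept.append(island)
    return kept


# Toy input (made-up coordinates).
toy_reads = [
    ("bc1", "chr1", 1_000), ("bc1", "chr1", 3_000), ("bc1", "chr1", 6_000),
    ("bc1", "chr1", 5_000_000), ("bc2", "chr1", 2_000),
]
toy_hets = {"chr1": [1_500, 4_500, 5_000_000]}

islands = call_islands(toy_reads)
print(islands)
print(informative_islands(islands, toy_hets))
```

Only the first `bc1` island spans two heterozygous sites in the toy data, so it is the only one that would be passed on to phasing.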

Supplementary Figure 6 Phasing yield and accuracy.

Phasing yield (a) and accuracy (b) for Gentra™ prep NA12878 using the hybrid method. Phasing yield (c) and accuracy (d) for freshly prepared NA12878 using the single-tube method. The inset in each panel is a zoomed-in view of the phasing yield or accuracy from 0 to 50 kb.

Supplementary Figure 7 Comparison of island lengths between Gentra™ prep and freshly prepared DNA.

The same number of reads was randomly subsampled from each original data set to generate the histograms.

Supplementary Figure 8 The histogram of the number of reads per barcode for the 150,000 combinatorial bead pool using the single-tube method.

Supplementary Figure 9 Detection of heterozygous deletion using linked reads.

Each color represents a distinct barcode, and each short vertical bar represents a mapped read.

Supplementary Figure 10 Confirmation of heterozygous deletion detection using trio analysis.

The sequencing depth for NA12878, NA12891, and NA12892 was normalized around the deletion locations. One of the parents showed lower sequencing depth in the same region, indicative of a heterozygous deletion.
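One simple way to express this kind of normalization is the ratio of median depth inside the candidate region to median depth in the flanking sequence, where a ratio near 0.5 is what a heterozygous deletion would be expected to produce. The Python sketch below assumes that definition. The caption does not specify the normalization that was used, and the depth values and sample labels are made-up toy data.

```python
from statistics import median


def depth_ratio(region_depths, flank_depths):
    """Median depth inside the candidate region divided by median flanking depth."""
    return median(region_depths) / median(flank_depths)


# Toy per-bin depths (made-up numbers, not data from the study).
toy_samples = {
    "child": ([15, 14, 16], [30, 31, 29]),
    "parent_a": ([16, 15, 14], [31, 30, 29]),
    "parent_b": ([30, 29, 31], [30, 30, 31]),
}

for name, (region, flank) in toy_samples.items():
    print(name, round(depth_ratio(region, flank), 2))
```

In this toy trio the child and one parent show a ratio near 0.5 while the other parent stays near 1.0, which is the pattern of an inherited heterozygous deletion.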

Supplementary Figure 11 Interchromosomal translocation detection for the HeLa genome.

Of the 20 interchromosomal translocations previously reported (ref. 37), 17 show a strong signal in the barcode-sharing heat map.
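A barcode-sharing heat map can be thought of as a matrix of counts: for every pair of genomic bins on two chromosomes, the number of barcodes seen in both. The following Python sketch builds such counts under assumed parameters. The 100 kb bin size, the function names, and the toy reads on placeholder chromosomes `chrA` and `chrB` are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def barcodes_per_bin(reads, bin_size=100_000):
    """Map (chrom, bin_index) -> set of barcodes with at least one read in that bin."""
    bins = defaultdict(set)
    for barcode, chrom, pos in reads:
        bins[(chrom, pos // bin_size)].add(barcode)
    return bins


def shared_barcode_counts(bins, chrom_a, chrom_b):
    """Count barcodes shared by every pair of bins drawn from two chromosomes."""
    bins_a = {k: v for k, v in bins.items() if k[0] == chrom_a}
    bins_b = {k: v for k, v in bins.items() if k[0] == chrom_b}
    return {
        (ka, kb): len(va & vb)
        for (ka, va), (kb, vb) in product(bins_a.items(), bins_b.items())
    }


# Toy reads (made-up): bc1 and bc2 each touch both chrA bin 1 and chrB bin 4.
toy_reads = [
    ("bc1", "chrA", 150_000), ("bc1", "chrB", 420_000),
    ("bc2", "chrA", 160_000), ("bc2", "chrB", 410_000),
    ("bc3", "chrA", 900_000), ("bc4", "chrB", 900_000),
]

counts = shared_barcode_counts(barcodes_per_bin(toy_reads), "chrA", "chrB")
for pair, n in counts.items():
    print(pair, n)
```

Bin pairs with counts well above the background of chance barcode collisions are the candidates for a junction; how that background is modeled is left out of this sketch.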

Supplementary Figure 12 Haplotyping accuracy at the island level.

The analysis was applied to 5,000 randomly selected barcodes. The number of islands versus the number of hetSNPs per island follows the expected exponential decay, as larger islands are underrepresented (blue bars). Edit errors per number of hetSNPs in an island are depicted as red diamonds. Although not applied in this paper, conflicting islands can be filtered out before input into H-BOP.
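The caption does not define "edit errors" precisely. One plausible reading, assumed here, is the minimum number of allele calls in an island that must be flipped for the island to agree with either of the two reference haplotypes. A minimal Python sketch of that metric follows, with a hypothetical function name and made-up toy alleles.

```python
def island_edit_errors(island_alleles, truth_hap):
    """Minimum allele flips needed to match the truth haplotype or its complement.

    island_alleles, truth_hap: dicts mapping hetSNP position -> allele (0 or 1).
    Only positions present in both dicts are compared.
    """
    shared = [p for p in island_alleles if p in truth_hap]
    mismatches = sum(island_alleles[p] != truth_hap[p] for p in shared)
    # An island may derive from either parental haplotype, so take the smaller count.
    return min(mismatches, len(shared) - mismatches)


# Toy example (made-up alleles).
truth = {100: 0, 200: 1, 300: 0, 400: 1}
clean_island = {100: 0, 200: 1, 300: 0}
noisy_island = {100: 1, 200: 0, 300: 1, 400: 1}

print(island_edit_errors(clean_island, truth))
print(island_edit_errors(noisy_island, truth))
```

Under this definition an island with a nonzero count is a candidate "conflicting island" that could be filtered before phasing.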

Supplementary Figure 13 Comparison of genomic coverage uniformity for Zheng et al. (ref. 8), the single-tube method, the hybrid method, and TruSeq PCR-free.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 (PDF 1672 kb)

Supplementary Table 1

The number of reads with different barcodes in the intra- vs. inter-bead transposition experiment. (XLSX 7 kb)

Supplementary Table 2

The DNA sequences of the indexed transposons, the oligos on beads, the sequencing primers and the primers for deletion detection. (XLSX 41 kb)

Supplementary Table 3

Comparison of coverage and SNP/INDEL recall/precision among different approaches. (XLSX 10 kb)

About this article

Cite this article

Zhang, F., Christiansen, L., Thomas, J. et al. Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube. Nat Biotechnol 35, 852–857 (2017). https://doi.org/10.1038/nbt.3897

  • DOI: https://doi.org/10.1038/nbt.3897
