Abstract
Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using “on-bead” barcoded tagmentation. The key to the method that we call “contiguity preserving transposition” sequencing on beads (CPTv2-seq) is transposon-mediated transfer of homogeneous populations of barcodes from beads to individual long DNA molecules that get fragmented at the same time (tagmentation). These are then processed to sequencing libraries wherein all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with ∼150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.
References
Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
Snyder, M.W., Adey, A., Kitzman, J.O. & Shendure, J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat. Rev. Genet. 16, 344–358 (2015).
Bansal, V., Tewhey, R., Topol, E.J. & Schork, N.J. The next phase in human genetics. Nat. Biotechnol. 29, 38–39 (2011).
Browning, S.R. & Browning, B.L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).
Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).
Nalls, M.A. et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease. Nat. Genet. 46, 989–993 (2014).
Peters, B.A. et al. Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing. Genome Res. 25, 426–434 (2015).
Zheng, G.X.Y. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).
Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014).
Medvedev, P., Stanciu, M. & Brudno, M. Computational methods for discovering structural variation with next-generation sequencing. Nat. Methods 6 (Suppl. 1), S13–S20 (2009).
Fan, H.C., Wang, J., Potanina, A. & Quake, S.R. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29, 51–57 (2011).
Kaper, F. et al. Whole-genome haplotyping by dilution, amplification, and sequencing. Proc. Natl. Acad. Sci. USA 110, 5552–5557 (2013).
Kitzman, J.O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29, 59–63 (2011).
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Putnam, N.H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
Kuleshov, V. et al. Whole-genome haplotyping using long reads and statistical methods. Nat. Biotechnol. 32, 261–266 (2014).
Cao, H. et al. De novo assembly of a haplotype-resolved human genome. Nat. Biotechnol. 33, 617–622 (2015).
Peters, B.A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487, 190–195 (2012).
Burton, J.N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).
Goodwin, S. et al. Oxford Nanopore Sequencing and de novo assembly of a eukaryotic genome. Preprint at bioRxiv https://doi.org/10.1101/013490 (2015).
Dear, P.H. & Cook, P.R. Happy mapping: a proposal for linkage mapping the human genome. Nucleic Acids Res. 17, 6795–6807 (1989).
Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).
Peters, B.A., Liu, J. & Drmanac, R. Co-barcoded sequence reads from long DNA fragments: a cost-effective solution for “perfect genome” sequencing. Front. Genet. 5, 466 (2015).
Lan, F., Haliburton, J.R., Yuan, A. & Abate, A.R. Droplet barcoding for massively parallel single-molecule deep sequencing. Nat. Commun. 7, 11784 (2016).
Loman, N.J., Quick, J. & Simpson, J.T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).
Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
Cusanovich, D.A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
Xie, M., Wang, J. & Jiang, T. A fast and accurate algorithm for single individual haplotyping. BMC Syst. Biol. 6 (Suppl. 2), S8 (2012).
Zong, C., Lu, S., Chapman, A.R. & Xie, X.S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626 (2012).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Eberle, M.A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).
Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
Adey, A. et al. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500, 207–211 (2013).
Steemers, F.J. et al. Whole-genome genotyping with the single-base extension assay. Nat. Methods 3, 31–33 (2006).
Furka, A., Sebestyén, F., Asgedom, M. & Dibó, G. General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 37, 487–493 (1991).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Sos, B.C. et al. Characterization of chromatin accessibility with a transposome hypersensitive sites sequencing (THS-seq) assay. Genome Biol. 17, 20 (2016).
Ason, B. & Reznikoff, W.S. DNA sequence bias during Tn5 transposition. J. Mol. Biol. 335, 1213–1225 (2004).
Lebl, M. et al. Automatic oligonucleotide synthesizer utilizing the concept of parallel processing. Collect. Symp. Ser. 12, 264–267 (2011).
Kremsky, J.N. et al. Immobilization of DNA via oligonucleotides containing an aldehyde or carboxylic acid group at the 5′ terminus. Nucleic Acids Res. 15, 2891–2909 (1987).
Steinberg, G., Stromsborg, K., Thomas, L., Barker, D. & Zhao, C. Strategies for covalent attachment of DNA to beads. Biopolymers 73, 597–605 (2004).
Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).
Tarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Acknowledgements
We would like to thank the research, development, and software engineering departments at Illumina for sequencing technology development. Specifically, we would like to thank C.L. Pan at Illumina for the sequencing technology development and G. Bean, J. Leng, and S. Swamy at Illumina for data analysis software development. We would like to thank R. Daza for providing HeLa and NA12878 Gentra DNA preparations. The genome sequence described in this paper was derived from a HeLa cell line. Henrietta Lacks, and the HeLa cell line that was established from her tumor cells in 1951, have made significant contributions to scientific progress and advances in human health. We are grateful to Henrietta Lacks, now deceased, and to her surviving family members for their contributions to biomedical research.
Author information
Contributions
F.J.S., K.L.G., and N.G. conceived the study; F.J.S. and M.C.R. oversaw the technology development. F.Z. and D.P. led the assay development; L.C., J.T., S.J.N., and D.P. performed the experiments and analyzed the data. F.Z. performed the phasing analysis and wrote custom analysis software. A.G., E.J., R.J., and N.M. helped with the assay development. Y.Z., M.W., and E.W. prepared the bead pool. A.H. developed the data analysis pipeline. M.R. led the project coordination. F.J.S., F.Z., L.C., K.L.G., D.P., and J.S. co-wrote the paper. All authors contributed to the revision and review of the manuscript.
Ethics declarations
Competing interests
The authors declare competing financial interests in the form of stock ownership, patents, or employment through Illumina, Inc.
Integrated supplementary information
Supplementary Figure 1 Sequencing overview.
(a) Example plot of intensity versus cycle (IVC) for the library from the hybrid method. (b) Example IVC plot for the library from the single-tube method. The first 15 cycles of read 1 and read 2 show the typical Tn5 insertion bias (ref. 42).
Supplementary Figure 3 Island structure visualization.
An 868 kb genomic region is shown in the top panel for the reads sharing the same barcode. The reads are grouped into two clusters, called islands, which originate from two long DNA molecules interacting with beads carrying barcode 1. The zoomed-in snapshot shows the read distribution within island 2, with a total genomic coverage of around 25%.
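The island grouping described in this caption can be pictured as gap-based clustering of the aligned positions of reads that share one barcode. The sketch below is only an illustration under stated assumptions, not the authors' pipeline: the function name `call_islands` and the 50 kb `max_gap` threshold are hypothetical choices made here.

```python
from typing import List, Tuple

def call_islands(positions: List[int], max_gap: int = 50_000) -> List[Tuple[int, int, int]]:
    """Cluster read positions (one barcode, one chromosome) into islands.

    A read joins the current island when it lies within max_gap of the
    island's last read; otherwise it starts a new island.
    max_gap is an assumed threshold, not a value taken from the paper.
    Returns one (start, end, n_reads) tuple per island.
    """
    islands: List[Tuple[int, int, int]] = []
    for pos in sorted(positions):
        if islands and pos - islands[-1][1] <= max_gap:
            start, _, n_reads = islands[-1]
            islands[-1] = (start, pos, n_reads + 1)  # extend the current island
        else:
            islands.append((pos, pos, 1))  # open a new island
    return islands

# Two well-separated groups of reads yield two islands.
example = call_islands([100, 2_000, 4_500, 900_000, 903_000])
```

With these made-up positions, the first three reads form one island and the last two form a second, mirroring the two-island picture in the figure.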
Supplementary Figure 4 Proximal versus distal read distribution from the hybrid method using Gentra™ DNA.
Reads with the same barcode are aligned to the genome. The genomic distance between adjacent reads sharing the same barcode shows a bimodal distribution. The short distances between adjacent reads within an island contribute to the proximal peak centered around 2 kb. On average, about 500–1,000 long DNA molecules are transposed by the multiple beads that share a barcode. These long DNA molecules are randomly distributed across the genome, generating a distal peak for the distance between adjacent islands at around 3 Gbp / (500 to 1,000) ≈ 3–6 Mbp.
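The distal-peak estimate in this caption is a simple division of genome size by the number of molecules per barcode. A minimal sketch of that arithmetic follows; the function name `expected_island_spacing` is hypothetical, and it assumes, as the caption does, a genome of about 3 Gbp with molecules placed uniformly at random.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size used in the caption

def expected_island_spacing(n_molecules: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean distance (bp) between islands when n_molecules long DNA molecules
    carrying the same barcode are spread uniformly across the genome."""
    return genome_size / n_molecules

for n in (500, 1000):
    # 500 molecules gives 6.0 Mbp; 1000 molecules gives 3.0 Mbp
    print(n, expected_island_spacing(n) / 1e6, "Mbp")
```

The two values bracket the 3–6 Mbp distal peak, roughly three orders of magnitude above the 2 kb proximal peak.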
Supplementary Figure 5 The workflow for phasing and structural variant analysis.
The input unphased VCF files for the genomes used in this study were obtained from a previous publication. Raw sequencing reads were demultiplexed by their unique barcode and partitioned into individual fastq files. Reads for each partition were then aligned to the reference genome and mapped to discrete islands. The islands that covered at least two heterozygous variants were retained for phasing. Islands from all partitions were subsequently combined. The genome was then split into partitions such that no informative islands (islands covering more than one heterozygous SNP) overlapped with the partition boundaries. For each partition, we used H-BOP to generate the phasing blocks. The phased SNPs, combined with the island structure generated earlier, were used as input for the structural variant analysis.
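Two steps of this workflow lend themselves to a short sketch: keeping only islands that cover at least two heterozygous variants, and splitting the genome so that no retained island straddles a partition boundary. The code below is one possible reading of those steps, not the authors' software. The names `informative_islands` and `split_into_partitions` are hypothetical, coordinates are assumed to lie on a single chromosome, and the partitioning is approximated by merging overlapping islands into connected intervals. H-BOP itself is not reproduced here.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Island = Tuple[int, int]  # (start, end), inclusive, on one chromosome

def informative_islands(islands: List[Island], het_positions: List[int],
                        min_hets: int = 2) -> List[Island]:
    """Keep islands that cover at least min_hets heterozygous variant positions."""
    hets = sorted(het_positions)
    kept = []
    for start, end in islands:
        # Count het positions p with start <= p <= end by binary search.
        n_hets = bisect_right(hets, end) - bisect_left(hets, start)
        if n_hets >= min_hets:
            kept.append((start, end))
    return kept

def split_into_partitions(islands: List[Island]) -> List[Island]:
    """Merge overlapping islands; each merged interval is one partition,
    so no island crosses a partition boundary (assumed interpretation)."""
    partitions: List[Island] = []
    for start, end in sorted(islands):
        if partitions and start <= partitions[-1][1]:
            prev_start, prev_end = partitions[-1]
            partitions[-1] = (prev_start, max(prev_end, end))
        else:
            partitions.append((start, end))
    return partitions

# Made-up coordinates for illustration only.
hets = [150, 300, 380, 5000, 5200, 9000, 9100]
islands = [(100, 400), (350, 5100), (5150, 5300), (8000, 9500)]
kept = informative_islands(islands, hets)
partitions = split_into_partitions(kept)
```

In this toy input the island at 5150–5300 covers a single heterozygous site and is dropped. The remaining islands collapse into two partitions, each of which could then be phased independently.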
Supplementary Figure 6 Phasing yield and accuracy.
Phasing yield (a) and accuracy (b) for Gentra™ prep NA12878 using the hybrid method. Phasing yield (c) and accuracy (d) for freshly prepared NA12878 using the single-tube method. The inset in the panel is a zoomed-in representation of the phasing yield/accuracy from 0–50 kb.
Supplementary Figure 7 Comparison of island lengths between Gentra™ prep and freshly prepared DNA.
The same number of reads was randomly subsampled from the original data to generate the histograms.
Supplementary Figure 10 Confirmation of heterozygous deletion detection using trio analysis.
The sequencing depths for NA12878, NA12891, and NA12892 were normalized around the deletion locations. One of the parents showed a lower sequencing depth in the same region, indicative of a heterozygous deletion.
Supplementary Figure 12 Haplotyping accuracy at the island level.
The analysis was applied to 5,000 randomly selected barcodes. The number of islands versus the number of hetSNPs per island follows the expected exponential decay, as larger islands are underrepresented (blue bars). Edit errors per number of hetSNPs in an island are depicted as red diamonds. Although not applied in this paper, conflicting islands can be filtered out before input into H-BOP.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–13 (PDF 1672 kb)
Supplementary Table 1
The number of reads with different barcodes in the intra- versus inter-bead transposition experiment (XLSX 7 kb)
Supplementary Table 2
The DNA sequences of the indexed transposons, the oligos on beads, the sequencing primers and the primers for deletion detection. (XLSX 41 kb)
Supplementary Table 3
Comparison of coverage and SNP/INDEL recall/precision among different approaches. (XLSX 10 kb)
About this article
Cite this article
Zhang, F., Christiansen, L., Thomas, J. et al. Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube. Nat Biotechnol 35, 852–857 (2017). https://doi.org/10.1038/nbt.3897
DOI: https://doi.org/10.1038/nbt.3897