Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube

Zhang, Fan; Christiansen, Lena; Thomas, Jerushah; Pokholok, Dmitry; Jackson, Ros; Morrell, Natalie; Zhao, Yannan; Wiley, Melissa; Welch, Emily; Jaeger, Erich; Granat, Ana; Norberg, Steven J; Halpern, Aaron; Rogert, Maria C; Ronaghi, Mostafa; Shendure, Jay; Gormley, Niall; Gunderson, Kevin L; Steemers, Frank J

doi:10.1038/nbt.3897

Letter
Published: 26 June 2017

Fan Zhang¹,
Lena Christiansen¹,
Jerushah Thomas¹,
Dmitry Pokholok¹,
Ros Jackson²,
Natalie Morrell²,
Yannan Zhao³,
Melissa Wiley³,
Emily Welch³,
Erich Jaeger⁴,
Ana Granat⁴,
Steven J Norberg¹,
Aaron Halpern⁴,
Maria C Rogert³ (ORCID: orcid.org/0000-0002-2977-1517),
Mostafa Ronaghi¹,
Jay Shendure⁵,
Niall Gormley²,
Kevin L Gunderson⁶ &
Frank J Steemers¹

Nature Biotechnology volume 35, pages 852–857 (2017)

Abstract

Haplotype-resolved genome sequencing promises to unlock a wealth of information in population and medical genetics. However, for the vast majority of genomes sequenced to date, haplotypes have not been determined because of cumbersome haplotyping workflows that require fractions of the genome to be sequenced in a large number of compartments. Here we demonstrate barcode partitioning of long DNA molecules in a single compartment using “on-bead” barcoded tagmentation. The key to the method that we call “contiguity preserving transposition” sequencing on beads (CPTv2-seq) is transposon-mediated transfer of homogenous populations of barcodes from beads to individual long DNA molecules that get fragmented at the same time (tagmentation). These are then processed to sequencing libraries wherein all sequencing reads originating from each long DNA molecule share a common barcode. Single-tube, bulk processing of long DNA molecules with ∼150,000 different barcoded bead types provides a barcode-linked read structure that reveals long-range molecular contiguity. This technology provides a simple, rapid, plate-scalable and automatable route to accurate, haplotype-resolved sequencing, and phasing of structural variants of the genome.

**Figure 1: Summary of the bead-based indexing workflow and the intra- vs.**

**Figure 2: Detection of deletion and interchromosomal translocation using linked read information.**

References

1. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
2. Snyder, M.W., Adey, A., Kitzman, J.O. & Shendure, J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat. Rev. Genet. 16, 344–358 (2015).
3. Bansal, V., Tewhey, R., Topol, E.J. & Schork, N.J. The next phase in human genetics. Nat. Biotechnol. 29, 38–39 (2011).
4. Browning, S.R. & Browning, B.L. Haplotype phasing: existing methods and new developments. Nat. Rev. Genet. 12, 703–714 (2011).
5. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).
6. Nalls, M.A. et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease. Nat. Genet. 46, 989–993 (2014).
7. Peters, B.A. et al. Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing. Genome Res. 25, 426–434 (2015).
8. Zheng, G.X.Y. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).
9. Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
10. Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014).
11. Medvedev, P., Stanciu, M. & Brudno, M. Computational methods for discovering structural variation with next-generation sequencing. Nat. Methods 6 (Suppl. 1), S13–S20 (2009).
12. Fan, H.C., Wang, J., Potanina, A. & Quake, S.R. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29, 51–57 (2011).
13. Kaper, F. et al. Whole-genome haplotyping by dilution, amplification, and sequencing. Proc. Natl. Acad. Sci. USA 110, 5552–5557 (2013).
14. Kitzman, J.O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29, 59–63 (2011).
15. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
16. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
17. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
18. Putnam, N.H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
19. Kuleshov, V. et al. Whole-genome haplotyping using long reads and statistical methods. Nat. Biotechnol. 32, 261–266 (2014).
20. Cao, H. et al. De novo assembly of a haplotype-resolved human genome. Nat. Biotechnol. 33, 617–622 (2015).
21. Peters, B.A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487, 190–195 (2012).
22. Burton, J.N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
23. Sović, I., Križanović, K., Skala, K. & Šikić, M. Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. Bioinformatics 32, 2582–2589 (2016).
24. Goodwin, S. et al. Oxford Nanopore Sequencing and de novo assembly of a eukaryotic genome. Preprint at bioRxiv https://doi.org/10.1101/013490 (2015).
25. Dear, P.H. & Cook, P.R. Happy mapping: a proposal for linkage mapping the human genome. Nucleic Acids Res. 17, 6795–6807 (1989).
26. Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).
27. Peters, B.A., Liu, J. & Drmanac, R. Co-barcoded sequence reads from long DNA fragments: a cost-effective solution for “perfect genome” sequencing. Front. Genet. 5, 466 (2015).
28. Lan, F., Haliburton, J.R., Yuan, A. & Abate, A.R. Droplet barcoding for massively parallel single-molecule deep sequencing. Nat. Commun. 7, 11784 (2016).
29. Loman, N.J., Quick, J. & Simpson, J.T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).
30. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
31. Cusanovich, D.A. et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910–914 (2015).
32. Xie, M., Wang, J. & Jiang, T. A fast and accurate algorithm for single individual haplotyping. BMC Syst. Biol. 6 (Suppl. 2), S8 (2012).
33. Zong, C., Lu, S., Chapman, A.R. & Xie, X.S. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626 (2012).
34. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
35. Eberle, M.A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).
36. Adey, A. et al. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res. 24, 2041–2049 (2014).
37. Adey, A. et al. The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line. Nature 500, 207–211 (2013).
38. Steemers, F.J. et al. Whole-genome genotyping with the single-base extension assay. Nat. Methods 3, 31–33 (2006).
39. Furka, A., Sebestyén, F., Asgedom, M. & Dibó, G. General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 37, 487–493 (1991).
40. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
41. Sos, B.C. et al. Characterization of chromatin accessibility with a transposome hypersensitive sites sequencing (THS-seq) assay. Genome Biol. 17, 20 (2016).
42. Ason, B. & Reznikoff, W.S. DNA sequence bias during Tn5 transposition. J. Mol. Biol. 335, 1213–1225 (2004).
43. Lebl, M. et al. Automatic oligonucleotide synthesizer utilizing the concept of parallel processing. Collect. Symp. Ser. 12, 264–267 (2011).
44. Kremsky, J.N. et al. Immobilization of DNA via oligonucleotides containing an aldehyde or carboxylic acid group at the 5′ terminus. Nucleic Acids Res. 15, 2891–2909 (1987).
45. Steinberg, G., Stromsborg, K., Thomas, L., Barker, D. & Zhao, C. Strategies for covalent attachment of DNA to beads. Biopolymers 73, 597–605 (2004).
46. Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).
47. Tarasov, A., Vilella, A.J., Cuppen, E., Nijman, I.J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).

Acknowledgements

We would like to thank the research, development, and software engineering departments at Illumina for sequencing technology development. Specifically, we would like to thank C.L. Pan at Illumina for the sequencing technology development and G. Bean, J. Leng, and S. Swamy at Illumina for data analysis software development. We would like to thank R. Daza for providing HeLa and NA12878 Gentra DNA preparations. The genome sequence described in this paper was derived from a HeLa cell line. Henrietta Lacks, and the HeLa cell line that was established from her tumor cells in 1951, have made significant contributions to scientific progress and advances in human health. We are grateful to Henrietta Lacks, now deceased, and to her surviving family members for their contributions to biomedical research.

Author information

Fan Zhang and Lena Christiansen: These authors contributed equally to this work.

Authors and Affiliations

Advanced Research Department, Illumina, San Diego, California, USA
Fan Zhang, Lena Christiansen, Jerushah Thomas, Dmitry Pokholok, Steven J Norberg, Mostafa Ronaghi & Frank J Steemers
Technology Development Department, Illumina, Little Chesterford, Essex, UK
Ros Jackson, Natalie Morrell & Niall Gormley
Technology Development, Illumina, San Diego, California, USA
Yannan Zhao, Melissa Wiley, Emily Welch & Maria C Rogert
Gene Expression Department, Illumina, San Francisco, California, USA
Erich Jaeger, Ana Granat & Aaron Halpern
Department of Genome Sciences, University of Washington, Seattle, Washington, USA
Jay Shendure
Encodia, Inc., San Diego, California, USA
Kevin L Gunderson

Contributions

F.J.S., K.L.G., and N.G. conceived the study; F.J.S. and M.C.R. oversaw the technology development. F.Z. and D.P. led the assay development; L.C., J.T., S.J.N., and D.P. performed the experiments and analyzed the data. F.Z. performed the phasing analysis and wrote custom analysis software. A.G., E.J., R.J., and N.M. helped with the assay development. Y.Z., M.W., and E.W. prepared the bead pool. A.H. developed the data analysis pipeline. M.R. led the project coordination. F.J.S., F.Z., L.C., K.L.G., D.P., and J.S. co-wrote the paper. All authors contributed to the revision and review of the manuscript.

Corresponding author

Correspondence to Frank J Steemers.

Ethics declarations

Competing interests

The authors declare competing financial interests in the form of stock ownership, patents, or employment through Illumina, Inc.

Integrated supplementary information

Supplementary Figure 1 Sequencing overview.

(a) Example intensity versus cycle (IVC) plot for the library from the hybrid method. (b) Example IVC plot for the single-tube method. The first 15 cycles of read 1 and read 2 show the typical Tn5 insertion bias⁴².

Supplementary Figure 2 Typical example of a Bioanalyzer™ library size distribution.

Supplementary Figure 3 Island structure visualization.

An 868 kb genomic region is shown in the top panel for the reads sharing the same barcode. The reads are grouped into two clusters, called islands, which originate from two long DNA molecules that interacted with beads carrying barcode 1. The zoomed-in snapshot shows the read distribution within island 2, with a total genomic coverage of around 25%.

Supplementary Figure 4 Proximal versus distal read distribution from the hybrid method using Gentra™ DNA.

Reads with the same barcode are aligned to the genome. The genomic distance between adjacent reads sharing the same barcode shows a bimodal distribution. The short distances between adjacent reads within an island contribute to the proximal peak centered around 2 kb. On average, about 500–1,000 long DNA molecules are transposed by the multiple beads that carry the same barcode. These long DNA molecules are randomly distributed across the genome, generating a distal peak for the distance between adjacent islands at around 3 Gbp/(500 or 1,000) ≈ 3–6 Mbp.
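The distal-peak estimate is plain arithmetic: if N molecules sharing a barcode land at random across a genome of size G, adjacent islands are about G/N apart on average. A minimal sketch of that calculation, assuming G = 3 Gbp as in the caption (the function name is illustrative, not from the paper):

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gbp, the value used in the caption


def expected_island_spacing(n_molecules: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean distance between adjacent islands for n randomly placed molecules."""
    return genome_size / n_molecules


for n in (500, 1000):
    spacing_mbp = expected_island_spacing(n) / 1e6
    print(f"{n} molecules per barcode -> ~{spacing_mbp:.0f} Mbp between islands")
```

With 500 molecules the spacing is 6 Mbp and with 1,000 it is 3 Mbp, which brackets the distal peak described above.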

Supplementary Figure 5 The workflow for phasing and structural variant analysis.

The input unphased VCF files of the genomes used in this study were obtained from a previous publication. Raw sequencing reads were demultiplexed by barcode and partitioned into individual FASTQ files. Reads for each partition were then aligned to the reference genome and grouped into discrete islands. Islands that covered at least two heterozygous variants were retained for phasing, and islands from all partitions were subsequently combined. The genome was then split into partitions such that no informative island (an island covering more than one heterozygous SNP) overlapped a partition boundary. For each partition, we used H-BOP to generate the phasing blocks. The phased SNPs, combined with the island structure generated earlier, were used as input for the structural variant analysis.
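The island-calling, filtering, and partitioning steps of this workflow can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' pipeline: the gap threshold `MAX_GAP_BP`, the tuple layouts, and all function names are hypothetical, heterozygous sites are counted by island span rather than by actual read overlap, and alignment and H-BOP phasing are not reproduced.

```python
from collections import defaultdict

# Hypothetical threshold: same-barcode reads farther apart than this are
# assumed to come from different long DNA molecules.
MAX_GAP_BP = 20_000


def call_islands(reads, max_gap=MAX_GAP_BP):
    """reads: iterable of (barcode, chrom, pos). Returns (barcode, chrom, start, end)."""
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    islands = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev > max_gap:  # large gap: close the current island
                islands.append((barcode, chrom, start, prev))
                start = pos
            prev = pos
        islands.append((barcode, chrom, start, prev))
    return islands


def informative_islands(islands, het_sites):
    """Keep islands whose span contains at least two heterozygous sites."""
    kept = []
    for barcode, chrom, start, end in islands:
        n_het = sum(1 for p in het_sites.get(chrom, ()) if start <= p <= end)
        if n_het >= 2:
            kept.append((barcode, chrom, start, end))
    return kept


def merge_into_partitions(islands):
    """Merge overlapping islands per chromosome; the gaps between merged
    blocks are places where a partition boundary cuts no island."""
    by_chrom = defaultdict(list)
    for _, chrom, start, end in islands:
        by_chrom[chrom].append((start, end))

    blocks = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        blocks[chrom] = [tuple(b) for b in merged]
    return blocks


# Toy data, invented for illustration only.
reads = [
    ("BC1", "chr1", 1_000), ("BC1", "chr1", 3_000), ("BC1", "chr1", 9_000),
    ("BC1", "chr1", 500_000), ("BC1", "chr1", 502_000),
    ("BC2", "chr1", 8_000), ("BC2", "chr1", 12_000),
]
het_sites = {"chr1": [2_000, 8_500, 11_000, 501_000]}

islands = call_islands(reads)
kept = informative_islands(islands, het_sites)
partitions = merge_into_partitions(kept)
print(islands)
print(kept)
print(partitions)
```

In the toy data, the second BC1 island contains only one heterozygous site and is dropped, while the two remaining islands overlap and fall into a single partition block.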

Supplementary Figure 6 Phasing yield and accuracy.

Phasing yield (a) and accuracy (b) for Gentra™ prep NA12878 using the hybrid method. Phasing yield (c) and accuracy (d) for freshly prepared NA12878 using the single-tube method. The inset in the panel is a zoomed-in representation of the phasing yield/accuracy from 0–50 kb.

Supplementary Figure 7 Comparison of island lengths between Gentra™ prep and freshly prepared DNA.

The same number of reads was randomly subsampled from each original data set to generate the histograms.

Supplementary Figure 8 The histogram of the number of reads per barcode for the 150,000 combinatorial bead pool using the single-tube method.

Supplementary Figure 9 Detection of heterozygous deletion using linked reads.

Each color represents a distinct barcode and the short vertical bar represents a mapped read.

Supplementary Figure 10 Confirmation of heterozygous deletion detection using trio analysis.

The sequencing depths for NA12878, NA12891, and NA12892 were normalized around the deletion locations. One of the parents showed a lower sequencing depth in the same region, indicative of a heterozygous deletion.
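The idea behind this check is that a heterozygous deletion removes one of two copies, so depth inside the region should fall to roughly half of the flanking depth. A minimal sketch of that comparison, assuming median-based normalization and hypothetical cut-offs of 0.3 and 0.7 (the caption gives neither the normalization method nor thresholds, and the depth values below are invented):

```python
from statistics import median


def normalized_depth(region_depths, flank_depths):
    """Median depth inside the candidate region relative to flanking sequence."""
    return median(region_depths) / median(flank_depths)


def looks_like_het_deletion(ratio, low=0.3, high=0.7):
    """Hypothetical rule: a ratio near 0.5 suggests one copy is missing."""
    return low <= ratio <= high


# Toy per-position depths (region, flank); sample labels are generic on purpose.
samples = {
    "child": ([15, 14, 16], [30, 31, 29]),
    "parent_1": ([29, 31, 30], [30, 30, 31]),
    "parent_2": ([16, 15, 14], [31, 30, 29]),
}

ratios = {name: normalized_depth(region, flank) for name, (region, flank) in samples.items()}
calls = {name: looks_like_het_deletion(r) for name, r in ratios.items()}
print(ratios)
print(calls)
```

A deletion seen at about half depth in the child and in exactly one parent is consistent with inheritance, which is the logic of the trio confirmation.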

Supplementary Figure 11 Interchromosomal translocation detection for the HeLa genome.

Seventeen of the 20 previously reported interchromosomal translocations³⁷ show a strong signal in the barcode-sharing heat map.
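A barcode-sharing signal of this kind can be illustrated by binning the genome and counting barcodes that appear in bins on two different chromosomes. The sketch below is one simple way to compute such counts, not the authors' implementation: `BIN_SIZE_BP`, `min_shared`, and the function names are hypothetical, and a real analysis would also need to correct for the background sharing expected when many molecules carry the same barcode.

```python
from collections import defaultdict
from itertools import combinations

BIN_SIZE_BP = 100_000  # hypothetical bin width


def barcode_sets_by_bin(reads, bin_size=BIN_SIZE_BP):
    """reads: iterable of (barcode, chrom, pos). Maps (chrom, bin_index) to barcodes."""
    bins = defaultdict(set)
    for barcode, chrom, pos in reads:
        bins[(chrom, pos // bin_size)].add(barcode)
    return bins


def interchromosomal_sharing(bins, min_shared=2):
    """Count barcodes shared by pairs of bins on different chromosomes."""
    hits = {}
    for bin_a, bin_b in combinations(sorted(bins), 2):
        if bin_a[0] == bin_b[0]:
            continue  # same chromosome: not an interchromosomal candidate
        shared = len(bins[bin_a] & bins[bin_b])
        if shared >= min_shared:
            hits[(bin_a, bin_b)] = shared
    return hits


# Toy reads, invented for illustration only.
reads = [
    ("BC1", "chr3", 150_000), ("BC1", "chr5", 950_000),
    ("BC2", "chr3", 160_000), ("BC2", "chr5", 910_000),
    ("BC3", "chr3", 170_000), ("BC4", "chr5", 2_000_000),
]

hits = interchromosomal_sharing(barcode_sets_by_bin(reads))
print(hits)
```

Each entry of `hits` corresponds to one hot cell in a bin-by-bin heat map; bins joined by a translocation share barcodes because single long molecules span the junction.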

Supplementary Figure 12 Haplotyping accuracy at the island level.

The analysis was applied to 5,000 randomly selected barcodes. The number of islands as a function of hetSNPs per island follows the expected exponential decay, as larger islands are underrepresented (blue bars). Edit errors per number of hetSNPs in an island are depicted as red diamonds. Although not applied in this paper, conflicting islands can be filtered out before input into H-BOP.
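The caption does not define "edit errors". One plausible reading, assumed here purely for illustration, is the minimum number of allele calls in an island that must be flipped for the island to match one of the two reference haplotypes. The sketch below encodes alleles as 0/1 at heterozygous sites, so the second haplotype is the complement of the first; the function name and data are hypothetical.

```python
def island_edit_errors(island_alleles, truth_hap_a):
    """island_alleles, truth_hap_a: dicts mapping het-site position to 0 or 1.

    Returns the smaller mismatch count against haplotype A or its complement
    (haplotype B). This definition is an assumption, not taken from the paper.
    """
    mismatches_a = sum(
        1 for site, allele in island_alleles.items() if allele != truth_hap_a[site]
    )
    mismatches_b = len(island_alleles) - mismatches_a
    return min(mismatches_a, mismatches_b)


# Toy reference phasing and islands, invented for illustration only.
truth = {100: 0, 200: 1, 300: 1, 400: 0}
matches_hap_a = {100: 0, 200: 1, 300: 1}
matches_hap_b = {100: 1, 200: 0, 300: 0, 400: 1}
one_conflict = {100: 0, 200: 1, 300: 0, 400: 0}

errors = [island_edit_errors(i, truth) for i in (matches_hap_a, matches_hap_b, one_conflict)]
print(errors)
```

Under this reading, an island with a nonzero count mixes alleles from both haplotypes and would be a candidate for the filtering mentioned in the caption.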

Supplementary Figure 13 Comparison of genomic coverage uniformity for Zheng et al.⁸, the single-tube method, the hybrid method, and TruSeq PCR-free.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 (PDF 1672 kb)

Supplementary Table 1

The number of reads with different barcodes in the intra- vs. inter-bead transposition experiment. (XLSX 7 kb)

Supplementary Table 2

The DNA sequences of the indexed transposons, the oligos on beads, the sequencing primers and the primers for deletion detection. (XLSX 41 kb)

Supplementary Table 3

Comparison of coverage and SNP/INDEL recall/precision among different approaches. (XLSX 10 kb)

About this article

Cite this article

Zhang, F., Christiansen, L., Thomas, J. et al. Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube. Nat Biotechnol 35, 852–857 (2017). https://doi.org/10.1038/nbt.3897

Received: 14 October 2016
Accepted: 10 May 2017
Published: 26 June 2017
Issue Date: September 2017
DOI: https://doi.org/10.1038/nbt.3897
