Haplotyping germline and cancer genomes with high-throughput linked-read sequencing

Zheng, Grace X Y; Lau, Billy T; Schnall-Levin, Michael; Jarosz, Mirna; Bell, John M; Hindson, Christopher M; Kyriazopoulou-Panagiotopoulou, Sofia; Masquelier, Donald A; Merrill, Landon; Terry, Jessica M; Mudivarti, Patrice A; Wyatt, Paul W; Bharadwaj, Rajiv; Makarewicz, Anthony J; Li, Yuan; Belgrader, Phillip; Price, Andrew D; Lowe, Adam J; Marks, Patrick; Vurens, Gerard M; Hardenbol, Paul; Montesclaros, Luz; Luo, Melissa; Greenfield, Lawrence; Wong, Alexander; Birch, David E; Short, Steven W; Bjornson, Keith P; Patel, Pranav; Hopmans, Erik S; Wood, Christina; Kaur, Sukhvinder; Lockwood, Glenn K; Stafford, David; Delaney, Joshua P; Wu, Indira; Ordonez, Heather S; Grimes, Susan M; Greer, Stephanie; Lee, Josephine Y; Belhocine, Kamila; Giorda, Kristina M; Heaton, William H; McDermott, Geoffrey P; Bent, Zachary W; Meschi, Francesca; Kondov, Nikola O; Wilson, Ryan; Bernate, Jorge A; Gauby, Shawn; Kindwall, Alex; Bermejo, Clara; Fehr, Adrian N; Chan, Adrian; Saxonov, Serge; Ness, Kevin D; Hindson, Benjamin J; Ji, Hanlee P

doi:10.1038/nbt.3432

Article
Published: 01 February 2016

Nature Biotechnology volume 34, pages 303–311 (2016)

Abstract

Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

**Figure 1: Overview of the technology for generating linked reads.**

**Figure 2: Phasing performance of NA12878 trio analysis.**

**Figure 3: Detecting genomic deletions in NA12878.**

**Figure 4: Rearrangement detection of an *EML4*-*ALK* fusion from exome sequencing of NCI-H2228.**

**Figure 5: Phasing analysis of a primary colon cancer genome and structure of the *TP53* driver event.**

Accession codes

Primary accessions

Sequence Read Archive

SRP051629

Acknowledgements

This work was supported by US National Institutes of Health grants NHGRI P01HG000205 (to B.T.L., E.S.H., S.M.G., J.M.B. and H.P.J.), NCI R33CA174575 (to J.M.B., S. Greer and H.P.J.) and NHGRI R01HG006137 (to H.P.J.). The American Cancer Society provided additional support to S. Greer and H.P.J. (Research Scholar grant, RSG-13-297-01-TBG). H.P.J. also received support from the Doris Duke Clinical Foundation, the Clayville Foundation, the Seiler Foundation and the Howard Hughes Medical Institute.

Author information

Grace X Y Zheng and Billy T Lau: These authors contributed equally to this work.

Authors and Affiliations

10X Genomics, Pleasanton, California, USA
Grace X Y Zheng, Michael Schnall-Levin, Mirna Jarosz, Christopher M Hindson, Sofia Kyriazopoulou-Panagiotopoulou, Donald A Masquelier, Landon Merrill, Jessica M Terry, Patrice A Mudivarti, Paul W Wyatt, Rajiv Bharadwaj, Anthony J Makarewicz, Yuan Li, Phillip Belgrader, Andrew D Price, Adam J Lowe, Patrick Marks, Gerard M Vurens, Paul Hardenbol, Luz Montesclaros, Melissa Luo, Lawrence Greenfield, Alexander Wong, David E Birch, Steven W Short, Keith P Bjornson, Pranav Patel, Sukhvinder Kaur, Glenn K Lockwood, David Stafford, Joshua P Delaney, Indira Wu, Heather S Ordonez, Josephine Y Lee, Kamila Belhocine, Kristina M Giorda, William H Heaton, Geoffrey P McDermott, Zachary W Bent, Francesca Meschi, Nikola O Kondov, Ryan Wilson, Jorge A Bernate, Shawn Gauby, Alex Kindwall, Clara Bermejo, Adrian N Fehr, Adrian Chan, Serge Saxonov, Kevin D Ness & Benjamin J Hindson
Stanford Genome Technology Center, Stanford University, Palo Alto, California, USA
Billy T Lau, John M Bell, Erik S Hopmans, Susan M Grimes & Hanlee P Ji
Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
Christina Wood, Stephanie Greer & Hanlee P Ji

Contributions

B.T.L., M.S.-L., M.J., J.M.B., C.M.H., S.K.-P., L. Merrill, R.B., A.J.M., Y.L., A.D.P., A.J.L., P.H., L.G., K.B., P.P., E.S.H., C.W., K.M.G., S.S., K.D.N., B.J.H. and H.P.J. designed the experiments. B.T.L., J.M.B., C.M.H., L. Merrill, J.M.T., P.A.M., P.W.W., R.B., A.J.M., Y.L., P.B., A.D.P., A.J.L., P.M., G.M.V., L. Montesclaros, M.L., L.G., D.E.B., K.B., P.P., E.S.H., C.W., J.P.D., I.W., H.S.O., J.Y.L., Z.W.B., K.M.G., G.P.M., Z.W.B., F.M., N.O.K., J.A.B., S.G., C.B., A.N.F., A.C. and B.J.H. conducted the experiments. D.A.M., R.B., A.J.M., S.W.S., S.K., J.A.B., A.K., K.D.N. and B.J.H. designed the instrument. M.S.-L., M.J., C.M.H., P.W.W., R.B., A.J.M., Y.L., A.D.P., A.J.L., P.H., L. Merrill, L.G., K.P.B., P.P., S.K., J.P.D., J.A.B., K.D.N. and B.J.H. designed reagents for phasing. B.T.L., J.M.B., E.S.H. and H.P.J. designed reagents for targeted sequencing analysis. G.X.Y.Z., M.S.-L., S.K.-P., P.M., G.K.L., D.L.S., W.H.H., R.T.W., S.S. and K.D.N. wrote the haplotype analysis algorithms. J.M.B. and S.M.G. wrote the analysis algorithms for short-read sequencing analysis. M.S.-L., P.J.M., A.W., G.K.L., D.L.S., W.H.H. and R.T.W. wrote the analysis software. G.X.Y.Z., B.T.L., M.S.-L., M.J., J.M.B., C.M.H., S.K.-P., J.M.T., R.B., A.J.M., Y.L., P.B., P.M., P.H., L. Merrill, M.L., A.W., K.B., P.P., S.K., J.P.D., I.W., H.S.O., S.M.G., S. Greer, J.Y.L., Z.W.B., K.M.G., W.H.H., G.P.M., Z.W.B., F.M., J.A.B., S. Gauby, C.B., A.N.F., W.H.H., A.C., S.S., K.D.N., B.J.H. and H.P.J. analyzed the data. G.X.Y.Z., B.T.L., M.S.-L., M.J., S. Greer, B.J.H. and H.P.J. wrote the manuscript. H.P.J. oversaw the overall genetic experiments and analysis.

Corresponding authors

Correspondence to Benjamin J Hindson or Hanlee P Ji.

Ethics declarations

Competing interests

G.X.Y.Z., M.S.-L., M.J., C.M.H., S.K.-P., D.A.M., L. Merrill, J.M.T., P.A.M., P.W.W., R.B., A.J.M., Y.L., P.B., A.D.P., A.J.L., P.M., G.M.V., P.H., L. Montesclaros, M.L., L.G., A.W., D.E.B., S.W.S., K.P.B., P.P., S.K., G.K.L., D.S., J.P.D., I.W., H.S.O., J.Y.L., Z.W.B., K.M.G., W.H.H., G.P.M., Z.W.B., F.M., N.O.K., R.W., J.A.B., S. Gauby, A.K., C.B., A.N.F., A.C., S.S., K.D.N. and B.J.H. are employees of 10X Genomics.

Integrated supplementary information

Supplementary Figure 1 Barcode sequencing library and analysis software workflow.

(a) Barcoded primers are used to initiate primer extension in each droplet, followed by (b) pooling of droplets, end-repair and ligation of the P7 sequencing adaptor. The library is completed by (c) sample indexing PCR and (d) sequencing on Illumina sequencers. (e) The barcode pipeline builds on established aligners such as BWA and takes as input either previously called variants or calls from variant callers such as FreeBayes and GATK. It uses linked reads to enable phasing and structural variant calling. The results are produced in standard file formats such as BAM, VCF and BEDPE.

Supplementary Figure 2 Sequencing and phasing performance of NA12878 trio.

(a) The number of reads corresponding to each barcoded oligonucleotide is plotted against its rank to illustrate the uniformity of counts over 100,000 barcodes. (b) Pulsed-field gel electrophoresis of the trio input DNA. NA12878 DNA was run on a separate gel from NA12877 and NA12882, along with 5 kb and 8–48 kb ladders to estimate the size of input DNA. (c) Gap size distribution of the GemCode NA12878 WGS sample. (d) Coverage versus GC fraction of barcode libraries from the NA12878 WGS sample. The relative coverage, normalized by the median, is plotted against GC fraction brackets spanning from 29% to 60%. (e) Cumulative distribution function of phase block length for the NA12878 trio exome samples. (f) Phasing accuracy of the nuclear trio exome data.
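The normalization described in panel (d) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the genome has already been cut into windows with a known sequence and mean coverage, and that "normalized by the median" means dividing each GC bracket's mean coverage by the median coverage over all windows. The function names, the one-percent brackets and the toy data are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean, median


def gc_percent(seq):
    """Integer GC percentage of a sequence (floor of 100 * GC / length)."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) // len(seq)


def relative_coverage_by_gc(windows, lo=29, hi=60):
    """windows: iterable of (sequence, mean_coverage) pairs.

    Returns {gc_percent: relative coverage} for brackets within [lo, hi],
    where relative coverage = bracket mean / median coverage of all windows.
    """
    windows = list(windows)
    overall = median(cov for _, cov in windows)
    by_gc = defaultdict(list)
    for seq, cov in windows:
        pct = gc_percent(seq)
        if lo <= pct <= hi:
            by_gc[pct].append(cov)
    return {pct: mean(covs) / overall for pct, covs in sorted(by_gc.items())}


# Toy data (hypothetical): 10-bp "windows" with made-up coverage values.
toy_windows = [
    ("GCGATATATA", 24.0),  # 30% GC
    ("GCGCATATAT", 30.0),  # 40% GC
    ("GCGCATATAT", 30.0),  # 40% GC
    ("GCGCGATATA", 36.0),  # 50% GC
    ("GCATATATAT", 10.0),  # 20% GC, outside the 29-60% range
]
profile = relative_coverage_by_gc(toy_windows)
```

A flat profile close to 1.0 across brackets would indicate little GC bias; real analyses would use much larger windows and coverage computed from the aligned BAM file.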

Supplementary Figure 3 Comparison between barcoded and standard TruSeq libraries.

Coverage distributions of NA12878 from (a) a phased library prepared from 1 ng of genomic DNA and (b) a standard TruSeq library prepared from 100 ng of genomic DNA. (c) Coverage statistics for the NA12878 phased barcoded library versus a standard Illumina TruSeq library.

Supplementary Figure 4 Barcode overlap of structural variants.

We generated non-overlapping 100-kb windows to visualize structural alterations using uniquely mapping, non-duplicated reads. (a) Schematics of barcode overlap for reference (WT), deletion, inversion and tandem duplication. Matrix view of representative barcode overlap patterns for (b) reference, (c) deletion, (d) inversion and (e) tandem duplication events. Barcode overlap of heterozygous (f) inversion and (g) inversion and tandem duplication events in NA12878.
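The barcode overlap matrix described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a hypothetical list of `(position, barcode)` pairs for one chromosome that has already been filtered to uniquely mapping, non-duplicated reads, and overlap is taken to be the number of barcodes shared by two windows. The function name and toy reads are invented for the example and are not the authors' software.

```python
WINDOW = 100_000  # window size in bp, matching the caption


def barcode_overlap(reads, start, end, window=WINDOW):
    """Count barcodes shared between every pair of non-overlapping windows.

    reads: iterable of (position, barcode) pairs on one chromosome.
    Returns a square list-of-lists M where M[i][j] is the number of
    distinct barcodes observed in both window i and window j.
    """
    n = (end - start + window - 1) // window  # number of windows (ceiling)
    barcodes = [set() for _ in range(n)]
    for pos, bc in reads:
        if start <= pos < end:
            barcodes[(pos - start) // window].add(bc)
    return [[len(barcodes[i] & barcodes[j]) for j in range(n)] for i in range(n)]


# Toy reads (hypothetical): barcode "CCGT" links two distant windows,
# the kind of off-diagonal signal that a rearrangement could produce.
toy_reads = [
    (10_000, "AAAC"), (150_000, "AAAC"),
    (20_000, "CCGT"), (420_000, "CCGT"),
    (250_000, "GGTA"),
]
matrix = barcode_overlap(toy_reads, 0, 500_000)
```

In a reference-like region the shared-barcode counts should decay with distance from the diagonal, because windows farther apart than the input molecule length rarely share a barcode by chance. Off-diagonal blocks of high overlap are the patterns the matrix panels depict for deletions, inversions and tandem duplications.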

Supplementary Figure 5 Barcode count analysis of eight deletion candidates in linked-read WGS data from NA12878.

(a) Barcode counts in the regions of five high-scoring deletions. (b) Barcode counts in the intervals covering three low-scoring deletions.

Supplementary Figure 6 Validation of genomic deletions with targeted sequencing.

We used a targeted sequencing approach called oligonucleotide-selective sequencing (OS-Seq) to validate the breakpoints of the deletions. Four of the five high-ranked candidates had a minimum of 450 reads aligning beyond the opposite breakpoint and at least 90 reads covering the breakpoint. The remaining high-scoring deletion was found to have added sequence complexity that was observed in the targeted sequencing data. An example of a validated high-scoring deletion is shown. (a) Ribbon plot displaying the location of reads mapped to the breakpoints of a high-scoring deletion. Left, position of reads mapped to the left breakpoint, where red represents probes mapping to the 5′ end of the breakpoint (using coordinates at the bottom of the plot) and blue represents probes mapping to the 3′ end of the breakpoint (using coordinates at the top of the plot). Right, position of reads mapped to the right breakpoint. The y-axis indicates the index of the reads. The pink line represents the mappability of the reads, where 1 indicates unique mapping and 0 indicates mapping to multiple places in the genome. Because the deletion is heterozygous, reads colored red on the left plot represent reads from the wild-type allele, and reads colored blue on the left plot represent reads from the deleted haplotype. The asterisks and arrows denote the locations of primer probes, their direction of capture and their typical capture distance. (b) Validation of breakpoint structure by soft-clipped read counting. Read 1s are grouped by primer probe (read 2) identity. Soft-clipped reads supporting the breakpoint structure are tallied by each breakpoint's start and end location and are reported as reads mapping "across" the breakpoint in Supplementary Table 6. (c) IGV screenshots of read alignments from a high-scoring deletion, shown by left and right breakpoints and by Haplotype 1 and Haplotype 2. That the deletion involves Haplotype 2 is shown by the reads missing from the left and right breakpoints of that haplotype. (d) IGV screenshots of read alignments from a low-scoring deletion, shown by left and right breakpoints and by Haplotype 1 and Haplotype 2. Reads are missing from the right breakpoint of both Haplotype 1 and Haplotype 2, suggesting that reads cannot be properly mapped to the breakpoint and that the breakpoint is not accurate.
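Soft-clipped read counting, as mentioned in panel (b), can be sketched as follows. The example is a simplified illustration rather than the authors' procedure: it assumes alignments are available as hypothetical `(position, CIGAR)` pairs with 1-based leftmost positions as in the SAM format, treats a read as supporting a breakpoint when its soft-clip boundary falls within a small tolerance of the breakpoint coordinate, and omits the grouping by primer probe. The function names, the tolerance and the toy alignments are assumptions.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_boundaries(pos, cigar):
    """Return [(side, ref_coordinate)] for each soft-clipped end of a read.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    bounds = []
    if ops and ops[0][1] == "S":
        bounds.append(("left", pos))  # first aligned base
    if len(ops) > 1 and ops[-1][1] == "S":
        bounds.append(("right", pos + ref_len - 1))  # last aligned base
    return bounds


def count_breakpoint_support(alignments, left_bp, right_bp, tol=2):
    """Tally soft-clipped reads whose clip boundary sits at a deletion breakpoint.

    A read clipped on its right end near left_bp, or clipped on its left
    end near right_bp, is counted as crossing the deletion junction.
    """
    support = {"left": 0, "right": 0}
    for pos, cigar in alignments:
        for side, coord in clip_boundaries(pos, cigar):
            if side == "right" and abs(coord - left_bp) <= tol:
                support["left"] += 1
            elif side == "left" and abs(coord - right_bp) <= tol:
                support["right"] += 1
    return support


# Toy alignments (hypothetical) around a deletion spanning 1000..5000.
toy_alignments = [
    (941, "60M40S"),   # aligned through 1000, then clipped: left breakpoint
    (950, "50M50S"),   # aligned through 999, within tolerance: left breakpoint
    (5000, "35S65M"),  # clipped, then aligned from 5000: right breakpoint
    (900, "100M"),     # fully aligned, no support
    (3000, "20S80M"),  # clipped far from either breakpoint
]
support = count_breakpoint_support(toy_alignments, left_bp=1000, right_bp=5000)
```

A fuller implementation would also check that the clipped bases actually match the sequence on the far side of the junction, which this sketch does not attempt.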

Supplementary Figure 7 ALK gene fusions in NA12878 exome and NCI-H2228 WGS data.

Heatmap of barcode overlap of (a) EML4-ALK and (b) ALK-PTPN3 in NA12878 exome (a negative control). Barcode overlap of (c) EML4-ALK and (d) ALK-PTPN3 in NCI-H2228 WGS. (e) RT-PCR data of EML4-ALK and ALK-PTPN3 transcripts in NA12878 and NCI-H2228.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 (PDF 1570 kb)

Supplementary Information

Supplementary Tables 1–6, Supplementary Tables 8–13 and Supplementary Notes 1 and 2 (PDF 1939 kb)

Supplementary Table 7 (XLSX 66 kb)

About this article

Cite this article

Zheng, G., Lau, B., Schnall-Levin, M. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol 34, 303–311 (2016). https://doi.org/10.1038/nbt.3432

Received: 16 May 2015
Accepted: 12 November 2015
Published: 01 February 2016
Issue Date: March 2016
DOI: https://doi.org/10.1038/nbt.3432
