Article

Picky comprehensively detects high-resolution structural variants in nanopore long reads

Nature Methods volume 15, pages 455–460 (2018)

Abstract

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
  2. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
  3. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
  4. Bochukova, E. G. et al. Large, rare chromosomal deletions associated with severe early-onset obesity. Nature 463, 666–670 (2010).
  5. Diskin, S. J. et al. Copy number variation at 1q21.1 associated with neuroblastoma. Nature 459, 987–991 (2009).
  6. Edwards, P. A. Fusion genes and chromosome translocations in the common epithelial cancers. J. Pathol. 220, 244–254 (2010).
  7. Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl. Acad. Sci. USA 113, E2373–E2382 (2016).
  8. Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J. O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat. Rev. Genet. 14, 125–138 (2013).
  9. Stankiewicz, P. & Lupski, J. R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 61, 437–455 (2010).
  10. Chaisson, M. J. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).
  11. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
  12. Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
  13. Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
  14. Sović, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).
  15. Spies, N. et al. Genome-wide reconstruction of complex structural variants using read clouds. Nat. Methods 14, 915–920 (2017).
  16. Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 (2017).
  17. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single molecule sequencing. bioRxiv Preprint at https://www.biorxiv.org/content/early/2017/07/28/169557 (2017).
  18. Jain, M. et al. Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12, 351–356 (2015).
  19. Deamer, D., Akeson, M. & Branton, D. Three decades of nanopore sequencing. Nat. Biotechnol. 34, 518–524 (2016).
  20. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
  21. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
  22. Gazdar, A. F. et al. Characterization of paired tumor and non-tumor cell lines established from patients with breast cancer. Int. J. Cancer 78, 766–774 (1998).
  23. Li, H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv Preprint at https://arxiv.org/abs/1708.01492 (2017).
  24. Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).
  25. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
  26. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
  27. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009).
  28. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
  29. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).
  30. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
  31. Cahill, D., Connor, B. & Carney, J. P. Mechanisms of eukaryotic DNA double strand break repair. Front. Biosci. 11, 1958–1976 (2006).
  32. Howarth, K. D. et al. Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes. Oncogene 27, 3345–3359 (2008).
  33. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
  34. Branco, M. R. & Pombo, A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138 (2006).
  35. Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. USA 113, E1663–E1672 (2016).
  36. Chung, I. F. et al. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res. 44, D975–D979 (2016).


Acknowledgements

The authors thank P. Shreckengast for collecting the HCC1187 cells; C. Robinett and A. Lau for their comments on the manuscript; and B. Hanson and M. Bolisetty for their help in setting up the initial nanopore runs. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Author notes

  1. These authors contributed equally: Liang Gong, Chee-Hong Wong.

Affiliations

  1. The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA

    • Liang Gong
    • Chee-Hong Wong
    • Harianto Tjong
    • Francesca Menghi
    • Chew Yee Ngan
    • Edison T. Liu
    • Chia-Lin Wei
  2. China Medical University, Taichung, Taiwan

    • Wei-Chung Cheng
    • Chia-Lin Wei


Contributions

L.G., C.-H.W., and C.-L.W. designed the experiment, analyzed the data, and wrote the manuscript. L.G. performed the experiments. C.-H.W. developed the Picky pipeline. W.-C.C. analyzed the TCGA data. H.T. performed the ICP analysis. F.M., C.Y.N., E.T.L., and C.-L.W. contributed to manuscript preparation.

Competing interests

L.G., C.-H.W., and C.-L.W. have received a few batches of reagent from Oxford Nanopore. C.-L.W. has received travel and accommodation support from Oxford Nanopore as an invited speaker at the Oxford Nanopore user meeting.

Corresponding author

Correspondence to Chia-Lin Wei.

Integrated supplementary information

  1. Supplementary Figure 1 Correlation between read length and percentage of reads with breakpoints.

    Each blue dot represents a single 2D nanopore run. N = 13

  2. Supplementary Figure 2 Analysis for phased SVs using multi-breakpoint long reads.

    (a) The total counts and (b) the log-likelihood of the adjacent SVs phased by the multi-breakpoint long reads. Red count indicates observation > 2X expected. Blue count indicates observation < 0.5X expected. N = 2,374.

  3. Supplementary Figure 3 Examples of validated breakpoints and their detailed junction sequences.

    Nanopore read-to-genome alignments, junction sequences and affected genes are shown for each SV class. The micro-homologous sequences shared between junctions are highlighted in red boxes. (a) TDJ. (b) INS. (c) DEL. (d) INV. (e) TLC. The translocation t(1;8) identified here at base resolution is consistent with the translocation previously identified by spectral karyotyping (SKY; ref. 32). (f) Amplified PCR fragments across the breakpoints of each SV shown in (a)-(e) were analyzed by Bioanalyzer (Agilent Technologies). L: molecular size markers. Independent repeats = 2.

  4. Supplementary Figure 4 The sensitivity and specificity of the Picky-called SVs.

    (a) Summary of the SVs validated by PCR. (b) Numbers of SVs called by LUMPY at different depths of short-read data. *: deletion and DEL in INDEL. **: thresholds used in SV calling by LUMPY (see Online Methods). ***: not called by the standard LUMPY pipeline. (c) The numbers of high-confidence SVs previously described in HCC1187 that were detected by nanopore sequencing.

  5. Supplementary Figure 5 The prevalence of SV heterozygosity in the HCC1187 genome.

    (a) PCR products corresponding to different haplotypes in two validated SVs. Independent repeats = 2. (b) Reads supporting both the SV and the normal genotypes at the same locus, visualized in the IGV browser. (c) Heterozygosity analysis of 50 randomly selected loci from each of the seven SV types.

  6. Supplementary Figure 6 A comprehensive comparison of SV detection in long-read and short-read analyses.

    LR, long-read. SR, short-read. (a) Numbers of SVs found in each data and their overlaps. (b) Distributions of the SV span size.

  7. Supplementary Figure 7 Comparison of Picky, Sniffles, and NanoSV.

    Overview of the components and features of Picky, Sniffles and NanoSV. "Yes" indicates that the pipeline can report the SV type; "N/A" indicates that it cannot.

  8. Supplementary Figure 8 The SV span distributions and the SVs enriched in repeat regions.

    (a) The span distribution of DEL, INS and INDEL. (b) Relative percentages of repeats across different span sizes in simple DEL. (c) Relative percentages of repeats across different span sizes in simple INS.

  9. Supplementary Figure 9 Selected cases of micro-insertions from nanopore results confirmed by PacBio sequencing.

    (a) A 36 bp insertion associated with a 329 bp deletion on chromosome 20. (b) A 75 bp insertion associated with a 3,262 bp deletion on chromosome X.

  10. Supplementary Figure 10 Distribution of the SV breakpoints along the genomic features of transcription.

    (a) Enrichment of breakpoints from each SV class. (b) Distributions of the breakpoints from different types of TDCs.

  11. Supplementary Figure 11 Control of the multidimensional scaling (MDS) analysis.

    (a) Histogram of gene expression from SVs-genes (log2 transformed). (b) Histogram of gene expression from the control genes (log2 transformed). The control genes have a similar expression profile and are equal in number to the SVs-genes. (c) The MDS plot of the expression of the SVs-genes under sample-wise permutation. Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113. (d) The MDS plot of the expression of the control genes. Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113. All data are from the breast carcinoma (BRCA) dataset within The Cancer Genome Atlas (TCGA).

  12. Supplementary Figure 12 The logic and criteria used to define seven SV types by Picky.

    For high-scoring segment pair (HSP) i, the read segment and the reference segment are denoted by Q_i and S_i, respectively. Linked alignment extensions with 3 segments have 3 HSPs, indicated by Q_1:S_1, Q_2:S_2 and Q_3:S_3. Each segment span is denoted by (start,end] as per the UCSC 0-start, half-open coordinate system. sDiff between reference segments S_i and S_{i+1} is given by S_{i+1}(start) - S_i(end). qDiff between read segments Q_i and Q_{i+1} is given by Q_{i+1}(start) - Q_i(end).

  13. Supplementary Figure 13 Homopolymer analysis of nanopore reads.

    (a) The ratio of observed to expected instances of all 1,024 5-mers. The 4 under-called homopolymers are highlighted. (b) The annotated current trace for the segment harboring the basecalled deletion. The trace clearly indicates the existence of the two homopolymers (marked (A)20 and (T)18) rather than a deletion flanked by (A)5 and (T)5.

  14. Supplementary Figure 14

    Overview of the process of assigning breakpoints to their corresponding genomic features on the basis of the gene model.
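
The sDiff and qDiff definitions in the Supplementary Figure 12 legend can be illustrated with a minimal Python sketch. This is not Picky's code: the `HSP` tuple, the `adjacent_diffs` function and the example coordinates are hypothetical, and the sketch assumes all segments lie on the same chromosome and strand and are ordered along the read. It only computes the two differences as the legend defines them; the criteria that map these values to the seven SV types are given in the figure and are not reproduced here.

```python
from typing import List, NamedTuple, Tuple


class HSP(NamedTuple):
    """One high-scoring segment pair: read span Q and reference span S.

    Field names are illustrative, not taken from Picky.
    """
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def adjacent_diffs(hsps: List[HSP]) -> List[Tuple[int, int]]:
    """Return (sDiff, qDiff) for each pair of consecutive HSPs.

    sDiff = S_{i+1}(start) - S_i(end)   (gap or overlap on the reference)
    qDiff = Q_{i+1}(start) - Q_i(end)   (gap or overlap on the read)
    """
    diffs = []
    for cur, nxt in zip(hsps, hsps[1:]):
        s_diff = nxt.s_start - cur.s_end
        q_diff = nxt.q_start - cur.q_end
        diffs.append((s_diff, q_diff))
    return diffs


# Made-up read with three linked segments (Q1:S1, Q2:S2, Q3:S3).
hsps = [
    HSP(q_start=0, q_end=1000, s_start=5000, s_end=6000),
    HSP(q_start=1000, q_end=2000, s_start=6500, s_end=7500),
    HSP(q_start=2050, q_end=3000, s_start=7500, s_end=8450),
]
print(adjacent_diffs(hsps))  # [(500, 0), (0, 50)]
```

In this made-up read, the first junction skips 500 reference bases with no unaligned read sequence, while the second has 50 read bases between segments that abut on the reference.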

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–14

  2. Reporting Summary

  3. Combined Supplementary Information

    Supplementary Note 1

  4. Supplementary Table 1

    Summary of the 15 nanopore runs in this study

  5. Supplementary Table 2

    Summary of the mapping and SV-calling results

  6. Supplementary Table 3

    List of seven SV types detected in nanopore data

  7. Supplementary Table 4

    SVs selected for validation analysis

  8. Supplementary Table 5

    Details of nanopore sequencing kits, devices, and software

  9. Supplementary Table 6

    List of all primers used in this study

About this article

DOI

https://doi.org/10.1038/s41592-018-0002-6